Navigation

Databases

Using CTEs (Common Table Expressions) for Complex Queries 2025

Master CTEs in SQL for complex queries. Learn recursive CTEs, performance tips, and real-world examples to simplify your database operations in 2025.

Table Of Contents

Introduction

Are you struggling with complex SQL queries that seem impossible to read and maintain? Do you find yourself writing nested subqueries that make your database operations look like a maze? Common Table Expressions (CTEs) are your solution to writing cleaner, more maintainable, and more powerful SQL queries.

CTEs provide a way to define temporary result sets that exist only for the duration of a single query. They act like temporary views that you can reference multiple times within your main query, making complex operations more readable and efficient. Whether you're dealing with hierarchical data, recursive operations, or simply want to break down complex logic into manageable chunks, CTEs are an essential tool in your SQL arsenal.

In this comprehensive guide, you'll learn everything about CTEs: from basic syntax to advanced recursive operations, performance considerations, and real-world use cases that will transform how you write SQL queries.

What Are Common Table Expressions (CTEs)?

Common Table Expressions are named temporary result sets that are defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or MERGE statement. Think of them as temporary tables or views that exist only for the duration of your query.

Basic CTE Syntax

The basic syntax for a CTE follows this pattern:

WITH cte_name (column1, column2, ...) AS (
    -- CTE query definition
    SELECT column1, column2, ...
    FROM table_name
    WHERE conditions
)
-- Main query that uses the CTE
SELECT *
FROM cte_name
WHERE additional_conditions;

Simple CTE Example

Let's start with a practical example using an e-commerce database:

WITH high_value_customers AS (
    SELECT 
        customer_id,
        customer_name,
        SUM(order_total) as total_spent
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    WHERE o.order_date >= '2024-01-01'
    GROUP BY customer_id, customer_name
    HAVING SUM(order_total) > 1000
)
SELECT 
    customer_name,
    total_spent,
    CASE 
        WHEN total_spent > 5000 THEN 'VIP'
        WHEN total_spent > 2500 THEN 'Premium'
        ELSE 'High Value'
    END as customer_tier
FROM high_value_customers
ORDER BY total_spent DESC;

This CTE makes the query much more readable by separating the logic for identifying high-value customers from the main query logic.

Types of CTEs

Non-Recursive CTEs

Non-recursive CTEs are the most common type and work like temporary views. They're perfect for:

  • Breaking down complex queries into manageable parts
  • Reusing subquery results multiple times
  • Improving query readability and maintenance
WITH monthly_sales AS (
    SELECT 
        EXTRACT(YEAR FROM order_date) as year,
        EXTRACT(MONTH FROM order_date) as month,
        SUM(order_total) as monthly_total,
        COUNT(*) as order_count
    FROM orders
    WHERE order_date >= '2024-01-01'
    GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date)
),
sales_with_growth AS (
    SELECT 
        year,
        month,
        monthly_total,
        order_count,
        LAG(monthly_total) OVER (ORDER BY year, month) as previous_month_total,
        monthly_total - LAG(monthly_total) OVER (ORDER BY year, month) as growth
    FROM monthly_sales
)
SELECT 
    year,
    month,
    monthly_total,
    order_count,
    COALESCE(growth, 0) as growth,
    CASE 
        WHEN previous_month_total IS NULL THEN 0
        ELSE ROUND((growth / previous_month_total) * 100, 2)
    END as growth_percentage
FROM sales_with_growth
ORDER BY year, month;

Recursive CTEs

Recursive CTEs are powerful tools for working with hierarchical or tree-structured data. They consist of two parts:

  1. Anchor member: The base case that starts the recursion
  2. Recursive member: The part that references the CTE itself
WITH RECURSIVE employee_hierarchy AS (
    -- Anchor member: Start with top-level managers
    SELECT 
        employee_id,
        name,
        manager_id,
        title,
        1 as level,
        CAST(name AS VARCHAR(1000)) as hierarchy_path
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive member: Add subordinates
    SELECT 
        e.employee_id,
        e.name,
        e.manager_id,
        e.title,
        eh.level + 1,
        CONCAT(eh.hierarchy_path, ' -> ', e.name)
    FROM employees e
    JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT 
    employee_id,
    name,
    title,
    level,
    hierarchy_path,
    REPEAT('  ', level - 1) || name as indented_name
FROM employee_hierarchy
ORDER BY hierarchy_path;

Advanced CTE Techniques

Multiple CTEs in a Single Query

You can define multiple CTEs in a single query, which is excellent for complex data transformations:

WITH 
product_stats AS (
    SELECT 
        product_id,
        product_name,
        category_id,
        COUNT(*) as total_orders,
        SUM(quantity) as total_quantity_sold,
        AVG(unit_price) as avg_price
    FROM order_items oi
    JOIN products p ON oi.product_id = p.product_id
    GROUP BY product_id, product_name, category_id
),
category_performance AS (
    SELECT 
        c.category_id,
        c.category_name,
        SUM(ps.total_orders) as category_orders,
        SUM(ps.total_quantity_sold) as category_quantity,
        AVG(ps.avg_price) as category_avg_price
    FROM product_stats ps
    JOIN categories c ON ps.category_id = c.category_id
    GROUP BY c.category_id, c.category_name
),
top_categories AS (
    SELECT 
        category_id,
        category_name,
        category_orders,
        ROW_NUMBER() OVER (ORDER BY category_orders DESC) as rank
    FROM category_performance
)
SELECT 
    ps.product_name,
    tc.category_name,
    ps.total_orders,
    ps.total_quantity_sold,
    ROUND(ps.avg_price, 2) as avg_price,
    tc.rank as category_rank
FROM product_stats ps
JOIN top_categories tc ON ps.category_id = tc.category_id
WHERE tc.rank <= 5
ORDER BY tc.rank, ps.total_orders DESC;

CTEs with Window Functions

CTEs work excellently with window functions for complex analytical queries:

WITH customer_order_analysis AS (
    SELECT 
        c.customer_id,
        c.customer_name,
        o.order_id,
        o.order_date,
        o.order_total,
        ROW_NUMBER() OVER (PARTITION BY c.customer_id ORDER BY o.order_date) as order_sequence,
        SUM(o.order_total) OVER (PARTITION BY c.customer_id ORDER BY o.order_date 
                                ROWS UNBOUNDED PRECEDING) as cumulative_spent,
        LAG(o.order_date) OVER (PARTITION BY c.customer_id ORDER BY o.order_date) as previous_order_date,
        LEAD(o.order_date) OVER (PARTITION BY c.customer_id ORDER BY o.order_date) as next_order_date
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    WHERE o.order_date >= '2024-01-01'
)
SELECT 
    customer_name,
    order_sequence,
    order_date,
    order_total,
    cumulative_spent,
    CASE 
        WHEN previous_order_date IS NULL THEN 0
        ELSE DATE_PART('day', order_date - previous_order_date)
    END as days_since_last_order,
    CASE 
        WHEN next_order_date IS NULL THEN NULL
        ELSE DATE_PART('day', next_order_date - order_date)
    END as days_to_next_order
FROM customer_order_analysis
WHERE order_sequence <= 10  -- Focus on first 10 orders per customer
ORDER BY customer_name, order_sequence;

Real-World Use Cases

Financial Reporting and Analytics

CTEs are excellent for financial calculations and reporting:

WITH monthly_revenue AS (
    SELECT 
        DATE_TRUNC('month', order_date) as month,
        SUM(order_total) as revenue,
        COUNT(DISTINCT customer_id) as unique_customers,
        COUNT(*) as total_orders
    FROM orders
    WHERE order_date >= DATE_TRUNC('year', CURRENT_DATE)
    GROUP BY DATE_TRUNC('month', order_date)
),
revenue_with_metrics AS (
    SELECT 
        month,
        revenue,
        unique_customers,
        total_orders,
        LAG(revenue) OVER (ORDER BY month) as previous_month_revenue,
        revenue / unique_customers as avg_revenue_per_customer,
        revenue / total_orders as avg_order_value
    FROM monthly_revenue
),
final_report AS (
    SELECT 
        TO_CHAR(month, 'YYYY-MM') as month_year,
        ROUND(revenue, 2) as monthly_revenue,
        unique_customers,
        total_orders,
        ROUND(avg_revenue_per_customer, 2) as arpc,
        ROUND(avg_order_value, 2) as aov,
        CASE 
            WHEN previous_month_revenue IS NULL THEN 0
            ELSE ROUND(((revenue - previous_month_revenue) / previous_month_revenue) * 100, 2)
        END as month_over_month_growth
    FROM revenue_with_metrics
)
SELECT * FROM final_report
ORDER BY month_year;

Inventory Management

CTEs can help with complex inventory calculations:

WITH inventory_movements AS (
    SELECT 
        product_id,
        movement_date,
        movement_type,
        quantity,
        SUM(CASE WHEN movement_type = 'IN' THEN quantity ELSE -quantity END) 
            OVER (PARTITION BY product_id ORDER BY movement_date, movement_id 
                  ROWS UNBOUNDED PRECEDING) as running_balance
    FROM inventory_transactions
    WHERE movement_date >= CURRENT_DATE - INTERVAL '30 days'
),
current_stock AS (
    SELECT 
        product_id,
        running_balance as current_quantity
    FROM inventory_movements im1
    WHERE movement_date = (
        SELECT MAX(movement_date) 
        FROM inventory_movements im2 
        WHERE im2.product_id = im1.product_id
    )
),
sales_velocity AS (
    SELECT 
        product_id,
        SUM(quantity) as units_sold_30_days,
        SUM(quantity) / 30.0 as avg_daily_sales
    FROM order_items oi
    JOIN orders o ON oi.order_id = o.order_id
    WHERE o.order_date >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY product_id
)
SELECT 
    p.product_name,
    cs.current_quantity,
    COALESCE(sv.units_sold_30_days, 0) as units_sold_30_days,
    COALESCE(sv.avg_daily_sales, 0) as avg_daily_sales,
    CASE 
        WHEN sv.avg_daily_sales > 0 THEN 
            ROUND(cs.current_quantity / sv.avg_daily_sales, 1)
        ELSE NULL
    END as days_of_stock_remaining,
    CASE 
        WHEN cs.current_quantity <= 10 THEN 'CRITICAL'
        WHEN cs.current_quantity <= 50 THEN 'LOW'
        ELSE 'NORMAL'
    END as stock_status
FROM products p
LEFT JOIN current_stock cs ON p.product_id = cs.product_id
LEFT JOIN sales_velocity sv ON p.product_id = sv.product_id
ORDER BY days_of_stock_remaining ASC NULLS LAST;

Data Quality Analysis

Use CTEs for comprehensive data quality checks:

WITH data_quality_checks AS (
    SELECT 
        'customers' as table_name,
        COUNT(*) as total_records,
        COUNT(CASE WHEN email IS NULL OR email = '' THEN 1 END) as missing_email,
        COUNT(CASE WHEN phone IS NULL OR phone = '' THEN 1 END) as missing_phone,
        COUNT(CASE WHEN email NOT LIKE '%@%' THEN 1 END) as invalid_email,
        COUNT(DISTINCT customer_id) as unique_customers
    FROM customers
    
    UNION ALL
    
    SELECT 
        'orders' as table_name,
        COUNT(*) as total_records,
        COUNT(CASE WHEN customer_id IS NULL THEN 1 END) as missing_customer_id,
        COUNT(CASE WHEN order_total <= 0 THEN 1 END) as invalid_total,
        COUNT(CASE WHEN order_date > CURRENT_DATE THEN 1 END) as future_dates,
        COUNT(DISTINCT order_id) as unique_orders
    FROM orders
),
quality_percentages AS (
    SELECT 
        table_name,
        total_records,
        missing_email,
        missing_phone,
        invalid_email as issues_found,
        unique_customers as unique_records,
        ROUND((missing_email::FLOAT / total_records) * 100, 2) as missing_email_pct,
        ROUND((missing_phone::FLOAT / total_records) * 100, 2) as missing_phone_pct,
        ROUND((invalid_email::FLOAT / total_records) * 100, 2) as data_quality_issues_pct
    FROM data_quality_checks
)
SELECT 
    table_name,
    total_records,
    unique_records,
    issues_found,
    missing_email_pct,
    missing_phone_pct,
    data_quality_issues_pct,
    CASE 
        WHEN data_quality_issues_pct = 0 THEN 'EXCELLENT'
        WHEN data_quality_issues_pct <= 5 THEN 'GOOD'
        WHEN data_quality_issues_pct <= 15 THEN 'FAIR'
        ELSE 'POOR'
    END as data_quality_grade
FROM quality_percentages;

Performance Considerations

When CTEs Improve Performance

CTEs can significantly improve performance in several scenarios:

  1. Eliminating repeated subqueries: When you need the same result set multiple times
  2. Simplifying complex joins: Breaking down complex operations into steps
  3. Materialization optimization: Some databases materialize CTE results

When CTEs May Hurt Performance

Be aware of potential performance pitfalls:

  1. Over-materialization: Some databases may materialize CTEs unnecessarily
  2. Recursive CTEs without limits: Can cause infinite loops or excessive resource usage
  3. Complex CTEs in WHERE clauses: May prevent efficient index usage

Performance Optimization Tips

-- ✅ Good: Use appropriate LIMIT in recursive CTEs
WITH RECURSIVE category_hierarchy AS (
    SELECT category_id, parent_id, name, 1 as level
    FROM categories
    WHERE parent_id IS NULL
    
    UNION ALL
    
    SELECT c.category_id, c.parent_id, c.name, ch.level + 1
    FROM categories c
    JOIN category_hierarchy ch ON c.parent_id = ch.category_id
    WHERE ch.level < 10  -- Prevent infinite recursion
)
SELECT * FROM category_hierarchy;

-- ✅ Good: Index the columns used in CTE joins
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
CREATE INDEX idx_order_items_product ON order_items(product_id);

-- ✅ Good: Use EXPLAIN to analyze CTE performance
EXPLAIN (ANALYZE, BUFFERS) 
WITH sales_summary AS (
    SELECT customer_id, SUM(order_total) as total_spent
    FROM orders
    WHERE order_date >= '2024-01-01'
    GROUP BY customer_id
)
SELECT c.customer_name, ss.total_spent
FROM customers c
JOIN sales_summary ss ON c.customer_id = ss.customer_id;

CTE Best Practices

Naming Conventions

Use descriptive names that clearly indicate the CTE's purpose:

-- ✅ Good naming
WITH 
high_value_customers_2024 AS (...),
monthly_sales_summary AS (...),
product_performance_metrics AS (...)

-- ❌ Poor naming
WITH 
cte1 AS (...),
temp_data AS (...),
x AS (...)

Code Organization

Structure your CTEs for maximum readability:

WITH 
-- Step 1: Get base customer data
customer_base AS (
    SELECT 
        customer_id,
        customer_name,
        registration_date,
        customer_tier
    FROM customers
    WHERE registration_date >= '2024-01-01'
),
-- Step 2: Calculate order metrics
order_metrics AS (
    SELECT 
        customer_id,
        COUNT(*) as total_orders,
        SUM(order_total) as total_spent,
        AVG(order_total) as avg_order_value,
        MAX(order_date) as last_order_date
    FROM orders
    WHERE order_date >= '2024-01-01'
    GROUP BY customer_id
),
-- Step 3: Combine and analyze
customer_analysis AS (
    SELECT 
        cb.customer_name,
        cb.customer_tier,
        cb.registration_date,
        COALESCE(om.total_orders, 0) as total_orders,
        COALESCE(om.total_spent, 0) as total_spent,
        COALESCE(om.avg_order_value, 0) as avg_order_value,
        om.last_order_date,
        CASE 
            WHEN om.last_order_date IS NULL THEN 'Never Ordered'
            WHEN om.last_order_date < CURRENT_DATE - INTERVAL '90 days' THEN 'Inactive'
            WHEN om.last_order_date < CURRENT_DATE - INTERVAL '30 days' THEN 'At Risk'
            ELSE 'Active'
        END as customer_status
    FROM customer_base cb
    LEFT JOIN order_metrics om ON cb.customer_id = om.customer_id
)
-- Final query
SELECT 
    customer_status,
    COUNT(*) as customer_count,
    ROUND(AVG(total_spent), 2) as avg_total_spent,
    ROUND(AVG(total_orders), 2) as avg_orders_per_customer
FROM customer_analysis
GROUP BY customer_status
ORDER BY 
    CASE customer_status
        WHEN 'Active' THEN 1
        WHEN 'At Risk' THEN 2
        WHEN 'Inactive' THEN 3
        WHEN 'Never Ordered' THEN 4
    END;

Error Handling and Validation

Include validation logic in your CTEs:

WITH RECURSIVE organizational_chart AS (
    -- Validate that we have valid starting points
    SELECT 
        employee_id,
        name,
        manager_id,
        department,
        1 as level,
        ARRAY[employee_id] as path,
        CAST(name AS VARCHAR(1000)) as hierarchy_path
    FROM employees
    WHERE manager_id IS NULL
    AND employee_id IS NOT NULL  -- Validation
    
    UNION ALL
    
    SELECT 
        e.employee_id,
        e.name,
        e.manager_id,
        e.department,
        oc.level + 1,
        oc.path || e.employee_id,
        CONCAT(oc.hierarchy_path, ' -> ', e.name)
    FROM employees e
    JOIN organizational_chart oc ON e.manager_id = oc.employee_id
    WHERE 
        oc.level < 20  -- Prevent infinite recursion
        AND NOT (e.employee_id = ANY(oc.path))  -- Prevent circular references
        AND e.employee_id IS NOT NULL  -- Validation
)
SELECT 
    level,
    name,
    department,
    hierarchy_path,
    CASE 
        WHEN level > 10 THEN 'Potential Data Issue: Deep Hierarchy'
        ELSE 'Normal'
    END as data_quality_flag
FROM organizational_chart
ORDER BY hierarchy_path;

Common Mistakes and How to Avoid Them

Mistake 1: Infinite Recursion

-- ❌ Dangerous: No recursion limit
WITH RECURSIVE dangerous_cte AS (
    SELECT id, parent_id, 1 as level FROM table_name WHERE parent_id IS NULL
    UNION ALL
    SELECT t.id, t.parent_id, dc.level + 1
    FROM table_name t
    JOIN dangerous_cte dc ON t.parent_id = dc.id
)

-- ✅ Safe: Include recursion limit and cycle detection
WITH RECURSIVE safe_cte AS (
    SELECT id, parent_id, 1 as level, ARRAY[id] as path
    FROM table_name WHERE parent_id IS NULL
    UNION ALL
    SELECT t.id, t.parent_id, sc.level + 1, sc.path || t.id
    FROM table_name t
    JOIN safe_cte sc ON t.parent_id = sc.id
    WHERE sc.level < 100 AND NOT (t.id = ANY(sc.path))
)

Mistake 2: Unnecessary Complexity

-- ❌ Overly complex CTE
WITH complex_unnecessary AS (
    SELECT 
        customer_id,
        (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id) as order_count
    FROM customers c
)

-- ✅ Simpler and more efficient
WITH customer_orders AS (
    SELECT 
        customer_id,
        COUNT(*) as order_count
    FROM orders
    GROUP BY customer_id
)

Mistake 3: Poor Performance Due to Lack of Indexing

-- ✅ Ensure proper indexes exist before using CTEs
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date DESC);
CREATE INDEX idx_customers_registration ON customers(registration_date);

WITH recent_customers AS (
    SELECT customer_id, customer_name
    FROM customers
    WHERE registration_date >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT 
    rc.customer_name,
    COUNT(o.order_id) as recent_orders
FROM recent_customers rc
LEFT JOIN orders o ON rc.customer_id = o.customer_id
    AND o.order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY rc.customer_id, rc.customer_name;

Database-Specific CTE Features

PostgreSQL CTEs

PostgreSQL offers advanced CTE features:

-- Writable CTEs (UPDATE/INSERT/DELETE)
WITH updated_prices AS (
    UPDATE products 
    SET price = price * 1.1 
    WHERE category_id = 1
    RETURNING product_id, product_name, price
)
SELECT 
    product_name,
    price as new_price,
    price / 1.1 as old_price
FROM updated_prices;

-- MATERIALIZED hint for optimization
WITH MATERIALIZED expensive_calculation AS (
    SELECT 
        product_id,
        complex_calculation(price, cost, tax_rate) as margin
    FROM products
    WHERE active = true
)
SELECT * FROM expensive_calculation WHERE margin > 0.2;

SQL Server CTEs

SQL Server specific features:

-- Using OPTION clause for query hints
WITH sales_cte AS (
    SELECT customer_id, SUM(amount) as total_sales
    FROM orders
    GROUP BY customer_id
)
SELECT * FROM sales_cte
OPTION (RECOMPILE);

-- Using CTEs with MERGE statements
WITH source_data AS (
    SELECT product_id, new_price
    FROM product_updates
)
MERGE products AS target
USING source_data AS source ON target.product_id = source.product_id
WHEN MATCHED THEN UPDATE SET price = source.new_price;

Frequently Asked Questions

What's the difference between CTEs and subqueries?

CTEs offer better readability and can be referenced multiple times within the same query, while subqueries are inline and can only be used once. CTEs are particularly useful for complex queries where you need to break down logic into manageable steps.

Can I use CTEs with INSERT, UPDATE, and DELETE statements?

Yes, CTEs can be used with INSERT, UPDATE, and DELETE statements. In PostgreSQL, you can even create "writable CTEs" that perform modifications and return results that can be used in the main query.

Are CTEs faster than temporary tables?

It depends on your specific use case and database system. CTEs are generally faster for simple operations since they don't require disk I/O, but temporary tables might be better for complex operations involving large datasets that benefit from indexing.

How do I prevent infinite recursion in recursive CTEs?

Always include a termination condition in your recursive CTE, such as a maximum level counter or cycle detection using arrays to track visited nodes. Set reasonable limits (like WHERE level < 100) to prevent runaway recursion.

Can I nest CTEs within other CTEs?

No, you cannot directly nest CTE definitions within other CTEs. However, you can define multiple CTEs in sequence, where later CTEs reference earlier ones, achieving similar functionality with better readability.

What's the maximum number of CTEs I can use in a single query?

While there's no strict limit in most databases, practical considerations like query complexity and maintainability suggest keeping the number reasonable (typically under 10 CTEs per query). If you need more, consider breaking the query into smaller parts or using temporary tables.

Conclusion

Common Table Expressions are a powerful feature that can dramatically improve the readability, maintainability, and sometimes performance of your SQL queries. By mastering both non-recursive and recursive CTEs, you'll be able to tackle complex data problems with elegant solutions.

Key takeaways from this guide:

  • CTEs simplify complex queries by breaking them into logical, manageable steps
  • Recursive CTEs excel at hierarchical data processing and tree traversal operations
  • Performance considerations matter - always include proper termination conditions and consider indexing
  • Good naming and organization make your CTEs self-documenting and easier to maintain
  • Real-world applications span from financial reporting to inventory management and data quality analysis

Start incorporating CTEs into your SQL toolkit today. Begin with simple non-recursive CTEs to replace complex subqueries, then gradually explore recursive CTEs for hierarchical data challenges. Remember to always test performance and include proper safeguards, especially with recursive operations.

Ready to level up your SQL skills? Try implementing some of the examples from this guide in your own database projects, and share your experiences in the comments below. What complex query challenges are you facing that CTEs might solve?

Share this article

Add Comment

No comments yet. Be the first to comment!

More from Databases