Semantics

Aggregations

Define complex calculations beyond basic measures, enabling advanced analytics like percentiles, window functions, and custom calculations.

Overview

Aggregations provide powerful analytical capabilities for complex data analysis. They allow you to perform statistical calculations, window functions, and custom expressions that aren't possible with basic measures.

Diagram showing different types of aggregations and their use cases

Use Cases

Aggregations are used for:

  • Statistical Analysis: Percentiles, standard deviation, variance
  • Window Functions: Running totals, rankings, moving averages
  • Custom Calculations: Complex business logic and formulas
  • Time Series Analysis: Period-over-period comparisons, growth rates
  • Performance Metrics: Response time percentiles, error rates

Syntax

Type: SemanticAggregation

List of aggregations to be applied to the data. Defines complex aggregations beyond basic measures.

{
    "aggregations": [
        {
            "name": "revenue_percentile",
            "description": "95th percentile of revenue",
            "type": "percentile",
            "source_columns": ["revenue"],
            "target_column": "revenue_p95",
            "percentile_value": 0.95,
            "where_condition": "status = 'completed'",
            "window": {
                "partition_by": ["user_segment"],
                "order_by": ["created_at"],
                "frame_start": "UNBOUNDED PRECEDING",
                "frame_end": "CURRENT ROW"
            }
        },
        {
            "name": "custom_metric",
            "description": "Custom calculation",
            "type": "custom",
            "source_columns": ["revenue", "cost"],
            "target_column": "profit_margin",
            "custom_expression": "(revenue - cost) / revenue * 100"
        }
    ]
}

Parameters

name

Type: str
Required: Yes

Unique identifier name for this aggregation. Used to reference the aggregation in queries and results.

{
    "name": "revenue_percentile"
}

description

Type: Optional[str]
Default: None
Required: No

Human-readable explanation of this aggregation. Helps developers understand the calculation logic.

{
    "description": "95th percentile of revenue by user segment"
}

type

Type: AggregationType
Required: Yes

Type of aggregation to perform. Determines the calculation method.

Available Values:

  • Basic: count, sum, avg, min, max
  • Statistical: stddev, variance, percentile, median
  • Window: row_number, rank, dense_rank, lag, lead
  • Custom: custom
{
    "type": "percentile"
}

source_columns

Type: List[str]
Required: Yes

Columns to aggregate. Specifies which data columns will be used in the calculation.

{
    "source_columns": ["revenue"]
}

target_column

Type: str
Required: Yes

Result column name for the aggregation. The name of the output column containing the result.

{
    "target_column": "revenue_p95"
}

custom_expression

Type: Optional[str]
Default: None
Required: No

SQL expression for custom aggregations. Allows you to define complex calculations using SQL.

{
    "custom_expression": "(revenue - cost) / revenue * 100"
}

window

Type: Optional[AggregationWindow]
Default: None
Required: No

Window function parameters for aggregations. Defines the window for window functions.

{
    "window": {
        "partition_by": ["user_segment"],
        "order_by": ["created_at"],
        "frame_start": "UNBOUNDED PRECEDING",
        "frame_end": "CURRENT ROW"
    }
}

percentile_value

Type: Optional[float]
Default: None
Required: No

Float value for percentile aggregations (0.0-1.0). Specifies which percentile to calculate.

{
    "percentile_value": 0.95
}

where_condition

Type: Optional[str]
Default: None
Required: No

SQL condition for conditional aggregations. Filters data before applying the aggregation.

{
    "where_condition": "status = 'completed'"
}

group_by_columns

Type: Optional[List[str]]
Default: None
Required: No

Columns for grouped aggregations. Specifies how to group the data for the aggregation.

{
    "group_by_columns": ["user_segment", "month"]
}

Window Parameters

partition_by

Type: Optional[List[str]]
Default: None
Required: No

Columns for partitioning the window. Divides the result set into partitions.

{
    "partition_by": ["user_segment"]
}

order_by

Type: Optional[List[str]]
Default: None
Required: No

Columns for ordering within the window. Determines the order of rows in each partition.

{
    "order_by": ["created_at"]
}

frame_start

Type: Optional[str]
Default: None
Required: No

Window frame start specification. Defines the start of the window frame.

{
    "frame_start": "UNBOUNDED PRECEDING"
}

frame_end

Type: Optional[str]
Default: None
Required: No

Window frame end specification. Defines the end of the window frame.

{
    "frame_end": "CURRENT ROW"
}

Patterns

Statistical Analysis

{
    "aggregations": [
        {
            "name": "revenue_p95",
            "type": "percentile",
            "source_columns": ["revenue"],
            "target_column": "revenue_p95",
            "percentile_value": 0.95
        },
        {
            "name": "revenue_stddev",
            "type": "stddev",
            "source_columns": ["revenue"],
            "target_column": "revenue_stddev"
        },
        {
            "name": "revenue_median",
            "type": "median",
            "source_columns": ["revenue"],
            "target_column": "revenue_median"
        }
    ]
}

Window Functions

{
    "aggregations": [
        {
            "name": "running_total",
            "type": "sum",
            "source_columns": ["revenue"],
            "target_column": "running_total",
            "window": {
                "partition_by": ["user_id"],
                "order_by": ["created_at"],
                "frame_start": "UNBOUNDED PRECEDING",
                "frame_end": "CURRENT ROW"
            }
        },
        {
            "name": "revenue_rank",
            "type": "rank",
            "source_columns": ["revenue"],
            "target_column": "revenue_rank",
            "window": {
                "partition_by": ["user_segment"],
                "order_by": ["revenue DESC"]
            }
        }
    ]
}

Custom Business Logic

{
    "aggregations": [
        {
            "name": "profit_margin",
            "type": "custom",
            "source_columns": ["revenue", "cost"],
            "target_column": "profit_margin",
            "custom_expression": "CASE WHEN revenue > 0 THEN (revenue - cost) / revenue * 100 ELSE 0 END"
        },
        {
            "name": "growth_rate",
            "type": "custom",
            "source_columns": ["current_revenue", "previous_revenue"],
            "target_column": "growth_rate",
            "custom_expression": "CASE WHEN previous_revenue > 0 THEN (current_revenue - previous_revenue) / previous_revenue * 100 ELSE NULL END"
        }
    ]
}

Time Series Analysis

{
    "aggregations": [
        {
            "name": "monthly_growth",
            "type": "custom",
            "source_columns": ["revenue"],
            "target_column": "monthly_growth",
            "custom_expression": "LAG(revenue) OVER (PARTITION BY user_segment ORDER BY month)",
            "window": {
                "partition_by": ["user_segment"],
                "order_by": ["month"]
            }
        },
        {
            "name": "moving_average",
            "type": "avg",
            "source_columns": ["revenue"],
            "target_column": "moving_average_3m",
            "window": {
                "partition_by": ["user_segment"],
                "order_by": ["month"],
                "frame_start": "2 PRECEDING",
                "frame_end": "CURRENT ROW"
            }
        }
    ]
}

Advanced Window Functions

Ranking Functions

{
    "aggregations": [
        {
            "name": "revenue_rank",
            "type": "rank",
            "source_columns": ["revenue"],
            "target_column": "revenue_rank",
            "window": {
                "partition_by": ["user_segment"],
                "order_by": ["revenue DESC"]
            }
        },
        {
            "name": "revenue_dense_rank",
            "type": "dense_rank",
            "source_columns": ["revenue"],
            "target_column": "revenue_dense_rank",
            "window": {
                "partition_by": ["user_segment"],
                "order_by": ["revenue DESC"]
            }
        }
    ]
}

Lead/Lag Functions

{
    "aggregations": [
        {
            "name": "previous_revenue",
            "type": "lag",
            "source_columns": ["revenue"],
            "target_column": "previous_revenue",
            "window": {
                "partition_by": ["user_segment"],
                "order_by": ["month"]
            }
        },
        {
            "name": "next_revenue",
            "type": "lead",
            "source_columns": ["revenue"],
            "target_column": "next_revenue",
            "window": {
                "partition_by": ["user_segment"],
                "order_by": ["month"]
            }
        }
    ]
}

Performance

Indexing for Window Functions

{
    "window": {
        "partition_by": ["user_segment"],
        "order_by": ["created_at"]
    }
}
# Ensure indexes on (user_segment, created_at)

Memory Usage

  • Window functions can be memory-intensive
  • Consider partitioning strategies for large datasets
  • Use appropriate frame specifications

Best Practices

  1. Choose Appropriate Types: Select the right aggregation type for your use case
  2. Optimize Window Functions: Use efficient partitioning and ordering
  3. Handle Null Values: Consider how null values affect your calculations
  4. Test Performance: Monitor query performance with complex aggregations
  5. Use Descriptive Names: Choose clear names for your aggregations

Video Tutorial

Video: Advanced aggregation techniques in the Cortex dashboard

Aggregation Builder Interface

Screenshot of the visual aggregation builder in the Cortex dashboard