Aggregations
Overview
Aggregations provide powerful analytical capabilities for complex data analysis. They allow you to perform statistical calculations, window functions, and custom expressions that aren't possible with basic measures.
Diagram showing different types of aggregations and their use cases
Use Cases
Aggregations are used for:
- Statistical Analysis: Percentiles, standard deviation, variance
- Window Functions: Running totals, rankings, moving averages
- Custom Calculations: Complex business logic and formulas
- Time Series Analysis: Period-over-period comparisons, growth rates
- Performance Metrics: Response time percentiles, error rates
Syntax
Type: SemanticAggregation
List of aggregations to be applied to the data. Defines complex aggregations beyond basic measures.
{
"aggregations": [
{
"name": "revenue_percentile",
"description": "95th percentile of revenue",
"type": "percentile",
"source_columns": ["revenue"],
"target_column": "revenue_p95",
"percentile_value": 0.95,
"where_condition": "status = 'completed'",
"window": {
"partition_by": ["user_segment"],
"order_by": ["created_at"],
"frame_start": "UNBOUNDED PRECEDING",
"frame_end": "CURRENT ROW"
}
},
{
"name": "custom_metric",
"description": "Custom calculation",
"type": "custom",
"source_columns": ["revenue", "cost"],
"target_column": "profit_margin",
"custom_expression": "(revenue - cost) / revenue * 100"
}
]
}
Parameters
name
Type: str
Required: Yes
Unique identifier name for this aggregation. Used to reference the aggregation in queries and results.
{
"name": "revenue_percentile"
}
description
Type: Optional[str]
Default: None
Required: No
Human-readable explanation of this aggregation. Helps developers understand the calculation logic.
{
"description": "95th percentile of revenue by user segment"
}
type
Type: AggregationType
Required: Yes
Type of aggregation to perform. Determines the calculation method.
Available Values:
- Basic:
count,sum,avg,min,max - Statistical:
stddev,variance,percentile,median - Window:
row_number,rank,dense_rank,lag,lead - Custom:
custom
{
"type": "percentile"
}
source_columns
Type: List[str]
Required: Yes
Columns to aggregate. Specifies which data columns will be used in the calculation.
{
"source_columns": ["revenue"]
}
target_column
Type: str
Required: Yes
Result column name for the aggregation. The name of the output column containing the result.
{
"target_column": "revenue_p95"
}
custom_expression
Type: Optional[str]
Default: None
Required: No
SQL expression for custom aggregations. Allows you to define complex calculations using SQL.
{
"custom_expression": "(revenue - cost) / revenue * 100"
}
window
Type: Optional[AggregationWindow]
Default: None
Required: No
Window function parameters for aggregations. Defines the window for window functions.
{
"window": {
"partition_by": ["user_segment"],
"order_by": ["created_at"],
"frame_start": "UNBOUNDED PRECEDING",
"frame_end": "CURRENT ROW"
}
}
percentile_value
Type: Optional[float]
Default: None
Required: No
Float value for percentile aggregations (0.0-1.0). Specifies which percentile to calculate.
{
"percentile_value": 0.95
}
where_condition
Type: Optional[str]
Default: None
Required: No
SQL condition for conditional aggregations. Filters data before applying the aggregation.
{
"where_condition": "status = 'completed'"
}
group_by_columns
Type: Optional[List[str]]
Default: None
Required: No
Columns for grouped aggregations. Specifies how to group the data for the aggregation.
{
"group_by_columns": ["user_segment", "month"]
}
Window Parameters
partition_by
Type: Optional[List[str]]
Default: None
Required: No
Columns for partitioning the window. Divides the result set into partitions.
{
"partition_by": ["user_segment"]
}
order_by
Type: Optional[List[str]]
Default: None
Required: No
Columns for ordering within the window. Determines the order of rows in each partition.
{
"order_by": ["created_at"]
}
frame_start
Type: Optional[str]
Default: None
Required: No
Window frame start specification. Defines the start of the window frame.
{
"frame_start": "UNBOUNDED PRECEDING"
}
frame_end
Type: Optional[str]
Default: None
Required: No
Window frame end specification. Defines the end of the window frame.
{
"frame_end": "CURRENT ROW"
}
Patterns
Statistical Analysis
{
"aggregations": [
{
"name": "revenue_p95",
"type": "percentile",
"source_columns": ["revenue"],
"target_column": "revenue_p95",
"percentile_value": 0.95
},
{
"name": "revenue_stddev",
"type": "stddev",
"source_columns": ["revenue"],
"target_column": "revenue_stddev"
},
{
"name": "revenue_median",
"type": "median",
"source_columns": ["revenue"],
"target_column": "revenue_median"
}
]
}
Window Functions
{
"aggregations": [
{
"name": "running_total",
"type": "sum",
"source_columns": ["revenue"],
"target_column": "running_total",
"window": {
"partition_by": ["user_id"],
"order_by": ["created_at"],
"frame_start": "UNBOUNDED PRECEDING",
"frame_end": "CURRENT ROW"
}
},
{
"name": "revenue_rank",
"type": "rank",
"source_columns": ["revenue"],
"target_column": "revenue_rank",
"window": {
"partition_by": ["user_segment"],
"order_by": ["revenue DESC"]
}
}
]
}
Custom Business Logic
{
"aggregations": [
{
"name": "profit_margin",
"type": "custom",
"source_columns": ["revenue", "cost"],
"target_column": "profit_margin",
"custom_expression": "CASE WHEN revenue > 0 THEN (revenue - cost) / revenue * 100 ELSE 0 END"
},
{
"name": "growth_rate",
"type": "custom",
"source_columns": ["current_revenue", "previous_revenue"],
"target_column": "growth_rate",
"custom_expression": "CASE WHEN previous_revenue > 0 THEN (current_revenue - previous_revenue) / previous_revenue * 100 ELSE NULL END"
}
]
}
Time Series Analysis
{
"aggregations": [
{
"name": "monthly_growth",
"type": "custom",
"source_columns": ["revenue"],
"target_column": "monthly_growth",
"custom_expression": "LAG(revenue) OVER (PARTITION BY user_segment ORDER BY month)",
"window": {
"partition_by": ["user_segment"],
"order_by": ["month"]
}
},
{
"name": "moving_average",
"type": "avg",
"source_columns": ["revenue"],
"target_column": "moving_average_3m",
"window": {
"partition_by": ["user_segment"],
"order_by": ["month"],
"frame_start": "2 PRECEDING",
"frame_end": "CURRENT ROW"
}
}
]
}
Advanced Window Functions
Ranking Functions
{
"aggregations": [
{
"name": "revenue_rank",
"type": "rank",
"source_columns": ["revenue"],
"target_column": "revenue_rank",
"window": {
"partition_by": ["user_segment"],
"order_by": ["revenue DESC"]
}
},
{
"name": "revenue_dense_rank",
"type": "dense_rank",
"source_columns": ["revenue"],
"target_column": "revenue_dense_rank",
"window": {
"partition_by": ["user_segment"],
"order_by": ["revenue DESC"]
}
}
]
}
Lead/Lag Functions
{
"aggregations": [
{
"name": "previous_revenue",
"type": "lag",
"source_columns": ["revenue"],
"target_column": "previous_revenue",
"window": {
"partition_by": ["user_segment"],
"order_by": ["month"]
}
},
{
"name": "next_revenue",
"type": "lead",
"source_columns": ["revenue"],
"target_column": "next_revenue",
"window": {
"partition_by": ["user_segment"],
"order_by": ["month"]
}
}
]
}
Performance
Indexing for Window Functions
{
"window": {
"partition_by": ["user_segment"],
"order_by": ["created_at"]
}
}
# Ensure indexes on (user_segment, created_at)
Memory Usage
- Window functions can be memory-intensive
- Consider partitioning strategies for large datasets
- Use appropriate frame specifications
Best Practices
- Choose Appropriate Types: Select the right aggregation type for your use case
- Optimize Window Functions: Use efficient partitioning and ordering
- Handle Null Values: Consider how null values affect your calculations
- Test Performance: Monitor query performance with complex aggregations
- Use Descriptive Names: Choose clear names for your aggregations
Video Tutorial
Video: Advanced aggregation techniques in the Cortex dashboard
Aggregation Builder Interface
Screenshot of the visual aggregation builder in the Cortex dashboard
Related Topics
- Measures Configuration - Basic quantitative metrics
- Dimensions Configuration - Writing custom expressions
- Measures Configuration - Complete metric setup guide