Semantic Layer
Overview
The semantic layer is the core innovation of Cortex, providing a business-friendly abstraction over raw database complexity. It allows business users and developers to work with meaningful concepts like "monthly revenue" or "customer lifetime value" rather than writing complex SQL queries.
Core Concepts
Semantic Models
A semantic model is a JSON-based definition that describes how business concepts map to your underlying data:
{
"name": "sales_model",
"description": "Sales transactions and customer data",
"version": "1.0.0",
"tables": [
{
"name": "sales_transactions",
"primary_key": "transaction_id",
"columns": [
{"name": "transaction_id", "type": "uuid"},
{"name": "customer_id", "type": "uuid"},
{"name": "product_id", "type": "uuid"},
{"name": "sale_date", "type": "timestamp"},
{"name": "amount", "type": "decimal"},
{"name": "quantity", "type": "integer"}
]
},
{
"name": "customers",
"primary_key": "customer_id",
"columns": [
{"name": "customer_id", "type": "uuid"},
{"name": "name", "type": "string"},
{"name": "signup_date", "type": "date"}
]
}
],
"relationships": [
{
"name": "customer_sales",
"type": "one_to_many",
"from_table": "customers",
"to_table": "sales_transactions",
"from_column": "customer_id",
"to_column": "customer_id"
}
]
}
Metrics
Metrics are the primary business calculations that users want to perform. Each metric defines:
- Measures: What to calculate (sum, count, average, etc.)
- Dimensions: How to group the results
- Filters: What data to include/exclude
- Ordering: How to sort the results
- Formatting: How to display the results
{
"name": "monthly_revenue",
"description": "Total revenue aggregated by month",
"data_model": "sales_model",
"table_name": "sales_transactions",
"time_dimension": "sale_date",
"measures": [
{
"name": "revenue",
"type": "sum",
"query": "amount",
"description": "Total sales amount",
"formatting": [
{
"name": "currency_format",
"type": "format",
"mode": "post_query",
"format_string": "${:,.2f}"
}
]
},
{
"name": "transaction_count",
"type": "count",
"query": "*",
"description": "Number of transactions"
}
],
"dimensions": [
{
"name": "month",
"query": "DATE_TRUNC('month', sale_date)",
"description": "Month of the sale",
"formatting": [
{
"name": "date_format",
"type": "cast",
"mode": "in_query",
"target_type": "date"
}
]
},
{
"name": "customer_type",
"query": "customers.customer_type",
"description": "Type of customer",
"joins": [
{
"table": "customers",
"condition": "sales_transactions.customer_id = customers.customer_id"
}
]
}
],
"filters": [
{
"name": "date_range",
"type": "date_range",
"column": "sale_date",
"default": "last_30_days"
},
{
"name": "min_amount",
"type": "numeric",
"column": "amount",
"operator": ">=",
"default": 0
}
]
}
Measures
Measures define the quantitative calculations that can be performed on your data:
Aggregation Types
| Type | Description | Example |
|---|---|---|
sum | Sum of values | Total revenue, total quantity |
count | Count of rows | Number of orders, number of customers |
count_distinct | Count unique values | Number of unique customers |
avg | Average of values | Average order value |
min | Minimum value | Smallest order amount |
max | Maximum value | Largest order amount |
median | Median value | Middle value in sorted list |
Measure Definition
{
"name": "average_order_value",
"type": "avg",
"query": "amount",
"description": "Average value of customer orders",
"filters": [
{
"column": "amount",
"operator": ">",
"value": 0
}
],
"formatting": [
{
"name": "currency_format",
"type": "format",
"mode": "post_query",
"format_string": "${:,.2f}"
},
{
"name": "round_to_cent",
"type": "round",
"mode": "post_query",
"precision": 2
}
]
}
Dimensions
Dimensions define how data can be grouped, filtered, and organized:
Dimension Types
| Type | Description | Example |
|---|---|---|
categorical | Text or discrete values | Product category, customer segment |
temporal | Date/time values | Order date, signup date |
numeric | Numeric ranges | Age groups, price ranges |
geographic | Location data | Country, region, city |
Dimension Definition
{
"name": "customer_age_group",
"type": "categorical",
"query": "CASE WHEN age < 25 THEN 'Under 25' WHEN age < 35 THEN '25-34' WHEN age < 45 THEN '35-44' ELSE '45+' END",
"description": "Customer age categorized into groups",
"table_alias": "customers",
"formatting": [
{
"name": "title_case",
"type": "transform",
"mode": "post_query",
"function": "title"
}
],
"sort_order": "asc"
}
Filters
Filters control what data is included in the analysis:
Filter Types
| Type | Description | Operators |
|---|---|---|
string | Text matching | equals, contains, starts_with, ends_with |
numeric | Numeric comparison | =, !=, <, <=, >, >=, between |
date | Date comparison | before, after, between, last_n_days |
boolean | True/false values | is_true, is_false |
list | Multiple values | in, not_in |
Filter Definition
{
"name": "order_status_filter",
"type": "list",
"column": "status",
"description": "Filter by order status",
"default": ["completed", "shipped"],
"options": [
{"value": "pending", "label": "Pending"},
{"value": "completed", "label": "Completed"},
{"value": "shipped", "label": "Shipped"},
{"value": "cancelled", "label": "Cancelled"}
],
"multiple": true,
"required": false
}
Sorting
Cortex provides sophisticated result ordering through semantic order sequences that work seamlessly with measures, dimensions, and raw columns.
Order Sequence Types
| Type | Description | Use Case |
|---|---|---|
MEASURE | Order by measure name | Sort by calculated values like revenue, count |
DIMENSION | Order by dimension name | Sort by categorical values like product category |
COLUMN | Order by raw table column | Sort by specific database columns |
POSITION | Order by SELECT position | Sort by position in query results |
Order Sequence Definition
{
"name": "revenue_analysis",
"description": "Revenue analysis with custom sorting",
"table_name": "sales_transactions",
"ordered": true,
"order": [
{
"reference_type": "MEASURE",
"reference": "total_revenue",
"order_type": "DESC",
"nulls_position": "LAST"
},
{
"reference_type": "DIMENSION",
"reference": "month",
"order_type": "ASC",
"nulls_position": "FIRST"
}
],
"measures": [
{
"name": "total_revenue",
"type": "sum",
"query": "amount"
}
],
"dimensions": [
{
"name": "month",
"query": "DATE_TRUNC('month', sale_date)"
}
]
}
Default Ordering Rules
When no explicit ordering is specified, Cortex applies intelligent default sorting:
- First time dimension with granularity (ascending)
- First measure (descending)
- First dimension (ascending)
Order Types
| Type | Description | SQL Equivalent |
|---|---|---|
ASC | Ascending order | ASC |
DESC | Descending order | DESC |
Null Handling
| Position | Description | SQL Equivalent |
|---|---|---|
FIRST | Nulls first | NULLS FIRST |
LAST | Nulls last | NULLS LAST |
Output Formatting
Cortex supports advanced data formatting with two modes:
IN_QUERY Formatting
Applied during SQL generation, before query execution:
{
"name": "date_truncation",
"type": "function",
"mode": "in_query",
"function": "DATE_TRUNC",
"parameters": ["month", "sale_date"],
"target_type": "date"
}
POST_QUERY Formatting
Applied to query results after execution:
{
"name": "currency_formatting",
"type": "format",
"mode": "post_query",
"format_string": "${:,.2f}",
"locale": "en_US"
}
Built-in Formatters
| Formatter | Description | Example |
|---|---|---|
currency | Currency formatting | $1,234.56 |
percentage | Percentage display | 45.67% |
date | Date formatting | 2024-01-15 |
datetime | Date/time formatting | 2024-01-15 14:30:00 |
number | Number formatting | 1,234 |
round | Decimal rounding | 123.46 |
uppercase | Upper case text | SALES |
lowercase | Lower case text | sales |
title | Title case text | Sales |
Joins and Relationships
Define how tables relate to each other:
Join Types
| Type | Description | Use Case |
|---|---|---|
inner | Only matching rows | Required relationships |
left | All left table rows | Optional relationships |
right | All right table rows | Less common |
full | All rows from both tables | Complete data analysis |
Join Definition
{
"name": "customer_orders",
"type": "left",
"left_table": "customers",
"right_table": "orders",
"conditions": [
{
"left_column": "customers.customer_id",
"operator": "=",
"right_column": "orders.customer_id"
}
]
}
Parameters and Context
Dynamic Parameters
Parameters allow metrics to be customized at runtime:
{
"name": "revenue_by_date_range",
"parameters": [
{
"name": "start_date",
"type": "date",
"description": "Start date for analysis",
"default": "2024-01-01",
"required": true
},
{
"name": "end_date",
"type": "date",
"description": "End date for analysis",
"default": "2024-12-31",
"required": true
},
{
"name": "min_revenue",
"type": "numeric",
"description": "Minimum revenue threshold",
"default": 100.00,
"required": false
}
]
}
Consumer Context
Metrics can be personalized based on consumer properties:
{
"name": "my_team_performance",
"consumer_context": {
"team_filter": {
"column": "team_id",
"value_from_context": "consumer.team_id"
},
"region_filter": {
"column": "region",
"value_from_context": "consumer.region"
}
}
}
Metric Dependencies
Complex metrics can depend on other metrics:
{
"name": "revenue_growth_rate",
"description": "Month-over-month revenue growth",
"dependencies": [
{
"metric": "monthly_revenue",
"alias": "current_month"
},
{
"metric": "monthly_revenue",
"alias": "previous_month",
"parameters": {
"month_offset": -1
}
}
],
"calculation": "(current_month.revenue - previous_month.revenue) / previous_month.revenue * 100"
}
Validation and Testing
Schema Validation
All semantic models are validated against a comprehensive schema:
from cortex.core.semantics.schema import SemanticSchemaValidator
from cortex.core.semantics.metrics.metric import SemanticMetric
# Validate a metric
validator = SemanticSchemaValidator()
result = validator.validate_metric(metric_definition)
if not result.is_valid:
print("Validation errors:", result.errors)
Best Practices
1. Naming Conventions
- Use clear, business-friendly names
- Be consistent with terminology
- Avoid technical jargon when possible
2. Performance Optimization
- Use appropriate aggregation levels
- Implement efficient filters
- Consider caching for frequently used metrics
3. Documentation
- Always include descriptions
- Document parameter usage
4. Version Control
- Version your semantic models
- Track changes over time
5. Security
- Validate all inputs
- Use parameterized queries
- Implement proper access controls
Advanced Features
Custom SQL Functions
For complex calculations that can't be expressed with standard aggregations:
{
"name": "customer_lifetime_value",
"type": "custom_sql",
"query": "SUM(amount * customer_multiplier) OVER (PARTITION BY customer_id)",
"description": "Calculate customer lifetime value with custom logic"
}
Time-Based Calculations
Built-in support for time intelligence:
{
"name": "year_over_year_growth",
"type": "time_calculation",
"base_metric": "monthly_revenue",
"calculation": "percent_change",
"time_period": "year"
}
Conditional Logic
Apply different calculations based on conditions:
{
"name": "tiered_discount",
"type": "conditional",
"conditions": [
{
"if": "amount > 1000",
"then": "amount * 0.9"
},
{
"if": "amount > 500",
"then": "amount * 0.95"
},
{
"else": "amount"
}
]
}
Integration with AI Agents
The semantic layer is designed to work seamlessly with AI agents:
Natural Language Interface
- AI agents can interpret user questions and map them to semantic metrics
- Structured definitions enable reliable query generation
- Business context helps with disambiguation
Automated Discovery
- AI can analyze data sources and suggest relevant metrics
- Pattern recognition for similar calculations
- Automated documentation generation
Intelligent Optimization
- AI can optimize query performance based on usage patterns
- Suggest new metrics based on user behavior
- Identify redundant or unused metrics
The semantic layer serves as the foundation for AI-powered analytics, providing the structure and context needed for intelligent data interactions.