Architecture

Semantic Layer

Understanding the semantic layer that provides business-friendly data abstractions.

Overview

The semantic layer is the core innovation of Cortex, providing a business-friendly abstraction over raw database complexity. It allows business users and developers to work with meaningful concepts like "monthly revenue" or "customer lifetime value" rather than writing complex SQL queries.

Core Concepts

Semantic Models

A semantic model is a JSON-based definition that describes how business concepts map to your underlying data:

{
  "name": "sales_model",
  "description": "Sales transactions and customer data",
  "version": "1.0.0",
  "tables": [
    {
      "name": "sales_transactions",
      "primary_key": "transaction_id",
      "columns": [
        {"name": "transaction_id", "type": "uuid"},
        {"name": "customer_id", "type": "uuid"},
        {"name": "product_id", "type": "uuid"},
        {"name": "sale_date", "type": "timestamp"},
        {"name": "amount", "type": "decimal"},
        {"name": "quantity", "type": "integer"}
      ]
    },
    {
      "name": "customers",
      "primary_key": "customer_id",
      "columns": [
        {"name": "customer_id", "type": "uuid"},
        {"name": "name", "type": "string"},
        {"name": "signup_date", "type": "date"}
      ]
    }
  ],
  "relationships": [
    {
      "name": "customer_sales",
      "type": "one_to_many",
      "from_table": "customers",
      "to_table": "sales_transactions",
      "from_column": "customer_id",
      "to_column": "customer_id"
    }
  ]
}

Metrics

Metrics are the primary business calculations that users want to perform. Each metric defines:

Measures: What to calculate (sum, count, average, etc.)
Dimensions: How to group the results
Filters: What data to include/exclude
Ordering: How to sort the results
Formatting: How to display the results

{
  "name": "monthly_revenue",
  "description": "Total revenue aggregated by month",
  "data_model": "sales_model",
  "table_name": "sales_transactions",
  "time_dimension": "sale_date",
  "measures": [
    {
      "name": "revenue",
      "type": "sum",
      "query": "amount",
      "description": "Total sales amount",
      "formatting": [
        {
          "name": "currency_format",
          "type": "format",
          "mode": "post_query",
          "format_string": "${:,.2f}"
        }
      ]
    },
    {
      "name": "transaction_count",
      "type": "count",
      "query": "*",
      "description": "Number of transactions"
    }
  ],
  "dimensions": [
    {
      "name": "month",
      "query": "DATE_TRUNC('month', sale_date)",
      "description": "Month of the sale",
      "formatting": [
        {
          "name": "date_format",
          "type": "cast",
          "mode": "in_query",
          "target_type": "date"
        }
      ]
    },
    {
      "name": "customer_type",
      "query": "customers.customer_type",
      "description": "Type of customer",
      "joins": [
        {
          "table": "customers",
          "condition": "sales_transactions.customer_id = customers.customer_id"
        }
      ]
    }
  ],
  "filters": [
    {
      "name": "date_range",
      "type": "date_range",
      "column": "sale_date",
      "default": "last_30_days"
    },
    {
      "name": "min_amount",
      "type": "numeric",
      "column": "amount",
      "operator": ">=",
      "default": 0
    }
  ]
}

Measures

Measures define the quantitative calculations that can be performed on your data:

Aggregation Types

Type	Description	Example
`sum`	Sum of values	Total revenue, total quantity
`count`	Count of rows	Number of orders, number of customers
`count_distinct`	Count unique values	Number of unique customers
`avg`	Average of values	Average order value
`min`	Minimum value	Smallest order amount
`max`	Maximum value	Largest order amount
`median`	Median value	Middle value in sorted list

Measure Definition

Complete Measure Example

{
  "name": "average_order_value",
  "type": "avg",
  "query": "amount",
  "description": "Average value of customer orders",
  "filters": [
    {
      "column": "amount",
      "operator": ">",
      "value": 0
    }
  ],
  "formatting": [
    {
      "name": "currency_format",
      "type": "format",
      "mode": "post_query",
      "format_string": "${:,.2f}"
    },
    {
      "name": "round_to_cent",
      "type": "round",
      "mode": "post_query",
      "precision": 2
    }
  ]
}

Dimensions

Dimensions define how data can be grouped, filtered, and organized:

Dimension Types

Type	Description	Example
`categorical`	Text or discrete values	Product category, customer segment
`temporal`	Date/time values	Order date, signup date
`numeric`	Numeric ranges	Age groups, price ranges
`geographic`	Location data	Country, region, city

Dimension Definition

Complete Dimension Example

{
  "name": "customer_age_group",
  "type": "categorical",
  "query": "CASE WHEN age < 25 THEN 'Under 25' WHEN age < 35 THEN '25-34' WHEN age < 45 THEN '35-44' ELSE '45+' END",
  "description": "Customer age categorized into groups",
  "table_alias": "customers",
  "formatting": [
    {
      "name": "title_case",
      "type": "transform",
      "mode": "post_query",
      "function": "title"
    }
  ],
  "sort_order": "asc"
}

Filters

Filters control what data is included in the analysis:

Filter Types

Type	Description	Operators
`string`	Text matching	equals, contains, starts_with, ends_with
`numeric`	Numeric comparison	=, !=, <, <=, >, >=, between
`date`	Date comparison	before, after, between, last_n_days
`boolean`	True/false values	is_true, is_false
`list`	Multiple values	in, not_in

Filter Definition

Advanced Filter Example

{
  "name": "order_status_filter",
  "type": "list",
  "column": "status",
  "description": "Filter by order status",
  "default": ["completed", "shipped"],
  "options": [
    {"value": "pending", "label": "Pending"},
    {"value": "completed", "label": "Completed"},
    {"value": "shipped", "label": "Shipped"},
    {"value": "cancelled", "label": "Cancelled"}
  ],
  "multiple": true,
  "required": false
}

Sorting

Cortex provides sophisticated result ordering through semantic order sequences that work seamlessly with measures, dimensions, and raw columns.

Order Sequence Types

Type	Description	Use Case
`MEASURE`	Order by measure name	Sort by calculated values like revenue, count
`DIMENSION`	Order by dimension name	Sort by categorical values like product category
`COLUMN`	Order by raw table column	Sort by specific database columns
`POSITION`	Order by SELECT position	Sort by position in query results

Order Sequence Definition

Order Sequence Example

{
  "name": "revenue_analysis",
  "description": "Revenue analysis with custom sorting",
  "table_name": "sales_transactions",
  "ordered": true,
  "order": [
    {
      "reference_type": "MEASURE",
      "reference": "total_revenue",
      "order_type": "DESC",
      "nulls_position": "LAST"
    },
    {
      "reference_type": "DIMENSION", 
      "reference": "month",
      "order_type": "ASC",
      "nulls_position": "FIRST"
    }
  ],
  "measures": [
    {
      "name": "total_revenue",
      "type": "sum",
      "query": "amount"
    }
  ],
  "dimensions": [
    {
      "name": "month",
      "query": "DATE_TRUNC('month', sale_date)"
    }
  ]
}

Default Ordering Rules

When no explicit ordering is specified, Cortex applies intelligent default sorting:

First time dimension with granularity (ascending)
First measure (descending)
First dimension (ascending)

Order Types

Type	Description	SQL Equivalent
`ASC`	Ascending order	`ASC`
`DESC`	Descending order	`DESC`

Null Handling

Position	Description	SQL Equivalent
`FIRST`	Nulls first	`NULLS FIRST`
`LAST`	Nulls last	`NULLS LAST`

Output Formatting

Cortex supports advanced data formatting with two modes:

IN_QUERY Formatting

Applied during SQL generation, before query execution:

IN_QUERY Formatting

{
  "name": "date_truncation",
  "type": "function",
  "mode": "in_query",
  "function": "DATE_TRUNC",
  "parameters": ["month", "sale_date"],
  "target_type": "date"
}

POST_QUERY Formatting

Applied to query results after execution:

POST_QUERY Formatting

{
  "name": "currency_formatting",
  "type": "format",
  "mode": "post_query",
  "format_string": "${:,.2f}",
  "locale": "en_US"
}

Built-in Formatters

Formatter	Description	Example
`currency`	Currency formatting	$1,234.56
`percentage`	Percentage display	45.67%
`date`	Date formatting	2024-01-15
`datetime`	Date/time formatting	2024-01-15 14:30:00
`number`	Number formatting	1,234
`round`	Decimal rounding	123.46
`uppercase`	Upper case text	SALES
`lowercase`	Lower case text	sales
`title`	Title case text	Sales

Joins and Relationships

Define how tables relate to each other:

Join Types

Type	Description	Use Case
`inner`	Only matching rows	Required relationships
`left`	All left table rows	Optional relationships
`right`	All right table rows	Less common
`full`	All rows from both tables	Complete data analysis

Join Definition

Join Example

{
  "name": "customer_orders",
  "type": "left",
  "left_table": "customers",
  "right_table": "orders",
  "conditions": [
    {
      "left_column": "customers.customer_id",
      "operator": "=",
      "right_column": "orders.customer_id"
    }
  ]
}

Parameters and Context

Dynamic Parameters

Parameters allow metrics to be customized at runtime:

Parameter Example

{
  "name": "revenue_by_date_range",
  "parameters": [
    {
      "name": "start_date",
      "type": "date",
      "description": "Start date for analysis",
      "default": "2024-01-01",
      "required": true
    },
    {
      "name": "end_date",
      "type": "date",
      "description": "End date for analysis",
      "default": "2024-12-31",
      "required": true
    },
    {
      "name": "min_revenue",
      "type": "numeric",
      "description": "Minimum revenue threshold",
      "default": 100.00,
      "required": false
    }
  ]
}

Consumer Context

Metrics can be personalized based on consumer properties:

Context-Aware Metric

{
  "name": "my_team_performance",
  "consumer_context": {
    "team_filter": {
      "column": "team_id",
      "value_from_context": "consumer.team_id"
    },
    "region_filter": {
      "column": "region",
      "value_from_context": "consumer.region"
    }
  }
}

Metric Dependencies

Complex metrics can depend on other metrics:

Dependent Metric

{
  "name": "revenue_growth_rate",
  "description": "Month-over-month revenue growth",
  "dependencies": [
    {
      "metric": "monthly_revenue",
      "alias": "current_month"
    },
    {
      "metric": "monthly_revenue",
      "alias": "previous_month",
      "parameters": {
        "month_offset": -1
      }
    }
  ],
  "calculation": "(current_month.revenue - previous_month.revenue) / previous_month.revenue * 100"
}

Validation and Testing

Schema Validation

All semantic models are validated against a comprehensive schema:

from cortex.core.semantics.schema import SemanticSchemaValidator
from cortex.core.semantics.metrics.metric import SemanticMetric

# Validate a metric
validator = SemanticSchemaValidator()
result = validator.validate_metric(metric_definition)

if not result.is_valid:
    print("Validation errors:", result.errors)

Best Practices

1. Naming Conventions

Use clear, business-friendly names
Be consistent with terminology
Avoid technical jargon when possible

2. Performance Optimization

Use appropriate aggregation levels
Implement efficient filters
Consider caching for frequently used metrics

3. Documentation

Always include descriptions
Document parameter usage

4. Version Control

Version your semantic models
Track changes over time

5. Security

Validate all inputs
Use parameterized queries
Implement proper access controls

Advanced Features

Custom SQL Functions

For complex calculations that can't be expressed with standard aggregations:

Custom SQL Function

{
  "name": "customer_lifetime_value",
  "type": "custom_sql",
  "query": "SUM(amount * customer_multiplier) OVER (PARTITION BY customer_id)",
  "description": "Calculate customer lifetime value with custom logic"
}

Time-Based Calculations

Built-in support for time intelligence:

Time Intelligence

{
  "name": "year_over_year_growth",
  "type": "time_calculation",
  "base_metric": "monthly_revenue",
  "calculation": "percent_change",
  "time_period": "year"
}

Conditional Logic

Apply different calculations based on conditions:

Conditional Calculation

{
  "name": "tiered_discount",
  "type": "conditional",
  "conditions": [
    {
      "if": "amount > 1000",
      "then": "amount * 0.9"
    },
    {
      "if": "amount > 500",
      "then": "amount * 0.95"
    },
    {
      "else": "amount"
    }
  ]
}

Integration with AI Agents

The semantic layer is designed to work seamlessly with AI agents:

Natural Language Interface

AI agents can interpret user questions and map them to semantic metrics
Structured definitions enable reliable query generation
Business context helps with disambiguation

Automated Discovery

AI can analyze data sources and suggest relevant metrics
Pattern recognition for similar calculations
Automated documentation generation

Intelligent Optimization

AI can optimize query performance based on usage patterns
Suggest new metrics based on user behavior
Identify redundant or unused metrics

The semantic layer serves as the foundation for AI-powered analytics, providing the structure and context needed for intelligent data interactions.

Edit this pageorReport an issue

Overview

High-level system architecture and design principles of Cortex.

Query Engine

Understanding how Cortex translates semantic definitions into optimized database queries.