Architecture

Semantic Layer

Understanding the semantic layer that provides business-friendly data abstractions.

Overview

The semantic layer is the core innovation of Cortex, providing a business-friendly abstraction over raw database complexity. It allows business users and developers to work with meaningful concepts like "monthly revenue" or "customer lifetime value" rather than writing complex SQL queries.

Core Concepts

Semantic Models

A semantic model is a JSON-based definition that describes how business concepts map to your underlying data:

{
  "name": "sales_model",
  "description": "Sales transactions and customer data",
  "version": "1.0.0",
  "tables": [
    {
      "name": "sales_transactions",
      "primary_key": "transaction_id",
      "columns": [
        {"name": "transaction_id", "type": "uuid"},
        {"name": "customer_id", "type": "uuid"},
        {"name": "product_id", "type": "uuid"},
        {"name": "sale_date", "type": "timestamp"},
        {"name": "amount", "type": "decimal"},
        {"name": "quantity", "type": "integer"}
      ]
    },
    {
      "name": "customers",
      "primary_key": "customer_id",
      "columns": [
        {"name": "customer_id", "type": "uuid"},
        {"name": "name", "type": "string"},
        {"name": "signup_date", "type": "date"}
      ]
    }
  ],
  "relationships": [
    {
      "name": "customer_sales",
      "type": "one_to_many",
      "from_table": "customers",
      "to_table": "sales_transactions",
      "from_column": "customer_id",
      "to_column": "customer_id"
    }
  ]
}

Metrics

Metrics are the primary business calculations that users want to perform. Each metric defines:

  • Measures: What to calculate (sum, count, average, etc.)
  • Dimensions: How to group the results
  • Filters: What data to include/exclude
  • Ordering: How to sort the results
  • Formatting: How to display the results
{
  "name": "monthly_revenue",
  "description": "Total revenue aggregated by month",
  "data_model": "sales_model",
  "table_name": "sales_transactions",
  "time_dimension": "sale_date",
  "measures": [
    {
      "name": "revenue",
      "type": "sum",
      "query": "amount",
      "description": "Total sales amount",
      "formatting": [
        {
          "name": "currency_format",
          "type": "format",
          "mode": "post_query",
          "format_string": "${:,.2f}"
        }
      ]
    },
    {
      "name": "transaction_count",
      "type": "count",
      "query": "*",
      "description": "Number of transactions"
    }
  ],
  "dimensions": [
    {
      "name": "month",
      "query": "DATE_TRUNC('month', sale_date)",
      "description": "Month of the sale",
      "formatting": [
        {
          "name": "date_format",
          "type": "cast",
          "mode": "in_query",
          "target_type": "date"
        }
      ]
    },
    {
      "name": "customer_type",
      "query": "customers.customer_type",
      "description": "Type of customer",
      "joins": [
        {
          "table": "customers",
          "condition": "sales_transactions.customer_id = customers.customer_id"
        }
      ]
    }
  ],
  "filters": [
    {
      "name": "date_range",
      "type": "date_range",
      "column": "sale_date",
      "default": "last_30_days"
    },
    {
      "name": "min_amount",
      "type": "numeric",
      "column": "amount",
      "operator": ">=",
      "default": 0
    }
  ]
}

Measures

Measures define the quantitative calculations that can be performed on your data:

Aggregation Types

TypeDescriptionExample
sumSum of valuesTotal revenue, total quantity
countCount of rowsNumber of orders, number of customers
count_distinctCount unique valuesNumber of unique customers
avgAverage of valuesAverage order value
minMinimum valueSmallest order amount
maxMaximum valueLargest order amount
medianMedian valueMiddle value in sorted list

Measure Definition

Complete Measure Example
{
  "name": "average_order_value",
  "type": "avg",
  "query": "amount",
  "description": "Average value of customer orders",
  "filters": [
    {
      "column": "amount",
      "operator": ">",
      "value": 0
    }
  ],
  "formatting": [
    {
      "name": "currency_format",
      "type": "format",
      "mode": "post_query",
      "format_string": "${:,.2f}"
    },
    {
      "name": "round_to_cent",
      "type": "round",
      "mode": "post_query",
      "precision": 2
    }
  ]
}

Dimensions

Dimensions define how data can be grouped, filtered, and organized:

Dimension Types

TypeDescriptionExample
categoricalText or discrete valuesProduct category, customer segment
temporalDate/time valuesOrder date, signup date
numericNumeric rangesAge groups, price ranges
geographicLocation dataCountry, region, city

Dimension Definition

Complete Dimension Example
{
  "name": "customer_age_group",
  "type": "categorical",
  "query": "CASE WHEN age < 25 THEN 'Under 25' WHEN age < 35 THEN '25-34' WHEN age < 45 THEN '35-44' ELSE '45+' END",
  "description": "Customer age categorized into groups",
  "table_alias": "customers",
  "formatting": [
    {
      "name": "title_case",
      "type": "transform",
      "mode": "post_query",
      "function": "title"
    }
  ],
  "sort_order": "asc"
}

Filters

Filters control what data is included in the analysis:

Filter Types

TypeDescriptionOperators
stringText matchingequals, contains, starts_with, ends_with
numericNumeric comparison=, !=, <, <=, >, >=, between
dateDate comparisonbefore, after, between, last_n_days
booleanTrue/false valuesis_true, is_false
listMultiple valuesin, not_in

Filter Definition

Advanced Filter Example
{
  "name": "order_status_filter",
  "type": "list",
  "column": "status",
  "description": "Filter by order status",
  "default": ["completed", "shipped"],
  "options": [
    {"value": "pending", "label": "Pending"},
    {"value": "completed", "label": "Completed"},
    {"value": "shipped", "label": "Shipped"},
    {"value": "cancelled", "label": "Cancelled"}
  ],
  "multiple": true,
  "required": false
}

Sorting

Cortex provides sophisticated result ordering through semantic order sequences that work seamlessly with measures, dimensions, and raw columns.

Order Sequence Types

TypeDescriptionUse Case
MEASUREOrder by measure nameSort by calculated values like revenue, count
DIMENSIONOrder by dimension nameSort by categorical values like product category
COLUMNOrder by raw table columnSort by specific database columns
POSITIONOrder by SELECT positionSort by position in query results

Order Sequence Definition

Order Sequence Example
{
  "name": "revenue_analysis",
  "description": "Revenue analysis with custom sorting",
  "table_name": "sales_transactions",
  "ordered": true,
  "order": [
    {
      "reference_type": "MEASURE",
      "reference": "total_revenue",
      "order_type": "DESC",
      "nulls_position": "LAST"
    },
    {
      "reference_type": "DIMENSION", 
      "reference": "month",
      "order_type": "ASC",
      "nulls_position": "FIRST"
    }
  ],
  "measures": [
    {
      "name": "total_revenue",
      "type": "sum",
      "query": "amount"
    }
  ],
  "dimensions": [
    {
      "name": "month",
      "query": "DATE_TRUNC('month', sale_date)"
    }
  ]
}

Default Ordering Rules

When no explicit ordering is specified, Cortex applies intelligent default sorting:

  1. First time dimension with granularity (ascending)
  2. First measure (descending)
  3. First dimension (ascending)

Order Types

TypeDescriptionSQL Equivalent
ASCAscending orderASC
DESCDescending orderDESC

Null Handling

PositionDescriptionSQL Equivalent
FIRSTNulls firstNULLS FIRST
LASTNulls lastNULLS LAST

Output Formatting

Cortex supports advanced data formatting with two modes:

IN_QUERY Formatting

Applied during SQL generation, before query execution:

IN_QUERY Formatting
{
  "name": "date_truncation",
  "type": "function",
  "mode": "in_query",
  "function": "DATE_TRUNC",
  "parameters": ["month", "sale_date"],
  "target_type": "date"
}

POST_QUERY Formatting

Applied to query results after execution:

POST_QUERY Formatting
{
  "name": "currency_formatting",
  "type": "format",
  "mode": "post_query",
  "format_string": "${:,.2f}",
  "locale": "en_US"
}

Built-in Formatters

FormatterDescriptionExample
currencyCurrency formatting$1,234.56
percentagePercentage display45.67%
dateDate formatting2024-01-15
datetimeDate/time formatting2024-01-15 14:30:00
numberNumber formatting1,234
roundDecimal rounding123.46
uppercaseUpper case textSALES
lowercaseLower case textsales
titleTitle case textSales

Joins and Relationships

Define how tables relate to each other:

Join Types

TypeDescriptionUse Case
innerOnly matching rowsRequired relationships
leftAll left table rowsOptional relationships
rightAll right table rowsLess common
fullAll rows from both tablesComplete data analysis

Join Definition

Join Example
{
  "name": "customer_orders",
  "type": "left",
  "left_table": "customers",
  "right_table": "orders",
  "conditions": [
    {
      "left_column": "customers.customer_id",
      "operator": "=",
      "right_column": "orders.customer_id"
    }
  ]
}

Parameters and Context

Dynamic Parameters

Parameters allow metrics to be customized at runtime:

Parameter Example
{
  "name": "revenue_by_date_range",
  "parameters": [
    {
      "name": "start_date",
      "type": "date",
      "description": "Start date for analysis",
      "default": "2024-01-01",
      "required": true
    },
    {
      "name": "end_date",
      "type": "date",
      "description": "End date for analysis",
      "default": "2024-12-31",
      "required": true
    },
    {
      "name": "min_revenue",
      "type": "numeric",
      "description": "Minimum revenue threshold",
      "default": 100.00,
      "required": false
    }
  ]
}

Consumer Context

Metrics can be personalized based on consumer properties:

Context-Aware Metric
{
  "name": "my_team_performance",
  "consumer_context": {
    "team_filter": {
      "column": "team_id",
      "value_from_context": "consumer.team_id"
    },
    "region_filter": {
      "column": "region",
      "value_from_context": "consumer.region"
    }
  }
}

Metric Dependencies

Complex metrics can depend on other metrics:

Dependent Metric
{
  "name": "revenue_growth_rate",
  "description": "Month-over-month revenue growth",
  "dependencies": [
    {
      "metric": "monthly_revenue",
      "alias": "current_month"
    },
    {
      "metric": "monthly_revenue",
      "alias": "previous_month",
      "parameters": {
        "month_offset": -1
      }
    }
  ],
  "calculation": "(current_month.revenue - previous_month.revenue) / previous_month.revenue * 100"
}

Validation and Testing

Schema Validation

All semantic models are validated against a comprehensive schema:

from cortex.core.semantics.schema import SemanticSchemaValidator
from cortex.core.semantics.metrics.metric import SemanticMetric

# Validate a metric
validator = SemanticSchemaValidator()
result = validator.validate_metric(metric_definition)

if not result.is_valid:
    print("Validation errors:", result.errors)

Best Practices

1. Naming Conventions

  • Use clear, business-friendly names
  • Be consistent with terminology
  • Avoid technical jargon when possible

2. Performance Optimization

  • Use appropriate aggregation levels
  • Implement efficient filters
  • Consider caching for frequently used metrics

3. Documentation

  • Always include descriptions
  • Document parameter usage

4. Version Control

  • Version your semantic models
  • Track changes over time

5. Security

  • Validate all inputs
  • Use parameterized queries
  • Implement proper access controls

Advanced Features

Custom SQL Functions

For complex calculations that can't be expressed with standard aggregations:

Custom SQL Function
{
  "name": "customer_lifetime_value",
  "type": "custom_sql",
  "query": "SUM(amount * customer_multiplier) OVER (PARTITION BY customer_id)",
  "description": "Calculate customer lifetime value with custom logic"
}

Time-Based Calculations

Built-in support for time intelligence:

Time Intelligence
{
  "name": "year_over_year_growth",
  "type": "time_calculation",
  "base_metric": "monthly_revenue",
  "calculation": "percent_change",
  "time_period": "year"
}

Conditional Logic

Apply different calculations based on conditions:

Conditional Calculation
{
  "name": "tiered_discount",
  "type": "conditional",
  "conditions": [
    {
      "if": "amount > 1000",
      "then": "amount * 0.9"
    },
    {
      "if": "amount > 500",
      "then": "amount * 0.95"
    },
    {
      "else": "amount"
    }
  ]
}

Integration with AI Agents

The semantic layer is designed to work seamlessly with AI agents:

Natural Language Interface

  • AI agents can interpret user questions and map them to semantic metrics
  • Structured definitions enable reliable query generation
  • Business context helps with disambiguation

Automated Discovery

  • AI can analyze data sources and suggest relevant metrics
  • Pattern recognition for similar calculations
  • Automated documentation generation

Intelligent Optimization

  • AI can optimize query performance based on usage patterns
  • Suggest new metrics based on user behavior
  • Identify redundant or unused metrics

The semantic layer serves as the foundation for AI-powered analytics, providing the structure and context needed for intelligent data interactions.