Core Components

Data Models

Data Models are the foundation of Cortex's analytical architecture, representing collections of related metrics that share configuration.

Overview

Data Models serve as containers for metrics and define the relationships between different analytical components. Each data model is tied to a specific data source and supports versioning, validation, and metadata management. Metrics are stored separately and linked to data models through the data_model_id field.

Overview of how Data Models fit into the Cortex ecosystem

Use Cases

Data Models are used for:

  • Business Domain Organization: Group related metrics by business area
  • Data Source Management: Manage multiple data sources and their relationships
  • Version Control: Track changes and maintain audit trails
  • Basic Validation: Ensure data model structure integrity
  • Configuration Management: Share common settings across metrics
  • Organization: Provide logical grouping for metrics

Core Components

DataModel Class

The DataModel class is the central component that defines the structure and behavior of data models.

Location: cortex/core/data/modelling/model.py

Properties

class DataModel(TSModel):
    id: UUID = Field(default_factory=uuid4)          # Auto-generated unique identifier
    name: str                                        # Human-readable name
    alias: Optional[str] = None                      # Alternative name for APIs
    description: Optional[str] = None                # Detailed description
    version: int = 1                                # Version number for change tracking
    is_active: bool = True                          # Whether model is currently active
    parent_version_id: Optional[UUID] = None        # For version branching
    config: Dict[str, Any] = Field(default_factory=dict)  # Custom configuration
    is_valid: bool = False                          # Validation status
    validation_errors: Optional[List[str]] = None   # Validation error messages
    created_at: datetime                            # Creation timestamp
    updated_at: datetime                            # Last update timestamp

Note: Metrics are stored separately in the metrics table and linked to data models via the data_model_id field. Data models no longer contain embedded semantic model JSON.
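
As a hedged illustration, the sketch below constructs a DataModel directly from the fields listed above. It assumes the cortex package is importable from the Location shown and that timestamps are supplied by the caller; in practice the application layer may set them for you.

from datetime import datetime

import pytz

from cortex.core.data.modelling.model import DataModel

# Timestamps are set explicitly here for illustration
now = datetime.now(pytz.UTC)

model = DataModel(
    name="customer_analytics",
    alias="customers",
    description="Customer behavior and revenue analytics",
    config={"default_timezone": "UTC", "cache_enabled": True},
    created_at=now,
    updated_at=now,
)

print(model.id)        # Auto-generated UUID
print(model.version)   # Defaults to 1
print(model.is_valid)  # False until validated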

Model Versioning

Data Models support comprehensive versioning for tracking changes and maintaining audit trails.

Location: cortex/core/data/modelling/model_version.py

class ModelVersion(TSModel):
    id: UUID = Field(default_factory=uuid4)
    data_model_id: UUID                          # Parent data model reference
    version_number: int                          # Incremental version number
    semantic_model: Dict[str, Any] = Field(default_factory=dict)  # Complete model snapshot
    is_valid: bool = False                       # Validation status for this version
    validation_errors: Optional[List[str]] = None
    compiled_queries: Optional[Dict[str, str]] = None  # Cached queries (metric_alias -> query)
    description: Optional[str] = None           # Change description
    created_by: Optional[UUID] = None           # User who created version
    tags: Optional[List[str]] = None            # Categorization tags
    config: Dict[str, Any] = Field(default_factory=dict)  # Legacy config for compatibility
    created_at: datetime                        # Version creation timestamp

Note: Model versions store complete snapshots of the data model configuration, but metrics are versioned separately through the metrics API.

Services

Validation Service

The ValidationService provides basic validation for data model structure and configuration.

Location: cortex/core/data/modelling/validation_service.py

Key Features:

  • Basic Structure Validation: Ensures required fields are present
  • Data Source Validation: Verifies data source references
  • Version Validation: Checks version number validity

Note: Most validation functionality has been moved to the metric level since metrics are now stored separately. The ValidationService primarily validates basic data model structure.

class ValidationService:
    @staticmethod
    def validate_data_model(data_model: DataModel) -> ValidationResult:
        """Perform basic validation of a DataModel structure."""

class ValidationResult:
    is_valid: bool
    errors: List[str]
    warnings: List[str]
    validated_at: datetime

Metric Service

The MetricService provides utility functions for metric operations, though most functionality has been moved to the metrics API.

Location: cortex/core/data/modelling/metric_service.py

Key Features:

  • Extension Resolution: Handle metric inheritance and extensions
  • Dependency Management: Track metric dependencies
  • Legacy Support: Maintain compatibility with older implementations

Note: Most metric operations are now handled through the dedicated metrics API endpoints. The MetricService primarily provides utility functions for metric extension and dependency resolution.

class MetricService:
    @staticmethod
    def resolve_metric_extensions(metric: SemanticMetric) -> SemanticMetric:
        """Resolve metric extensions by applying base metric configurations."""

    @staticmethod
    def get_metric_dependencies(metric: SemanticMetric) -> List[UUID]:
        """Get the list of metric UUIDs that the given metric depends on."""

API Integration

Data Models are fully integrated with Cortex's REST API through dedicated endpoints.

Location: cortex/api/routers/data/models.py

Available Endpoints

  • POST /data/models - Create a new data model
  • GET /data/models/{model_id} - Retrieve a specific data model
  • GET /data/models - List data models with pagination and filtering
  • PUT /data/models/{model_id} - Update an existing data model
  • DELETE /data/models/{model_id} - Delete a data model
  • POST /data/models/{model_id}/validate - Validate a data model
  • POST /data/models/{model_id}/execute - Execute a metric from the data model
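
As a hedged example, these endpoints can be exercised with any HTTP client. The sketch below uses requests and assumes an unauthenticated deployment at http://localhost:8000; adjust the base URL, headers, and query parameters for your installation.

import requests

BASE_URL = "http://localhost:8000"  # Assumed local deployment

# Create a new data model
create_resp = requests.post(
    f"{BASE_URL}/data/models",
    json={
        "name": "customer_analytics",
        "alias": "customers",
        "description": "Customer behavior and revenue analytics",
        "config": {"default_timezone": "UTC"},
    },
)
create_resp.raise_for_status()
model = create_resp.json()

# List data models (the pagination parameter shown is illustrative)
list_resp = requests.get(f"{BASE_URL}/data/models", params={"limit": 20})
list_resp.raise_for_status()

# Trigger validation for the newly created model
validate_resp = requests.post(f"{BASE_URL}/data/models/{model['id']}/validate")
print(validate_resp.json())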

Request/Response Schemas

The API uses Pydantic models for type validation and automatic documentation:

# Creation Request
DataModelCreateRequest:
    name: str
    alias: Optional[str]
    description: Optional[str]
    config: Optional[Dict[str, Any]]

# Response
DataModelResponse:
    id: UUID
    name: str
    alias: Optional[str]
    description: Optional[str]
    version: int
    is_active: bool
    parent_version_id: Optional[UUID]
    config: Dict[str, Any]
    is_valid: bool
    validation_errors: Optional[List[str]]
    metrics_count: int  # Computed field
    created_at: datetime
    updated_at: datetime

# Model Execution Request
ModelExecutionRequest:
    metric_alias: str
    parameters: Optional[Dict[str, Any]]

# Model Execution Response
ModelExecutionResponse:
    success: bool
    data: Optional[List[Dict[str, Any]]]
    error: Optional[str]
    metadata: Dict[str, Any]
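
A minimal execution round trip that mirrors ModelExecutionRequest and ModelExecutionResponse; the base URL, model ID, metric alias, and parameters are placeholders and must exist in your deployment.

import requests

BASE_URL = "http://localhost:8000"                 # Assumed local deployment
MODEL_ID = "00000000-0000-0000-0000-000000000000"  # Placeholder data model UUID

payload = {
    "metric_alias": "monthly_revenue",             # Illustrative alias
    "parameters": {"start_date": "2024-01-01", "end_date": "2024-01-31"},
}

resp = requests.post(f"{BASE_URL}/data/models/{MODEL_ID}/execute", json=payload)
result = resp.json()  # Shaped like ModelExecutionResponse

if result["success"]:
    for row in result["data"] or []:
        print(row)
else:
    print("Execution failed:", result["error"])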

Configuration Examples

Basic Data Model

{
    "name": "customer_analytics",
    "alias": "customers",
    "description": "Customer behavior and revenue analytics",
    "config": {
        "default_timezone": "UTC",
        "cache_enabled": true,
        "max_query_timeout": 300
    }
}

Advanced Configuration

{
    "name": "ecommerce_metrics",
    "description": "Comprehensive e-commerce analytics model",
    "config": {
        "performance": {
            "query_timeout": 600,
            "max_parallel_queries": 10
        },
        "caching": {
            "default_ttl": 3600,
            "warmup_enabled": true
        },
        "validation": {
            "strict_mode": true,
            "dependency_check": true
        }
    }
}
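
Because config is a free-form dictionary, code that consumes it typically reads nested keys defensively. The helper below is a hedged illustration; the key names mirror the example above but are not enforced by the DataModel schema.

from typing import Any, Dict

def get_config_value(config: Dict[str, Any], path: str, default: Any = None) -> Any:
    """Read a nested config value such as 'caching.default_ttl' with a fallback."""
    node: Any = config
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node

config = {
    "performance": {"query_timeout": 600, "max_parallel_queries": 10},
    "caching": {"default_ttl": 3600, "warmup_enabled": True},
}

print(get_config_value(config, "caching.default_ttl", 300))    # 3600
print(get_config_value(config, "performance.retry_limit", 3))  # 3 (fallback)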

Version Management

Creating Versions

from datetime import datetime

import pytz

from cortex.core.data.modelling.model_version import ModelVersion

# `model` is an existing DataModel and `user_id` is the acting user's UUID

# Automatic versioning on model updates
model.version += 1
model.updated_at = datetime.now(pytz.UTC)

# Create version snapshot
version = ModelVersion(
    data_model_id=model.id,
    version_number=model.version,
    semantic_model=model.to_dict(),
    description="Updated customer segmentation logic",
    created_by=user_id,
    tags=["enhancement", "customer-analytics"],
    config=model.config  # Legacy compatibility
)

Version History

# Retrieve version history through the API
# GET /data/models/{model_id}/versions
# (`api_client` is an illustrative client wrapper; substitute your own HTTP calls)
versions = api_client.get_model_versions(model_id)
for version in versions:
    print(f"Version {version.version_number}: {version.description}")
    print(f"Created: {version.created_at}")
    print(f"Valid: {version.is_valid}")

Note: Model versions store snapshots of the data model configuration. Individual metrics are versioned separately through the metrics API.

Validation Workflow

Automatic Validation

Data models are automatically validated during:

  • Creation: When a new model is created
  • Updates: When significant fields are modified
  • Manual Validation: Through the /validate endpoint

Validation Process

import logging

from cortex.core.data.modelling.validation_service import ValidationService

logger = logging.getLogger(__name__)

# `data_model` is an existing DataModel instance

# Basic validation workflow
validation_result = ValidationService.validate_data_model(data_model)

# Update model status
data_model.is_valid = validation_result.is_valid
data_model.validation_errors = validation_result.errors if validation_result.errors else None

# Log warnings
if validation_result.warnings:
    logger.warning(f"Validation warnings for model {data_model.id}: {validation_result.warnings}")

Note: The ValidationService now primarily validates basic data model structure. Metric validation is handled separately through the metrics API.

Best Practices

Model Organization

  1. Domain-driven Design: Group metrics by business domain
  2. Consistent Naming: Use clear, descriptive names
  3. Documentation: Always include detailed descriptions
  4. Version Control: Use meaningful version descriptions

Performance Optimization

  1. Efficient Queries: Optimize metric queries for performance
  2. Caching Strategy: Configure appropriate cache settings
  3. Index Management: Ensure proper database indexing
  4. Batch Operations: Use bulk operations when possible

Validation and Testing

  1. Regular Validation: Run validation checks regularly
  2. Test Coverage: Test models with various data scenarios
  3. Error Handling: Implement proper error handling
  4. Monitoring: Monitor validation status and performance

Integration with Metrics

Data Models are closely integrated with the metrics system:

  • Metric Organization: Metrics belong to data models via data_model_id field
  • Shared Configuration: Models provide common settings for metrics
  • Logical Grouping: Models provide logical organization for related metrics
  • API Integration: Metrics are accessed through the metrics API with data model filtering

Note: Metrics are stored separately in the metrics table and managed through dedicated API endpoints. Data models serve as organizational containers and configuration sources.
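
To make the linkage concrete, the sketch below groups metric records by their data_model_id field; the metric dictionaries are illustrative stand-ins for whatever the metrics API returns.

from collections import defaultdict
from typing import Any, Dict, List

# Illustrative metric records; real ones come from the metrics API
metrics: List[Dict[str, Any]] = [
    {"alias": "monthly_revenue", "data_model_id": "model-a"},
    {"alias": "churn_rate", "data_model_id": "model-a"},
    {"alias": "avg_order_value", "data_model_id": "model-b"},
]

# Group metric aliases under their owning data model
by_model: Dict[str, List[str]] = defaultdict(list)
for metric in metrics:
    by_model[metric["data_model_id"]].append(metric["alias"])

for model_id, aliases in by_model.items():
    print(f"{model_id}: {aliases}")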

Video Tutorial

Video: Complete guide to setting up and managing data models in Cortex

Dashboard Interface

Screenshot of the data models management interface in the Cortex dashboard