Data Models
Overview
Data Models serve as containers for metrics and define the relationships between different analytical components. Each data model is tied to a specific data source and supports versioning, validation, and metadata management. Metrics are stored separately and linked to data models through the data_model_id field.
(Diagram: overview of how Data Models fit into the Cortex ecosystem)
Use Cases
Data Models are used for:
- Business Domain Organization: Group related metrics by business area
- Data Source Management: Manage multiple data sources and their relationships
- Version Control: Track changes and maintain audit trails
- Basic Validation: Ensure data model structure integrity
- Configuration Management: Share common settings across metrics
- Organization: Provide logical grouping for metrics
Core Components
DataModel Class
The DataModel class is the central component that defines the structure and behavior of data models.
Location: cortex/core/data/modelling/model.py
Properties
```python
class DataModel(TSModel):
    id: UUID = Field(default_factory=uuid4)        # Auto-generated unique identifier
    name: str                                      # Human-readable name
    alias: Optional[str] = None                    # Alternative name for APIs
    description: Optional[str] = None              # Detailed description
    version: int = 1                               # Version number for change tracking
    is_active: bool = True                         # Whether model is currently active
    parent_version_id: Optional[UUID] = None       # For version branching
    config: Dict[str, Any] = Field(default_factory=dict)  # Custom configuration
    is_valid: bool = False                         # Validation status
    validation_errors: Optional[List[str]] = None  # Validation error messages
    created_at: datetime                           # Creation timestamp
    updated_at: datetime                           # Last update timestamp
```
Note: Metrics are stored separately in the metrics table and linked to data models via the data_model_id field. Data models no longer contain embedded semantic model JSON.
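The defaults above can be illustrated with a minimal stand-in. This is a plain dataclass sketch, not the actual `TSModel`-based implementation; it only mirrors the field defaults listed in the properties table:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional
from uuid import UUID, uuid4


@dataclass
class DataModelSketch:
    """Simplified stand-in for DataModel, mirroring its field defaults."""
    name: str
    id: UUID = field(default_factory=uuid4)
    alias: Optional[str] = None
    description: Optional[str] = None
    version: int = 1
    is_active: bool = True
    parent_version_id: Optional[UUID] = None
    config: Dict[str, Any] = field(default_factory=dict)
    is_valid: bool = False
    validation_errors: Optional[List[str]] = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    updated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# A freshly created model starts at version 1 and is unvalidated
model = DataModelSketch(name="customer_analytics", alias="customers")
```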
Model Versioning
Data Models support comprehensive versioning for tracking changes and maintaining audit trails.
Location: cortex/core/data/modelling/model_version.py
```python
class ModelVersion(TSModel):
    id: UUID = Field(default_factory=uuid4)
    data_model_id: UUID                            # Parent data model reference
    version_number: int                            # Incremental version number
    semantic_model: Dict[str, Any] = Field(default_factory=dict)  # Complete model snapshot
    is_valid: bool = False                         # Validation status for this version
    validation_errors: Optional[List[str]] = None
    compiled_queries: Optional[Dict[str, str]] = None  # Cached queries (metric_alias -> query)
    description: Optional[str] = None              # Change description
    created_by: Optional[UUID] = None              # User who created the version
    tags: Optional[List[str]] = None               # Categorization tags
    config: Dict[str, Any] = Field(default_factory=dict)  # Legacy config for compatibility
    created_at: datetime                           # Version creation timestamp
```
Note: Model versions store complete snapshots of the data model configuration, but metrics are versioned separately through the metrics API.
Services
Validation Service
The ValidationService provides basic validation for data model structure and configuration.
Location: cortex/core/data/modelling/validation_service.py
Key Features:
- Basic Structure Validation: Ensures required fields are present
- Data Source Validation: Verifies data source references
- Version Validation: Checks version number validity
Note: Most validation functionality has been moved to the metric level since metrics are now stored separately. The ValidationService primarily validates basic data model structure.
```python
class ValidationService:
    @staticmethod
    def validate_data_model(data_model: DataModel) -> ValidationResult:
        """Perform basic validation of a DataModel structure."""


class ValidationResult:
    is_valid: bool
    errors: List[str]
    warnings: List[str]
    validated_at: datetime
```
Metric Service
The MetricService provides utility functions for metric operations, though most functionality has been moved to the metrics API.
Location: cortex/core/data/modelling/metric_service.py
Key Features:
- Extension Resolution: Handle metric inheritance and extensions
- Dependency Management: Track metric dependencies
- Legacy Support: Maintain compatibility with older implementations
Note: Most metric operations are now handled through the dedicated metrics API endpoints. The MetricService primarily provides utility functions for metric extension and dependency resolution.
```python
class MetricService:
    @staticmethod
    def resolve_metric_extensions(metric: SemanticMetric) -> SemanticMetric:
        """Resolve metric extensions by applying base metric configurations."""

    @staticmethod
    def get_metric_dependencies(metric: SemanticMetric) -> List[UUID]:
        """Get the list of metric UUIDs that the given metric depends on."""
```
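A plausible sketch of what extension resolution does. The real `SemanticMetric` type is not shown here, so plain dicts stand in, and the merge rule (child values override the base, `None` means "inherit") is an assumption for illustration:

```python
from typing import Any, Dict


def resolve_extension(base: Dict[str, Any], child: Dict[str, Any]) -> Dict[str, Any]:
    """Apply a base metric's configuration, letting the extending metric override.

    Fields set to None on the child are treated as "inherit from base".
    """
    resolved = dict(base)
    resolved.update({k: v for k, v in child.items() if v is not None})
    return resolved


base_metric = {"table": "orders", "aggregation": "sum", "filters": []}
child_metric = {"aggregation": "avg", "filters": None}  # None means "inherit"
merged = resolve_extension(base_metric, child_metric)
```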
API Integration
Data Models are fully integrated with Cortex's REST API through dedicated endpoints.
Location: cortex/api/routers/data/models.py
Available Endpoints
- `POST /data/models` - Create a new data model
- `GET /data/models/{model_id}` - Retrieve a specific data model
- `GET /data/models` - List data models with pagination and filtering
- `PUT /data/models/{model_id}` - Update an existing data model
- `DELETE /data/models/{model_id}` - Delete a data model
- `POST /data/models/{model_id}/validate` - Validate a data model
- `POST /data/models/{model_id}/execute` - Execute a metric from the data model
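A minimal sketch of preparing a create call. The payload fields follow the `DataModelCreateRequest` schema from this page; the HTTP client and base URL are placeholders, not part of Cortex:

```python
import json
from typing import Any, Dict, Optional


def build_create_request(
    name: str,
    alias: Optional[str] = None,
    description: Optional[str] = None,
    config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble the JSON body for POST /data/models, omitting unset fields."""
    body: Dict[str, Any] = {"name": name}
    if alias is not None:
        body["alias"] = alias
    if description is not None:
        body["description"] = description
    if config is not None:
        body["config"] = config
    return body


payload = build_create_request(
    "customer_analytics",
    alias="customers",
    config={"cache_enabled": True},
)
# Send with any HTTP client, e.g.:
# requests.post(f"{BASE_URL}/data/models", json=payload)
print(json.dumps(payload))
```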
Request/Response Schemas
The API uses Pydantic models for type validation and automatic documentation:
```python
# Creation Request
DataModelCreateRequest:
    name: str
    alias: Optional[str]
    description: Optional[str]
    config: Optional[Dict[str, Any]]

# Response
DataModelResponse:
    id: UUID
    name: str
    alias: Optional[str]
    description: Optional[str]
    version: int
    is_active: bool
    parent_version_id: Optional[UUID]
    config: Dict[str, Any]
    is_valid: bool
    validation_errors: Optional[List[str]]
    metrics_count: int  # Computed field
    created_at: datetime
    updated_at: datetime

# Model Execution Request
ModelExecutionRequest:
    metric_alias: str
    parameters: Optional[Dict[str, Any]]

# Model Execution Response
ModelExecutionResponse:
    success: bool
    data: Optional[List[Dict[str, Any]]]
    error: Optional[str]
    metadata: Dict[str, Any]
```
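Putting the execution schemas together, a hedged sketch of handling an execute call's result. The response dict below is fabricated for illustration (its values are not real output), but its shape follows `ModelExecutionResponse`:

```python
from typing import Any, Dict, List


def rows_or_raise(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Unpack a ModelExecutionResponse-shaped dict, raising on failure."""
    if not response.get("success"):
        raise RuntimeError(response.get("error") or "metric execution failed")
    return response.get("data") or []


# Request body, per ModelExecutionRequest
request = {"metric_alias": "monthly_revenue", "parameters": {"month": "2024-01"}}

# Example response shape (illustrative values only)
response = {
    "success": True,
    "data": [{"month": "2024-01", "revenue": 42000}],
    "error": None,
    "metadata": {"duration_ms": 120},
}
rows = rows_or_raise(response)
```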
Configuration Examples
Basic Data Model
```json
{
  "name": "customer_analytics",
  "alias": "customers",
  "description": "Customer behavior and revenue analytics",
  "config": {
    "default_timezone": "UTC",
    "cache_enabled": true,
    "max_query_timeout": 300
  }
}
```
Advanced Configuration
```json
{
  "name": "ecommerce_metrics",
  "description": "Comprehensive e-commerce analytics model",
  "config": {
    "performance": {
      "query_timeout": 600,
      "max_parallel_queries": 10
    },
    "caching": {
      "default_ttl": 3600,
      "warmup_enabled": true
    },
    "validation": {
      "strict_mode": true,
      "dependency_check": true
    }
  }
}
```
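Nested settings like these can be read with a small dotted-path helper. This is a convenience sketch, not part of Cortex:

```python
from typing import Any, Dict


def get_config(config: Dict[str, Any], path: str, default: Any = None) -> Any:
    """Look up a nested config value by dotted path, with a fallback default."""
    current: Any = config
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


config = {
    "performance": {"query_timeout": 600, "max_parallel_queries": 10},
    "caching": {"default_ttl": 3600, "warmup_enabled": True},
}
timeout = get_config(config, "performance.query_timeout", default=300)
```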
Version Management
Creating Versions
```python
from datetime import datetime

import pytz

# Automatic versioning on model updates
model.version += 1
model.updated_at = datetime.now(pytz.UTC)

# Create a version snapshot
version = ModelVersion(
    data_model_id=model.id,
    version_number=model.version,
    semantic_model=model.to_dict(),
    description="Updated customer segmentation logic",
    created_by=user_id,
    tags=["enhancement", "customer-analytics"],
    config=model.config,  # Legacy compatibility
)
```
Version History
```python
# Retrieve version history through the API
# GET /data/models/{model_id}/versions
versions = api_client.get_model_versions(model_id)
for version in versions:
    print(f"Version {version.version_number}: {version.description}")
    print(f"Created: {version.created_at}")
    print(f"Valid: {version.is_valid}")
```
Note: Model versions store snapshots of the data model configuration. Individual metrics are versioned separately through the metrics API.
Validation Workflow
Automatic Validation
Data models are automatically validated during:
- Creation: When a new model is created
- Updates: When significant fields are modified
- Manual Validation: Through the `/validate` endpoint
Validation Process
```python
# Basic validation workflow
validation_result = ValidationService.validate_data_model(data_model)

# Update model status
data_model.is_valid = validation_result.is_valid
data_model.validation_errors = validation_result.errors if validation_result.errors else None

# Log warnings
if validation_result.warnings:
    logger.warning(f"Validation warnings for model {data_model.id}: {validation_result.warnings}")
```
Note: The ValidationService now primarily validates basic data model structure. Metric validation is handled separately through the metrics API.
Best Practices
Model Organization
- Domain-driven Design: Group metrics by business domain
- Consistent Naming: Use clear, descriptive names
- Documentation: Always include detailed descriptions
- Version Control: Use meaningful version descriptions
Performance Optimization
- Efficient Queries: Optimize metric queries for performance
- Caching Strategy: Configure appropriate cache settings
- Index Management: Ensure proper database indexing
- Batch Operations: Use bulk operations when possible
Validation and Testing
- Regular Validation: Run validation checks regularly
- Test Coverage: Test models with various data scenarios
- Error Handling: Implement proper error handling
- Monitoring: Monitor validation status and performance
Integration with Metrics
Data Models are closely integrated with the metrics system:
- Metric Organization: Metrics belong to data models via the `data_model_id` field
- Shared Configuration: Models provide common settings for metrics
- Logical Grouping: Models provide logical organization for related metrics
- API Integration: Metrics are accessed through the metrics API with data model filtering
Note: Metrics are stored separately in the metrics table and managed through dedicated API endpoints. Data models serve as organizational containers and configuration sources.
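Since each metric record carries a `data_model_id`, grouping metrics by model is a simple filter. The metric records below are illustrative stand-ins, not the real metrics-table schema:

```python
from typing import Any, Dict, List
from uuid import uuid4


def metrics_for_model(metrics: List[Dict[str, Any]], data_model_id) -> List[Dict[str, Any]]:
    """Select the metrics linked to a given data model."""
    return [m for m in metrics if m.get("data_model_id") == data_model_id]


model_id = uuid4()
metrics = [
    {"name": "revenue", "data_model_id": model_id},
    {"name": "churn_rate", "data_model_id": uuid4()},  # belongs to another model
]
linked = metrics_for_model(metrics, model_id)
```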
Video Tutorial
Video: Complete guide to setting up and managing data models in Cortex
Dashboard Interface
Screenshot of the data models management interface in the Cortex dashboard
Related Topics
- Metrics Configuration - Learn about configuring metrics within data models
- Semantic Components - Understand the semantic components that power data models
- Data Sources - Connect data models to various data sources
- API Reference - Detailed API documentation for data model operations