Refresh
Overview
Refresh policies determine when cached or pre-aggregated data should be refreshed. They balance data freshness with performance by automatically updating data based on time intervals, data changes, or custom conditions.
Diagram showing different refresh strategies and their triggers
Use Cases
Refresh policies are used for:
- Data Freshness: Ensure analytics reflect the latest data
- Performance Optimization: Balance data freshness with query performance
- Resource Management: Control when expensive refresh operations occur
- Automated Maintenance: Reduce manual intervention for data updates
- Cost Optimization: Minimize unnecessary refresh operations
Syntax
Type: RefreshPolicy
Pre-aggregation refresh policy for materialized views and rollups.
{
"refresh": {
"type": "every",
"every": "1 hour"
}
}
Specification
type
Type: RefreshType
Required: Yes
Refresh strategy. Determines how the system decides when to refresh data.
Available Values:
every: Time-based refresh intervalssql: Custom SQL condition for refreshmax: Refresh based on maximum value changes
{
"type": "every"
}
every
Type: Optional[str]
Default: None
Required: No
Time interval string for time-based refresh. Specifies how often to refresh the data.
{
"every": "1 hour"
}
sql
Type: Optional[str]
Default: None
Required: No
Custom SQL query to determine if refresh is needed. The query should return a boolean indicating whether refresh is required.
{
"sql": "SELECT COUNT(*) FROM orders WHERE created_at > last_refresh"
}
max
Type: Optional[str]
Default: None
Required: No
Column name to monitor for maximum value changes. Refreshes when the maximum value of this column changes.
{
"max": "updated_at"
}
Strategies
Time-based Refresh
Hourly Refresh
{
"refresh": {
"type": "every",
"every": "1 hour"
}
}
Daily Refresh
{
"refresh": {
"type": "every",
"every": "1 day"
}
}
Custom Intervals
{
"refresh": {
"type": "every",
"every": "30 minutes"
}
}
Data-based Refresh
SQL Condition Refresh
{
"refresh": {
"type": "sql",
"sql": "SELECT COUNT(*) > 0 FROM orders WHERE created_at > (SELECT MAX(last_refresh) FROM refresh_log WHERE metric_id = 'user_revenue_analysis')"
}
}
Maximum Value Refresh
{
"refresh": {
"type": "max",
"max": "updated_at"
}
}
Hybrid Refresh Strategies
Time + Data Condition
{
"refresh": {
"type": "sql",
"sql": "SELECT COUNT(*) > 0 FROM orders WHERE created_at > COALESCE((SELECT MAX(last_refresh) FROM refresh_log WHERE metric_id = 'user_revenue_analysis'), '1900-01-01') AND created_at > NOW() - INTERVAL '1 hour'"
}
}
Advanced Patterns
Conditional Refresh Based on Data Volume
{
"refresh": {
"type": "sql",
"sql": "SELECT COUNT(*) > 1000 FROM orders WHERE created_at > (SELECT MAX(last_refresh) FROM refresh_log WHERE metric_id = 'user_revenue_analysis')"
}
}
Refresh Based on Business Hours
{
"refresh": {
"type": "sql",
"sql": "SELECT COUNT(*) > 0 FROM orders WHERE created_at > (SELECT MAX(last_refresh) FROM refresh_log WHERE metric_id = 'user_revenue_analysis') AND EXTRACT(HOUR FROM NOW()) BETWEEN 9 AND 17"
}
}
Refresh Based on Data Quality
{
"refresh": {
"type": "sql",
"sql": "SELECT COUNT(*) > 0 FROM data_quality_checks WHERE status = 'failed' AND check_time > (SELECT MAX(last_refresh) FROM refresh_log WHERE metric_id = 'user_revenue_analysis')"
}
}
Scheduling
Cron-based Scheduling
{
"refresh": {
"type": "every",
"every": "0 0 * * *", # Daily at midnight
"timezone": "UTC"
}
}
Business Hours Scheduling
{
"refresh": {
"type": "every",
"every": "0 */2 * * 1-5", # Every 2 hours, Monday to Friday
"timezone": "America/New_York"
}
}
Peak Hours Avoidance
{
"refresh": {
"type": "every",
"every": "0 2 * * *", # Daily at 2 AM
"timezone": "UTC"
}
}
Monitoring
Refresh Status Tracking
{
"refresh": {
"type": "every",
"every": "1 hour",
"monitoring": {
"track_status": true,
"alert_on_failure": true
}
}
}
Refresh Performance Metrics
{
"refresh": {
"type": "every",
"every": "1 hour",
"monitoring": {
"track_duration": true,
"slow_refresh_threshold": "5 minutes"
}
}
}
Refresh Success Rate
{
"refresh": {
"type": "every",
"every": "1 hour",
"monitoring": {
"track_success_rate": true,
"alert_threshold": 0.95
}
}
}
Optimization
Incremental Refresh
{
"refresh": {
"type": "sql",
"sql": "SELECT COUNT(*) > 0 FROM orders WHERE created_at > (SELECT MAX(last_refresh) FROM refresh_log WHERE metric_id = 'user_revenue_analysis')",
"strategy": "incremental"
}
}
Parallel Refresh
{
"refresh": {
"type": "every",
"every": "1 hour",
"parallel": true,
"max_parallel_refreshes": 3
}
}
Refresh Queuing
{
"refresh": {
"type": "every",
"every": "1 hour",
"queue_refreshes": true,
"max_queue_size": 10
}
}
Patterns
Real-time Metrics
{
"refresh": {
"type": "every",
"every": "1 minute"
}
}
Daily Reports
{
"refresh": {
"type": "every",
"every": "0 1 * * *" # Daily at 1 AM
}
}
Weekly Analytics
{
"refresh": {
"type": "every",
"every": "0 2 * * 1" # Weekly on Monday at 2 AM
}
}
Monthly Summaries
{
"refresh": {
"type": "every",
"every": "0 3 1 * *" # Monthly on the 1st at 3 AM
}
}
Best Practices
Choose Appropriate Intervals
- Real-time data: 1-5 minutes
- Frequently updated data: 15-60 minutes
- Standard business data: 1-4 hours
- Historical data: Daily or weekly
Consider Data Volume
{
"refresh": {
"type": "sql",
"sql": "SELECT COUNT(*) > 10000 FROM orders WHERE created_at > (SELECT MAX(last_refresh) FROM refresh_log WHERE metric_id = 'user_revenue_analysis')"
}
}
Monitor Refresh Performance
- Track refresh duration
- Monitor success rates
- Alert on failures
- Optimize refresh queries
Use Conditional Refresh
{
"refresh": {
"type": "sql",
"sql": "SELECT COUNT(*) > 0 FROM orders WHERE created_at > (SELECT MAX(last_refresh) FROM refresh_log WHERE metric_id = 'user_revenue_analysis') AND EXTRACT(HOUR FROM NOW()) BETWEEN 2 AND 6"
}
}
Troubleshooting
Refresh Failures
- Check SQL syntax
- Verify table permissions
- Monitor resource usage
- Review error logs
Slow Refreshes
- Optimize refresh queries
- Consider incremental refresh
- Use parallel processing
- Monitor database performance
Data Staleness
- Reduce refresh intervals
- Use data-based triggers
- Implement manual refresh
- Monitor data changes
Video Tutorial
Video: How to configure refresh policies for optimal data freshness in the Cortex dashboard
Monitoring Dashboard
Screenshot of the refresh monitoring and management interface
Examples
E-commerce Dashboard
{
"refresh": {
"type": "every",
"every": "15 minutes",
"description": "E-commerce data changes frequently"
}
}
Financial Reports
{
"refresh": {
"type": "every",
"every": "0 2 * * *", # Daily at 2 AM
"description": "Financial reports are generated daily"
}
}
User Analytics
{
"refresh": {
"type": "sql",
"sql": "SELECT COUNT(*) > 0 FROM user_events WHERE created_at > (SELECT MAX(last_refresh) FROM refresh_log WHERE metric_id = 'user_analytics')",
"description": "Refresh when new user events are available"
}
}
Related Topics
- Cache Configuration - How caching works with refresh policies
- Measures Configuration - Complete metric setup guide