⚙️ SQL Patterns for Optimizing IoT Queries in TimescaleDB

⚙️ SQL Patterns for Optimizing IoT Queries in TimescaleDB


IoT applications generate massive amounts of time-series data—millions of rows from thousands of sensors, often with unpredictable reporting intervals. As your dataset grows, queries that once took milliseconds can take minutes or more if not properly optimized.

In this post, we’ll explore practical SQL patterns for retrieving the latest sensor readings efficiently, and how to use continuous aggregates for historical analytics—based on real-world deployments using TimescaleDB on PostgreSQL.



Understanding IoT Table Design

How you structure your sensor data impacts everything from ingestion to query performance. Let’s look at two common formats:

1. Narrow Table Format (Flexible + Efficient)

This design works well when:

  • Sensors sample at different times or rates

  • You want to avoid schema changes when adding new sensors

  • You support hysteresis (only recording changes)

An example of a narrow table format is:

Narrow table format:

ts sensor_id value
2024-10-31 11:17:30.000 1007 23.45

💡 Pair this with a sensor metadata table:

Sensor metadata table:

sensor_id sensor_name units
1007 temperature degreesC
1012 heat_mode on/off
1013 cooling_mode on/off
1041 occupancy number of people in room

2. Medium-Width Format (Multi-type Flexibility)

Adds columns for different data types (float, int, bool, text, json).

Medium-width table format:

ts sensor_id d i b t j
2024-10-31 11:17:30.000 1007 23.45 null null null null
2024-10-31 11:17:47.000 1012 null null TRUE null null
2024-10-31 11:18:01.000 1041 null 4 null null null

Add a constraint to enforce that at least one value is recorded:

CONSTRAINT at_least_one_not_null
  CHECK ((d IS NOT NULL) OR (i IS NOT NULL) OR (b IS NOT NULL) OR (j IS NOT NULL) OR (t IS NOT NULL))
Enter fullscreen mode

Exit fullscreen mode



Querying the Latest Sensor Values (Efficiently)

🚨 The Problem:
Naive queries (like SELECT MAX(ts)) don’t scale well. With 1,000+ sensors and 40M+ rows, full-table scans can take over an hour.

✅ The Goal:
Get the latest reading for each sensor—fast.

  1. Pattern: SELECT DISTINCT ON + Time Constraint
    This is a reliable pattern for dashboards or status displays.
WITH latest_data AS (
  SELECT DISTINCT ON (sensor_id) ts, sensor_id, d
  FROM iot_data
  WHERE d IS NOT NULL
    AND ts > CURRENT_TIMESTAMP - INTERVAL '1 week'  -- ✅ crucial
  ORDER BY sensor_id, ts DESC
)
SELECT
  sensor_id, sensors.sensor_name, ts, d
FROM latest_data
LEFT JOIN sensors ON latest_data.sensor_id = sensors.sensor_id
ORDER BY sensor_id, ts;
Enter fullscreen mode

Exit fullscreen mode

💡 Why it works:

  • DISTINCT ON gets the most recent row per sensor_id

  • Time constraint avoids scanning old or inactive data

  • CTE makes the join cleaner and improves readability

📉 Without the time filter, this query can take >60 minutes on large datasets.

2. Pattern: JOIN LATERAL for Targeted Subqueries
Great when querying a subset of known sensors
.

SELECT sensor_list.sensor_id, latest_data.ts, latest_data.d
FROM sensors sensor_list
LEFT JOIN LATERAL (
  SELECT ts, d
  FROM iot_data raw_data
  WHERE sensor_id = sensor_list.sensor_id
  ORDER BY ts DESC
  LIMIT 1
) latest_data ON true
WHERE latest_data.d IS NOT NULL
  AND latest_data.ts > CURRENT_TIMESTAMP - INTERVAL '1 week'
ORDER BY sensor_list.sensor_id, latest_data.ts;
Enter fullscreen mode

Exit fullscreen mode

💡 Why it works:

  • Limits scans to known sensors (from sensors table)

  • LIMIT 1 exits early once it finds the latest value

  • Cleaner for subsets (e.g., by location or device type)

3. Pattern: Continuous Aggregates for Historical Analysis

Timescale’s continuous aggregates precompute and store results—perfect for queries over longer time ranges.

CREATE MATERIALIZED VIEW temperature_daily_summary
WITH (timescaledb.continuous) AS
SELECT
  device_id,
  time_bucket('1 day', ts) AS day,
  AVG(d) AS avg_temp,
  MAX(d) AS max_temp,
  MIN(d) AS min_temp
FROM iot_data
GROUP BY device_id, day;
Enter fullscreen mode

Exit fullscreen mode

Then, set a policy to keep the view fresh:

SELECT add_continuous_aggregate_policy('temperature_daily_summary',
  start_offset => INTERVAL '1 month',
  end_offset => INTERVAL '1 day',
  schedule_interval => INTERVAL '1 hour');
Enter fullscreen mode

Exit fullscreen mode

Works seamlessly with compression and tiered storage.



Performance Takeaways

  • ✅ Always use time constraints (e.g. last 1h, 1d, 1w)—this avoids unnecessary scans

  • ✅ Use sensor metadata as your driving table when possible (it’s smaller)

  • ✅ Use LIMIT 1, JOIN LATERAL, or DISTINCT ON patterns strategically

  • ✅ For historical views, use continuous aggregates (they’re fast and storage-efficient)



Summary: Choosing the Right Pattern

Pattern Best For Notes
SELECT DISTINCT ON General latest-value queries Simple, fast with time constraint
JOIN LATERAL Targeted sensor queries Efficient for subsets
Continuous Aggregates Historical analytics Fast + compressed + refreshable



🔗 Learn More




Try it Yourself



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *