SQL Patterns for Optimizing IoT Queries in TimescaleDB

IoT applications generate massive amounts of time-series data—millions of rows from thousands of sensors, often with unpredictable reporting intervals. As your dataset grows, queries that once took milliseconds can take minutes or more if not properly optimized.
In this post, we’ll explore practical SQL patterns for retrieving the latest sensor readings efficiently, and how to use continuous aggregates for historical analytics—based on real-world deployments using TimescaleDB on PostgreSQL.
Understanding IoT Table Design
How you structure your sensor data impacts everything from ingestion to query performance. Let’s look at two common formats:
1. Narrow Table Format (Flexible + Efficient)
This design works well when:
-
Sensors sample at different times or rates
-
You want to avoid schema changes when adding new sensors
-
You support hysteresis (only recording changes)
An example of a narrow table format is:
Narrow table format:
ts | sensor_id | value |
---|---|---|
2024-10-31 11:17:30.000 | 1007 | 23.45 |
Pair this with a sensor metadata table:
Sensor metadata table:
sensor_id | sensor_name | units |
---|---|---|
1007 | temperature | degreesC |
1012 | heat_mode | on/off |
1013 | cooling_mode | on/off |
1041 | occupancy | number of people in room |
2. Medium-Width Format (Multi-type Flexibility)
Adds columns for different data types (float, int, bool, text, json).
Medium-width table format:
ts | sensor_id | d | i | b | t | j |
---|---|---|---|---|---|---|
2024-10-31 11:17:30.000 | 1007 | 23.45 | null | null | null | null |
2024-10-31 11:17:47.000 | 1012 | null | null | TRUE | null | null |
2024-10-31 11:18:01.000 | 1041 | null | 4 | null | null | null |
Add a constraint to enforce that at least one value is recorded:
CONSTRAINT at_least_one_not_null
CHECK ((d IS NOT NULL) OR (i IS NOT NULL) OR (b IS NOT NULL) OR (j IS NOT NULL) OR (t IS NOT NULL))
Querying the Latest Sensor Values (Efficiently)
The Problem:
Naive queries (like SELECT MAX(ts)) don’t scale well. With 1,000+ sensors and 40M+ rows, full-table scans can take over an hour.
The Goal:
Get the latest reading for each sensor—fast.
- Pattern:
SELECT DISTINCT ON
+ Time Constraint
This is a reliable pattern for dashboards or status displays.
WITH latest_data AS (
SELECT DISTINCT ON (sensor_id) ts, sensor_id, d
FROM iot_data
WHERE d IS NOT NULL
AND ts > CURRENT_TIMESTAMP - INTERVAL '1 week' --
crucial
ORDER BY sensor_id, ts DESC
)
SELECT
sensor_id, sensors.sensor_name, ts, d
FROM latest_data
LEFT JOIN sensors ON latest_data.sensor_id = sensors.sensor_id
ORDER BY sensor_id, ts;
Why it works:
-
DISTINCT ON
gets the most recent row persensor_id
-
Time constraint avoids scanning old or inactive data
-
CTE makes the join cleaner and improves readability
Without the time filter, this query can take >60 minutes on large datasets.
2. Pattern: JOIN LATERAL
for Targeted Subqueries
Great when querying a subset of known sensors.
SELECT sensor_list.sensor_id, latest_data.ts, latest_data.d
FROM sensors sensor_list
LEFT JOIN LATERAL (
SELECT ts, d
FROM iot_data raw_data
WHERE sensor_id = sensor_list.sensor_id
ORDER BY ts DESC
LIMIT 1
) latest_data ON true
WHERE latest_data.d IS NOT NULL
AND latest_data.ts > CURRENT_TIMESTAMP - INTERVAL '1 week'
ORDER BY sensor_list.sensor_id, latest_data.ts;
Why it works:
-
Limits scans to known sensors (from
sensors
table) -
LIMIT 1
exits early once it finds the latest value -
Cleaner for subsets (e.g., by location or device type)
3. Pattern: Continuous Aggregates for Historical Analysis
Timescale’s continuous aggregates precompute and store results—perfect for queries over longer time ranges.
CREATE MATERIALIZED VIEW temperature_daily_summary
WITH (timescaledb.continuous) AS
SELECT
device_id,
time_bucket('1 day', ts) AS day,
AVG(d) AS avg_temp,
MAX(d) AS max_temp,
MIN(d) AS min_temp
FROM iot_data
GROUP BY device_id, day;
Then, set a policy to keep the view fresh:
SELECT add_continuous_aggregate_policy('temperature_daily_summary',
start_offset => INTERVAL '1 month',
end_offset => INTERVAL '1 day',
schedule_interval => INTERVAL '1 hour');
Works seamlessly with compression and tiered storage.
Performance Takeaways
-
Always use time constraints (e.g. last 1h, 1d, 1w)—this avoids unnecessary scans
-
Use sensor metadata as your driving table when possible (it’s smaller)
-
Use
LIMIT 1
,JOIN LATERAL
, orDISTINCT ON
patterns strategically -
For historical views, use continuous aggregates (they’re fast and storage-efficient)
Summary: Choosing the Right Pattern
Pattern | Best For | Notes |
---|---|---|
SELECT DISTINCT ON |
General latest-value queries | Simple, fast with time constraint |
JOIN LATERAL |
Targeted sensor queries | Efficient for subsets |
Continuous Aggregates | Historical analytics | Fast + compressed + refreshable |