
Time-Series Database


Introduction to Time-Series Databases (TSDB)

This article introduces the definition and origins of time-series databases, compares them with traditional relational databases, and outlines their typical application scenarios.

Core Concepts

A Time-Series Database (TSDB) is a database optimized for handling time-series data. Time-series data consists of a sequence of objective data points with timestamps, typically generated continuously in chronological order.

Why Do We Need Specialized TSDBs?

While traditional relational databases (such as MySQL or PostgreSQL) can store timestamped data, they often encounter performance bottlenecks when dealing with the unique characteristics of time-series data:

  • Extremely High Write Throughput: IoT sensors or server clusters may generate millions of data points per second.
  • Append-Only Nature: Time-series data records historical facts that have already occurred; it is typically append-only, with very few updates or deletions.
  • Time-Range Query Requirements: The most common query is “get the average CPU usage over the past hour,” rather than looking up a specific individual record.

Typical Application Scenarios

  1. IT Infrastructure Monitoring: Monitoring CPU, memory, and network traffic for server clusters (e.g., Prometheus).
  2. Internet of Things (IoT): Collecting data from sensor devices, such as temperature, humidity, and vehicle trajectories.
  3. Financial Quantitative Analysis: Recording tick-by-tick market data, candlestick charts, etc.

Mainstream TSDBs

  • InfluxDB: The most popular open-source TSDB with a robust ecosystem.
  • Prometheus: The standard for cloud-native monitoring, featuring built-in TSDB capabilities.
  • TimescaleDB: Built on top of PostgreSQL, offering full SQL support.

graph TD
  A[Data Sources] --> B(High-concurrency Append Writes)
  A1[IoT Sensors] --> A
  A2[App Monitoring] --> A
  A3[Financial Trading] --> A
  B --> C{"Time-Series Database (TSDB)"}
  C --> D(Time-range Aggregation Queries)
  D --> E[Data Visualization/Grafana]
  D --> F[Alerting Systems]


Time-Series Data Models and Multi-Dimensional Queries

An in-depth analysis of the internal structure of time-series data, including timestamps, metrics, tags, and data fields, to help you understand its multi-dimensional data model.

Basic Structure of Time-Series Data

Although terminology varies across databases, a typical time-series Data Point usually contains these four core dimensions:

  1. Timestamp: The absolute time of the data. It is the primary key dimension for time-series data, usually precise to the millisecond or even nanosecond.
  2. Metric/Measurement: Describes the data category, such as cpu_usage or temperature.
  3. Tags/Labels: Key-value pairs used to describe metadata about the data source. Tags are typically indexed to allow for fast filtering and grouped calculations, e.g., host=server01, region=cn-hangzhou.
  4. Fields/Values: The actual measured values. Fields are usually not indexed and can be numeric (integers, floats), boolean, or strings, e.g., value=85.5.
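The four dimensions above can be sketched as a small Python structure. The names here are illustrative, not any particular TSDB's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataPoint:
    """One time-series data point: metric + tags identify the series,
    fields carry the measured values, the timestamp orders the point."""
    metric: str         # e.g. "cpu_usage"
    tags: frozenset     # indexed metadata, e.g. {("host", "server01")}
    fields: dict        # unindexed values, e.g. {"value": 85.5}
    timestamp_ns: int   # nanosecond-precision timestamp

point = DataPoint(
    metric="cpu_usage",
    tags=frozenset({("host", "server01"), ("region", "cn-hangzhou")}),
    fields={"value": 85.5},
    timestamp_ns=1629811200000000000,
)

# The (metric, tags) pair is the series identity; two points with the
# same identity belong to the same time series.
series_key = (point.metric, point.tags)
```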

Multi-Dimensional Data Model

While traditional databases often use flat table structures, modern time-series databases (like Prometheus and InfluxDB) use a multi-dimensional data model. This means that a single Metric combined with different sets of Tags constitutes a unique Time Series.

Example Data Structure:

# InfluxDB Line Protocol format
measurement,tag_set field_set timestamp
cpu_load,host=server01,region=cn value=0.64 1629811200000000000
cpu_load,host=server02,region=cn value=0.55 1629811200000000000

In this example:

  • cpu_load is the Metric.
  • host=server01, region=cn are Tags (indexed).
  • value=0.64 is the Field (the actual value).
  • 1629811200... is the Timestamp.

Why Distinguish Between Tags and Fields?

The core reason is indexing cost. Databases build inverted indexes for Tags, making queries like WHERE host='server01' extremely fast. However, using constantly changing random values (such as specific CPU load readings) as Tags causes the number of index entries, and thus time series, to explode: the well-known High Cardinality problem.
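A back-of-the-envelope calculation shows why. The series count is roughly the product of per-tag cardinalities, since each distinct tag combination creates its own time series (the numbers below are illustrative, not from a real deployment):

```python
# Bounded tags: series count stays manageable.
hosts = 1_000      # values of the host tag
regions = 10       # values of the region tag
series = hosts * regions          # 10,000 series

# Now imagine also tagging by request UUID (an unbounded value):
requests_per_day = 10_000_000
series_with_uuid_tag = hosts * regions * requests_per_day
print(series_with_uuid_tag)       # 100 billion series: index explosion
```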

graph TD
  A[Data Point] --> B(Timestamp)
  A --> C(Metric/Measurement)
  A --> D(Tags/Labels - Indexed)
  A --> E(Fields/Values - Not Indexed)
  D --> D1[host=web-server-1]
  D --> D2[env=production]
  E --> E1[cpu_usage=85.2]
  E --> E2[memory_usage=1024]



Time-Series Queries and Aggregation Functions

Exploring the query languages and operational patterns unique to TSDBs, with a focus on time-window-based aggregation analysis.

Core Characteristics of Time-Series Queries

The query patterns of TSDBs differ fundamentally from relational databases; the vast majority of queries are time-range-based aggregation analyses. Users typically do not care about a data point at a specific millisecond, but rather the “overall trend over a period of time.”

Time Windows

When analyzing continuous timelines, we typically divide data into different time windows for aggregation:

  1. Tumbling Window: Fixed-size, non-overlapping windows. For example, calculating the average CPU usage per minute (00:00-00:01, 00:01-00:02).
  2. Sliding Window: Fixed-size, overlapping windows. For example, calculating the average of the past minute every 10 seconds.
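The tumbling-window case can be sketched in a few lines of Python; the function name and the `(timestamp, value)` input shape are assumptions for illustration:

```python
from collections import defaultdict

def tumbling_avg(points, window_s):
    """Average `value` per fixed, non-overlapping window of window_s
    seconds. `points` is an iterable of (unix_ts_seconds, value) pairs."""
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - ts % window_s   # align to the window boundary
        buckets[bucket_start].append(value)
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}

# One point every 20 seconds over two minutes:
data = [(0, 0.2), (20, 0.4), (40, 0.6), (60, 0.8), (80, 1.0), (100, 1.2)]
result = tumbling_avg(data, 60)
# window 0-60s averages (0.2, 0.4, 0.6); window 60-120s averages (0.8, 1.0, 1.2)
```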

Common Aggregation Functions

In addition to standard SUM, AVG, MAX, and MIN, time-series analysis frequently uses specialized mathematical functions:

  • Percentiles (e.g., P99, P95): Crucial in monitoring systems. A P99 latency of 100ms means 99% of requests complete within 100ms; it reflects long-tail effects better than the average.
  • Rate/Derivative: Calculates the growth rate of a metric over time. For example, by taking the derivative of a cumulative total (Counter type), we can derive the current real-time bandwidth.
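A minimal sketch of the Counter-to-rate idea, including the reset handling that monitoring systems such as Prometheus apply (a counter only ever increases, so a drop signals a process restart); function and sample names are illustrative:

```python
def per_second_rate(samples):
    """Approximate per-second rate from counter samples
    [(ts_seconds, counter_value), ...]. On a counter reset (value drops),
    treat the post-reset value as the increase since the reset."""
    increase = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        increase += (v1 - v0) if v1 >= v0 else v1  # handle counter reset
    span = samples[-1][0] - samples[0][0]
    return increase / span

# http_requests_total sampled every 15s; the counter resets at t=45:
samples = [(0, 100), (15, 160), (30, 220), (45, 20), (60, 80)]
rate = per_second_rate(samples)  # 200 total increase over 60s ≈ 3.33 req/s
```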

Query Language Examples

PromQL (Prometheus Query Language) Example: Get the per-second rate of HTTP 500 responses for each web server, averaged over the past 5 minutes:

rate(http_requests_total{job="web", status="500"}[5m])

SQL Extension Example (TimescaleDB): Align data into 1-hour buckets (Tumbling Window) and calculate the maximum temperature:

SELECT 
  time_bucket('1 hour', time) AS bucket,
  MAX(temperature) 
FROM sensor_data 
GROUP BY bucket 
ORDER BY bucket DESC;

flowchart LR
  A[Raw Data Points] -->|One point per sec| B(Time Window Partitioning)
  B -->|Tumbling Window 1m| C[Aggregation Calculation]
  C --> D1[AVG/MIN/MAX]
  C --> D2[P99 / P95]
  C --> D3[Rate of Change]
  D1 --> E[Smoothed Trend Graph]
  D2 --> E
  D3 --> E



Underlying Storage Engines and High Compression

An analysis of the core design of TSDB storage architectures (such as LSM Trees) and data compression algorithms, aimed at handling massive writes and reducing storage costs.

Why Not B+ Trees?

Traditional relational databases often use B+ Trees as their storage engine. While B+ Trees are suitable for random reads and writes, in time-series scenarios, the continuous influx of data causes frequent node splitting, leading to significant Write Amplification and random disk I/O, which drastically degrades write performance.

LSM Trees and TSM Trees

Modern time-series databases generally adopt architectures based on LSM Trees (Log-Structured Merge-Trees):

  1. WAL (Write Ahead Log): Incoming data is first appended to a log to ensure no data loss during crashes.
  2. MemTable / Memory Cache: Data is then written to memory. Once a threshold is reached, it is flushed to disk in batches, converting random writes into high-speed sequential writes.
  3. SSTable / TSM Files: Read-only data files produced by flushing. A background process periodically performs Compaction to merge small files and clean up expired data.

(Note: InfluxDB customized the LSM Tree for time-series scenarios, evolving into the specialized TSM (Time-Structured Merge Tree) architecture.)
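The three-step write path above can be sketched as a toy LSM engine. This is a deliberately simplified model (a single memtable, lists standing in for disk files), not any real database's implementation:

```python
class ToyLSM:
    """Toy LSM-style write path: append to WAL, buffer in a memtable,
    flush sorted batches as immutable 'SSTable' runs."""
    def __init__(self, flush_threshold=3):
        self.wal = []        # append-only log for crash recovery
        self.memtable = {}   # in-memory buffer, sorted on flush
        self.sstables = []   # immutable on-"disk" sorted runs
        self.flush_threshold = flush_threshold

    def write(self, ts, value):
        self.wal.append((ts, value))   # 1. durability first
        self.memtable[ts] = value      # 2. buffer in memory
        if len(self.memtable) >= self.flush_threshold:
            self._flush()              # 3. one sequential batch write

    def _flush(self):
        run = sorted(self.memtable.items())  # sorted, immutable run
        self.sstables.append(run)
        self.memtable.clear()
        self.wal.clear()               # flushed data no longer needs the WAL

db = ToyLSM()
for ts, v in [(3, 0.1), (1, 0.2), (2, 0.3), (5, 0.4)]:
    db.write(ts, v)
# After three writes one sorted run is flushed; (5, 0.4) stays buffered.
```

Note how arbitrary arrival order becomes a sorted sequential run on flush: that conversion of random writes into sequential ones is the whole point of the design.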

Columnar Storage and High Compression

Time-series data is highly suitable for columnar storage. Since data in the same column has the same type and changes gradually over time, this provides a foundation for efficient compression:

  • Delta-of-Delta Compression (for timestamps): Calculates the difference between adjacent timestamps, then the difference of those differences. The result is often 0 or a very small integer, drastically reducing storage space.
  • Gorilla Algorithm (for floating-point numbers): An algorithm open-sourced by Facebook that compares adjacent floating-point values using XOR operations, significantly lowering storage overhead.
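Both ideas are easy to demonstrate. The delta-of-delta of evenly spaced timestamps is almost all zeros, and the XOR of two nearly equal floats has long zero runs; the helper names below are illustrative:

```python
import struct

def delta_of_delta(timestamps):
    """Second-order deltas of a timestamp column. With a fixed sample
    interval the result is almost all zeros, which compresses very well."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]

def float_xor(a, b):
    """Gorilla-style building block: XOR the bit patterns of adjacent
    float64 values; slowly changing values share most of their bits."""
    (ai,) = struct.unpack(">Q", struct.pack(">d", a))
    (bi,) = struct.unpack(">Q", struct.pack(">d", b))
    return ai ^ bi

# Points every 10s, with one sample arriving 1s late:
ts = [100, 110, 120, 131, 141, 151]
print(delta_of_delta(ts))       # [0, 1, -1, 0]: tiny integers, cheap to encode
print(float_xor(85.5, 85.5))    # identical values XOR to 0
```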

graph TD
  A[Client Write Request] --> B[WAL]
  A --> C[Memory: MemTable]
  C -->|Threshold Reached: Flush| D[Disk: SSTable / TSM File Level 1]
  D -->|Background Compaction| E[Disk: TSM File Level 2]
  E -->|Background Compaction| F[Disk: TSM File Level 3]
  style C fill:#f9f,stroke:#333,stroke-width:2px
  style B fill:#ffd,stroke:#333



Lifecycle Management: Downsampling and Retention

Exploring how to balance long-term storage costs with query precision through Downsampling and Retention Policies.

The Dilemma of Infinite Data Growth

Time-series data is characterized by continuous and infinite generation. If all monitoring data is stored at 1-second precision, it would consume hundreds of terabytes in a few months. However, the value of data decays rapidly over time:

  • Past 1 hour: Requires second-level precision for troubleshooting.
  • 1 month ago: Only hour-level precision is needed for trend analysis.
  • 1 year ago: Only day-level precision is needed for long-term capacity planning.

Downsampling (Continuous Queries)

Downsampling is an automated mechanism that periodically reads high-precision data, uses aggregation functions to convert it into lower-precision data, and writes it to a new table.

Typical Downsampling Workflow:

  1. Collect raw data at 1-second precision and store it in the raw_data table.
  2. Configure a background task that runs every 5 minutes, averages the raw data into 1-minute buckets, and writes the results to the data_1m table.
  3. Every hour, calculate aggregate values based on data_1m and write them to the data_1h table.
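The workflow above amounts to re-bucketing at coarser granularities. A minimal sketch (table names and the averaging aggregate mirror the steps above; the fake sensor data is illustrative):

```python
def downsample(raw, bucket_s, agg=lambda vs: sum(vs) / len(vs)):
    """Aggregate raw (ts, value) points into bucket_s-second buckets,
    as a scheduled continuous query would."""
    buckets = {}
    for ts, v in raw:
        buckets.setdefault(ts - ts % bucket_s, []).append(v)
    return sorted((start, agg(vs)) for start, vs in buckets.items())

# 1-second raw data -> 60-second rollup (data_1m) -> hourly rollup (data_1h)
raw_data = [(t, 20.0 + (t % 3)) for t in range(0, 180)]  # fake sensor readings
data_1m = downsample(raw_data, 60)
data_1h = downsample(data_1m, 3600)
print(len(raw_data), len(data_1m), len(data_1h))  # 180 3 1
```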

Retention Policy (RP)

Combined with downsampling, TSDBs provide mechanisms to automatically delete expired data. Since time-series data is stored in Time Shards, expired data deletion does not require inefficient DELETE statements; instead, it directly drops the underlying expired shard files, incurring minimal overhead.

Example Lifecycle Strategy:

  • Policy A: raw_data (second-level) kept for 7 days.
  • Policy B: data_1m (minute-level) kept for 30 days.
  • Policy C: data_1h (hour-level) kept for 1 year.

Through this mechanism, databases can save over 90% of storage space without impacting macro-trend analysis.
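Some rough arithmetic shows where the savings come from (the bytes-per-point figure is an assumed compressed size, and real costs vary with data shape and compression):

```python
BYTES_PER_POINT = 2  # assumed average compressed size per point

# Points retained under the tiered policies above:
points = {
    "raw_data (1s, 7 days)":  7 * 86_400,
    "data_1m (1m, 30 days)":  30 * 1_440,
    "data_1h (1h, 365 days)": 365 * 24,
}
tiered = sum(points.values()) * BYTES_PER_POINT
naive = 365 * 86_400 * BYTES_PER_POINT  # keeping 1s data for a full year
print(f"tiered: {tiered:,} B  naive: {naive:,} B  saved: {1 - tiered/naive:.1%}")
```

Even for this single series the tiered layout keeps roughly 2% of the naive volume, consistent with the "over 90%" savings mentioned above.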



Distributed Architecture and the High Cardinality Challenge

An in-depth look at cluster architectures, sharding strategies, and how to tackle the ultimate challenge: “High Cardinality.”

Distributed Sharding Strategy

When single-node capacity cannot support massive data, TSDBs must scale into clusters. Sharding typically uses a dual-dimensional strategy:

  1. Time Sharding: Each shard stores data for a specific time span (e.g., one day), facilitating lifecycle management and narrowing scan ranges.
  2. Hash/Tag Sharding: Data for the same time period is hashed to different nodes based on Tags (e.g., device ID) to achieve load balancing.
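The dual-dimensional routing can be sketched as follows. This is an illustrative routing function, not any real cluster's placement logic; the series-key string and node names are assumptions:

```python
import hashlib
from datetime import datetime, timezone

def route(series_key, ts, nodes):
    """Dual-dimension placement: time picks the shard (one per day),
    a hash of the series key picks the node."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
    digest = hashlib.sha256(series_key.encode()).digest()
    node = nodes[int.from_bytes(digest[:4], "big") % len(nodes)]
    return node, f"shard-{day}"

nodes = ["node-1", "node-2", "node-3"]
placement = route("cpu_load,host=server01", 1629811200, nodes)
```

Hashing on the series key keeps every point of one series on one node (so queries for that series touch a single node), while the per-day shard makes retention a matter of dropping whole shards.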

The Core Challenge: High Cardinality

Cardinality refers to the number of unique elements in a set. In time-series data, the total number of time series equals the number of distinct combinations of metric and indexed tag values.

Disastrous High Cardinality Scenarios: If unbounded values like IP addresses, random UUIDs, or container IDs are included in Tags, the total number of time series can explode exponentially (into the hundreds of millions or billions).

Impact of High Cardinality:

  1. Inverted Index Bloat: The index dictionary grows too large to fit in memory, leading to Out-Of-Memory (OOM) errors.
  2. Extremely Slow Queries: The system must scan thousands of fragmented series to merge data.

Optimization Strategies

  1. Strict Data Modeling: Prohibit putting unbounded random values into Tags (they should be placed in Fields that are not indexed).
  2. Inverted Index Optimization: For example, Prometheus borrows from full-text search inverted structures, and InfluxDB introduced the TSI (Time Series Index) architecture to offload massive memory indexes to disk.
  3. Separating Compute and Storage: Next-generation TSDBs (like VictoriaMetrics, InfluxDB IOx) offload storage to object stores like S3, making compute nodes stateless and using Parquet columnar formats to handle high-cardinality analysis.
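Strategy 1 can even be enforced at the write path. A hypothetical guard (the cap, tag names, and function are all illustrative) rejects tags whose observed value count suggests they are unbounded:

```python
MAX_VALUES_PER_TAG = 10_000   # assumed cap on distinct values per tag
seen = {}                     # tag name -> set of observed values

def check_tags(tags):
    """Reject tags that look unbounded (UUIDs, IPs, container IDs, ...):
    such values belong in unindexed Fields, not in Tags."""
    for name, value in tags.items():
        values = seen.setdefault(name, set())
        values.add(value)
        if len(values) > MAX_VALUES_PER_TAG:
            raise ValueError(
                f"tag '{name}' looks unbounded ({len(values)} values); "
                f"store it as a field instead")

check_tags({"host": "server01", "region": "cn"})  # bounded tags: accepted
```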

graph TD
  A[Routing/Proxy Layer] --> B(Hash: device_id=A)
  A --> C(Hash: device_id=B)
  B --> D[Node 1: Yesterday's Data Slice]
  B --> E[Node 1: Today's Data Slice]
  C --> F[Node 2: Yesterday's Data Slice]
  C --> G[Node 2: Today's Data Slice]
  style D fill:#f9f,stroke:#333
  style F fill:#f9f,stroke:#333
