Database performance degrades slowly, then suddenly. One day everything is fine, the next day your application is crawling. The culprit? Usually queries that gradually became slower as your data grew. Let's explore how pg_stat_statements
helps you catch these issues before your users do.
What is pg_stat_statements?
pg_stat_statements is PostgreSQL's built-in query performance tracker, shipped as a contrib extension. It records execution statistics for every SQL statement the server runs, giving you insight into:
- Which queries consume the most time
- How often queries are executed
- Query performance variability
- Cache efficiency
- Resource consumption patterns
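To get a feel for where these numbers live, you can query the view directly. A minimal sketch (column names match PostgreSQL 13+, where the timing columns are called total_exec_time and mean_exec_time):
-- Quick look at the most-called statements and their key metrics
SELECT queryid
     , calls
     , round(total_exec_time::numeric, 2) AS total_ms
     , round(mean_exec_time::numeric, 2)  AS mean_ms
     , rows
     , shared_blks_hit
     , shared_blks_read
     , left(query, 80) AS query_preview
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 5;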
Setting Up pg_stat_statements
First, you need to enable it (this requires a server restart, since the library must be preloaded):
-- In postgresql.conf
shared_preload_libraries = 'pg_stat_statements'

-- Optional configuration
pg_stat_statements.max = 10000    -- Maximum number of statements tracked
pg_stat_statements.track = 'all'  -- Track all statements, including those inside functions
pg_stat_statements.save = on      -- Save stats across server restarts
Then create the extension:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
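Before moving on, it's worth verifying that the module is actually loaded and collecting data. A quick sanity check:
-- Confirm the library is preloaded (the output should include 'pg_stat_statements')
SHOW shared_preload_libraries;

-- Confirm the view is populated (this errors out if the library is not loaded)
SELECT count(*) AS tracked_statements FROM pg_stat_statements;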
The Essential Queries Every DBA Should Know
1. Find Your Most Expensive Queries
SELECT round(total_exec_time::numeric, 2) AS total_time_ms
     , calls
     , round(mean_exec_time::numeric, 2) AS mean_time_ms
     , round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS percent_total_time
     , regexp_replace(query, E'[\\n\\r ]+', ' ', 'g') AS query
     , queryid
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
This shows which queries consume the most total execution time; these are your prime targets for reducing overall database load.
2. Identify Queries with High Variation
Queries with high standard deviation often indicate missing indexes or data skew:
SELECT calls
     , round(mean_exec_time::numeric, 2) AS mean_ms
     , round(stddev_exec_time::numeric, 2) AS stddev_ms
     , round((stddev_exec_time / mean_exec_time)::numeric, 2) AS coefficient_of_variation
     , regexp_replace(query, E'[\\n\\r ]+', ' ', 'g') AS query
     , queryid
FROM pg_stat_statements
WHERE calls > 100          -- Enough calls for meaningful statistics
  AND mean_exec_time > 10  -- Ignore very fast queries
ORDER BY coefficient_of_variation DESC
LIMIT 20;
3. Find Queries That Return Too Many Rows
Often indicates missing WHERE clauses or inefficient joins:
SELECT calls
     , rows
     , round(rows::numeric / calls, 2) AS rows_per_call
     , round(mean_exec_time::numeric, 2) AS mean_time_ms
     , round((mean_exec_time / (rows / calls))::numeric, 3) AS ms_per_row
     , regexp_replace(query, E'[\\n\\r ]+', ' ', 'g') AS query
     , queryid
FROM pg_stat_statements
WHERE rows / calls > 1000  -- Returns more than 1000 rows on average
ORDER BY rows_per_call DESC
LIMIT 20;
4. Monitor Cache Hit Ratio
Poor cache performance indicates you might need to increase shared_buffers:
-- blk_read_time and blk_write_time are only populated when track_io_timing = on
SELECT sum(blk_read_time) AS total_read_time
     , sum(blk_write_time) AS total_write_time
     , sum(shared_blks_hit) AS cache_hits
     , sum(shared_blks_read) AS cache_misses
     , round((sum(shared_blks_hit)::numeric / (sum(shared_blks_hit) + sum(shared_blks_read))), 4) AS cache_hit_ratio
FROM pg_stat_statements
WHERE shared_blks_hit + shared_blks_read > 0;
Aim for a cache hit ratio above 0.90 (90%).
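The query above gives a global ratio. As a variation on the same idea, you can also rank individual queries by their cache hit ratio to spot the ones doing the most physical reads (a sketch; the 1000-block threshold is illustrative):
-- Per-query cache hit ratio, worst offenders first
SELECT queryid
     , calls
     , shared_blks_hit
     , shared_blks_read
     , round(shared_blks_hit::numeric
             / NULLIF(shared_blks_hit + shared_blks_read, 0), 4) AS cache_hit_ratio
     , left(query, 60) AS query_preview
FROM pg_stat_statements
WHERE shared_blks_hit + shared_blks_read > 1000  -- Ignore queries touching few blocks
ORDER BY cache_hit_ratio ASC NULLS LAST
LIMIT 20;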
5. Track Query Performance Over Time
Create a snapshot table to track trends:
-- Create snapshot table (structure only)
CREATE TABLE pg_stat_statements_snapshots AS
SELECT now() AS snapshot_time, *
FROM pg_stat_statements
WHERE false;

-- Take regular snapshots (schedule this)
INSERT INTO pg_stat_statements_snapshots
SELECT now(), * FROM pg_stat_statements;

-- Analyze performance trends
WITH query_trends AS (
    SELECT date_trunc('hour', snapshot_time) AS hour
         , queryid
         , max(calls) - min(calls) AS calls_in_hour
         , max(total_exec_time) - min(total_exec_time) AS time_in_hour
    FROM pg_stat_statements_snapshots
    WHERE snapshot_time > now() - interval '24 hours'
    GROUP BY hour, queryid
)
SELECT hour
     , sum(calls_in_hour) AS total_calls
     , round(sum(time_in_hour)::numeric / 1000, 2) AS total_seconds
FROM query_trends
GROUP BY hour
ORDER BY hour DESC;
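How you schedule the snapshot INSERT is up to you (cron, your job runner, etc.). If the pg_cron extension happens to be installed, one way to do it (a sketch; the job name and 15-minute interval are arbitrary choices):
-- Requires the pg_cron extension; runs the snapshot every 15 minutes
SELECT cron.schedule(
    'pg_stat_statements_snapshot',
    '*/15 * * * *',
    $$INSERT INTO pg_stat_statements_snapshots SELECT now(), * FROM pg_stat_statements$$
);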
Advanced Analysis Techniques
Finding Queries That Got Slower
Compare current performance with historical baselines:
WITH current_stats AS (
    SELECT queryid
         , mean_exec_time AS current_mean
         , calls AS current_calls
         , query
    FROM pg_stat_statements
    WHERE calls > 100
), historical_stats AS (
    SELECT queryid, avg(mean_exec_time) AS historical_mean
    FROM pg_stat_statements_snapshots
    WHERE snapshot_time BETWEEN now() - interval '7 days' AND now() - interval '1 day'
    GROUP BY queryid
)
SELECT c.queryid
     , round(c.current_mean::numeric, 2) AS current_mean_ms
     , round(h.historical_mean::numeric, 2) AS historical_mean_ms
     , round(((c.current_mean - h.historical_mean) / h.historical_mean * 100)::numeric, 2) AS percent_slower
     , c.current_calls
     , substring(c.query, 1, 60) AS query_preview
FROM current_stats c
JOIN historical_stats h ON c.queryid = h.queryid
WHERE c.current_mean > h.historical_mean * 1.5  -- At least 50% slower
ORDER BY percent_slower DESC;
Identifying Index Opportunities
Queries with high read ratios often benefit from indexes:
SELECT queryid
     , calls
     , shared_blks_read + local_blks_read AS blocks_read
     , shared_blks_hit + local_blks_hit AS blocks_from_cache
     , round(((shared_blks_read + local_blks_read)::numeric
              / NULLIF(shared_blks_read + local_blks_read + shared_blks_hit + local_blks_hit, 0) * 100), 2) AS read_ratio
     , round(mean_exec_time::numeric, 2) AS mean_time_ms
     , substring(query, 1, 60) AS query_preview
FROM pg_stat_statements
WHERE calls > 10
  AND (shared_blks_read + local_blks_read) > 1000  -- Significant reads
ORDER BY blocks_read DESC
LIMIT 20;
Creating a Performance Dashboard
Build views for easy monitoring:
-- Overall health metrics
CREATE VIEW database_health AS
SELECT (SELECT count(*) FROM pg_stat_statements) AS total_unique_queries,
       (SELECT sum(calls) FROM pg_stat_statements) AS total_query_calls,
       (SELECT round(sum(total_exec_time)::numeric / 1000 / 60, 2) FROM pg_stat_statements) AS total_minutes,
       (SELECT round(avg(mean_exec_time)::numeric, 2) FROM pg_stat_statements) AS avg_query_time_ms,
       (SELECT round((sum(shared_blks_hit)::numeric / (sum(shared_blks_hit) + sum(shared_blks_read)))::numeric, 4)
        FROM pg_stat_statements
        WHERE shared_blks_hit + shared_blks_read > 0) AS cache_hit_ratio,
       (SELECT count(*) FROM pg_stat_statements WHERE mean_exec_time > 1000) AS slow_queries_count;

-- Top resource consumers
CREATE VIEW top_queries AS
SELECT row_number() OVER (ORDER BY total_exec_time DESC) AS rank
     , round(total_exec_time::numeric / 1000, 2) AS total_seconds
     , calls
     , round(mean_exec_time::numeric, 2) AS mean_ms
     , round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS percent_of_total
     , query
FROM pg_stat_statements
ORDER BY total_exec_time DESC  -- Keep the LIMIT deterministic so rank stays 1..50
LIMIT 50;

-- Recent problem queries
CREATE VIEW problem_queries AS
SELECT calls
     , round(mean_exec_time::numeric, 2) AS mean_ms
     , round(max_exec_time::numeric, 2) AS max_ms
     , rows
     , query
     , queryid
FROM pg_stat_statements
WHERE (mean_exec_time > 500                                   -- Slow
       OR (stddev_exec_time / NULLIF(mean_exec_time, 0)) > 2  -- High variation
       OR (rows / NULLIF(calls, 0)) > 10000)                  -- Too many rows
ORDER BY total_exec_time DESC;
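Once the views exist, a dashboard or a daily check-in can stay very simple, for example:
-- One-row overview of the whole database
SELECT * FROM database_health;

-- The ten heaviest queries by total execution time
SELECT * FROM top_queries WHERE rank <= 10;

-- Anything currently flagged as problematic
SELECT * FROM problem_queries;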
Best Practices and Tips
1. Regular Maintenance
-- Reset statistics periodically (e.g., after deployments, configuration changes or schema changes)
SELECT pg_stat_statements_reset();

-- Or reset a single statement (PostgreSQL 12+): substitute the actual userid, dbid and queryid values
SELECT pg_stat_statements_reset(userid, dbid, queryid);
2. Combine with EXPLAIN ANALYZE
When you identify a slow query, analyze it:
-- Get the actual query text
SELECT query FROM pg_stat_statements WHERE queryid = 12345678901234567;

-- Then explain it
EXPLAIN (ANALYZE, BUFFERS) <query_text>;
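One caveat worth knowing: pg_stat_statements stores a normalized query text, so literal values are replaced by placeholders like $1. Before running EXPLAIN you need to substitute representative values yourself, for example (the table and values here are hypothetical):
-- Text as stored by pg_stat_statements (normalized):
--   SELECT * FROM orders WHERE customer_id = $1 AND status = $2

-- What you actually explain, with realistic values plugged in
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';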
Common Pitfalls
- Not enabling track_io_timing: Without I/O timing statistics, blk_read_time and blk_write_time stay at zero and you miss I/O performance data
- Setting pg_stat_statements.max too low: You might miss important queries
- Not resetting after schema changes: Old statistics can be misleading
- Ignoring high-frequency queries: Even fast queries add up when called millions of times (see the sketch after this list)
- Not tracking query patterns: Individual query optimization misses systemic issues
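For that high-frequency pitfall, sorting by call count quickly surfaces the fast-but-frequent statements whose aggregate cost is easy to overlook. A sketch (the 10 ms cutoff for "fast" is an arbitrary choice):
-- Fast individually, expensive in aggregate
SELECT calls
     , round(mean_exec_time::numeric, 3) AS mean_ms
     , round(total_exec_time::numeric / 1000, 2) AS total_seconds
     , left(query, 60) AS query_preview
FROM pg_stat_statements
WHERE mean_exec_time < 10  -- "fast" queries only
ORDER BY calls DESC
LIMIT 20;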
Action Items
Based on pg_stat_statements data, here's your optimization checklist:
- Queries with total_exec_time > 10% of the grand total: These are your bottlenecks (a query to find them follows this list)
- mean_exec_time > 1000ms: Need immediate optimization
- stddev/mean > 2: Likely missing indexes or data skew
- rows/calls > 1000: Consider pagination or filtering
- Cache hit ratio < 0.90: Increase shared_buffers
- Sudden performance degradation: Check for plan changes or data growth
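The first item on this checklist is easy to automate. A sketch that flags any statement responsible for more than 10% of all execution time:
-- Statements consuming more than 10% of total execution time
SELECT *
FROM (
    SELECT queryid
         , calls
         , round(total_exec_time::numeric / 1000, 2) AS total_seconds
         , round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS percent_of_total
         , left(query, 60) AS query_preview
    FROM pg_stat_statements
) ranked
WHERE percent_of_total > 10
ORDER BY percent_of_total DESC;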
Next Steps
While pg_stat_statements shows you what's slow, it doesn't tell you why or how to fix it. The next post will explore how modern monitoring is broken and why we need smarter solutions.
For automated performance analysis and optimization suggestions, check out Azimutt Inspector, which continuously analyzes these metrics and provides actionable recommendations.
Need Performance Help?
Struggling with slow queries or need help interpreting pg_stat_statements data? I offer a free 30-minute consultation session where we can analyze your specific performance challenges and find solutions.
📅 Book a free discussion slot - Let's optimize your database performance together!
Remember
"Performance problems are like icebergs - by the time users see them, 90% of the damage is already done. pg_stat_statements helps you see below the surface."
Monitor proactively, optimize continuously, and sleep peacefully.