
Optimize the PostgreSQL query

Ask Time:2022-12-07T15:47:45         Author:Thomas Young


I have a table like the one below.

id       session_id     start_time                   answer_time
1          111          2022-12-06 13:40:50          2022-12-06 13:40:55 
2          111          2022-12-06 13:40:51          Null
3          111          2022-12-06 13:40:57          Null
4          222          2022-12-06 13:40:58          Null
5          222          2022-12-06 13:41:10          Null 
6          222          2022-12-06 13:41:10          Null    
7          333          2022-12-06 13:46:10         2022-12-06 13:46:15
8          333          2022-12-06 13:46:18         2022-12-06 13:46:20

There are three sessions in the table, with session ids 111, 222, and 333. Each session has multiple records sharing the same session_id. Whether a session is successful depends on whether answer_time is NULL in the record with the smallest id for that session.

In the sample table above, the records with ids 1, 4, and 7 determine whether their sessions are successful or unsuccessful.

I have the SQL below to query it, and it returns the correct result.

WITH t AS
(
    SELECT DISTINCT ON (session_id) start_time, answer_time
    FROM logs
    WHERE ((SELECT NOW() AT TIME ZONE 'UTC') - start_time < interval '24 HOURS')
    ORDER BY logs.session_id, id
)
SELECT
       COUNT(*) FILTER (WHERE (answer_time IS NOT NULL)) AS success_count,
       COUNT(*) FILTER (WHERE (answer_time IS NULL)) AS fail_count
FROM t;

But when the table has about 50M records, the query takes 20 seconds, which is unacceptable in a production environment. How can I optimize it? My goal is under 1 second on 50M records.
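(One direction I have been considering, sketched below — the index name and the INCLUDE column choice are my own guesses, not something already in the schema: a composite b-tree on (session_id, id) matches the DISTINCT ON ... ORDER BY session_id, id shape of the query, and including the two referenced columns could let it run as an index-only scan instead of reading heap pages for all 50M rows.)

```sql
-- Hypothetical covering index for the DISTINCT ON query above.
-- (session_id, id) matches the ORDER BY, so the planner can pick the
-- smallest-id row per session directly from the index; INCLUDE makes
-- start_time and answer_time available without heap fetches.
CREATE INDEX IF NOT EXISTS idx_logs_session_id_id
    ON public.logs (session_id, id)
    INCLUDE (start_time, answer_time);
```

Note that since the table appears to be a TimescaleDB hypertable (see the ts_insert_blocker trigger in the dump below), the index would normally be created on the hypertable itself so it propagates to the chunks; I have not verified whether the planner actually chooses it for this query.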

Edit: below is the table's SQL dump:

/*
 Navicat PostgreSQL Data Transfer

 Source Server         : Test Server
 Source Server Type    : PostgreSQL
 Source Server Version : 140004 (140004)
 Source Host           : localhost:5832
 Source Catalog        : pserver1
 Source Schema         : public

 Target Server Type    : PostgreSQL
 Target Server Version : 140004 (140004)
 File Encoding         : 65001

 Date: 07/12/2022 19:53:07
*/


-- ----------------------------
-- Table structure for logs
-- ----------------------------
DROP TABLE IF EXISTS "public"."logs";
CREATE TABLE "public"."logs" (
  "id" int8 NOT NULL,
  "company_id" int8 NOT NULL,
  "session_id" int8 NOT NULL,
  "start_time" timestamp(6) NOT NULL,
  "answer_time" timestamp(6)
)
;

-- ----------------------------
-- Indexes structure for table logs
-- ----------------------------
CREATE INDEX "calllog_start_time_idx_copy1" ON "public"."logs" USING btree (
  "start_time" "pg_catalog"."timestamp_ops" DESC NULLS FIRST
);
CREATE INDEX "idx_calllog_session_id_copy1" ON "public"."logs" USING btree (
  "session_id" "pg_catalog"."int8_ops" ASC NULLS LAST
);

-- ----------------------------
-- Triggers structure for table logs
-- ----------------------------
CREATE TRIGGER "ts_insert_blocker" BEFORE INSERT ON "public"."logs"
FOR EACH ROW
EXECUTE PROCEDURE "_timescaledb_internal"."insert_blocker"();

-- ----------------------------
-- Primary Key structure for table logs
-- ----------------------------
ALTER TABLE "public"."logs" ADD CONSTRAINT "calllog_copy1_pkey" PRIMARY KEY ("id", "start_time");

Author: Thomas Young. Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/74713305/optimize-the-postgresql-query