I am running Grafana on an EC2 instance in the AWS cloud to visualize data stored in a Timescale database that is also hosted in the cloud.
The Grafana dashboard
My Grafana dashboard has 14 panels (8 time series, 3 bar charts, 1 pie chart, 2 stats) and performs 69 SQL queries in total (including 11 queries used in the definitions of the Grafana variables). Some of these queries run against the raw data table and some against hourly aggregate views. The dashboard also contains 17 data transformations. Most of the tests were done with a time range of 1 day (144 data points when using the raw data table and 24 data points when using the hourly aggregate view).
Example of query:
SELECT mtimestamp, mvalue AS "internal temperature" FROM periodic_measurements WHERE (apartmentid = $apartment AND roomid = $room_id AND metric = 1 AND $__timeFilter(mtimestamp))
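To see how a single panel query behaves outside Grafana, the example query above can be run directly against the database with the macro and variables expanded by hand. This is only a sketch: the values 1 and 2 for $apartment and $room_id and the two timestamps are placeholders I made up for illustration, and the BETWEEN form is how I understand Grafana's PostgreSQL data source to expand $__timeFilter.

```sql
-- Hand-expanded version of the example panel query.
-- apartmentid = 1, roomid = 2 and the time bounds are hypothetical placeholders.
-- Note that ANALYZE actually executes the query, so on the production
-- database this may take as long as the real panel query does.
EXPLAIN (ANALYZE, BUFFERS)
SELECT mtimestamp, mvalue AS "internal temperature"
FROM periodic_measurements
WHERE apartmentid = 1
  AND roomid = 2
  AND metric = 1
  AND mtimestamp BETWEEN '2024-01-01T00:00:00Z' AND '2024-01-02T00:00:00Z';
```

The resulting plan shows whether the query uses an index scan or a sequential scan and how many buffers it reads, which can be compared between the alfa and production databases.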
The problem
When I connect the dashboard to my alfa database (which has a main table of about 16 million entries), the dashboard loads in a few seconds.
However, when I use the production database (which has over 1,000 million entries) as the data source, the dashboard takes about 40 minutes to load.
The source data tables
The columns used in the Grafana queries are indexed, since they are all part of a composite primary key.
For example, the raw data table is defined as follows:
CREATE TABLE periodic_measurements (
    mtimestamp  TIMESTAMP WITH TIME ZONE NOT NULL,
    apartmentid INTEGER NOT NULL,
    roomid      INTEGER NOT NULL,
    sensorid    INTEGER NOT NULL,
    metric      SMALLINT NOT NULL,
    mvalue      NUMERIC NULL,
    PRIMARY KEY (mtimestamp, apartmentid, roomid, sensorid, metric)
);
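The indexes that actually exist on the raw data table can be listed from the standard pg_indexes catalog view. This is a minimal sketch that assumes the table name from the definition above and does not filter by schema; if the table is a Timescale hypertable, the per-chunk indexes live in a separate internal schema and are not matched by this filter.

```sql
-- List every index defined on the raw data table, with its full definition.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'periodic_measurements';
```

With the definition above, the primary key index has mtimestamp as its leading column, followed by apartmentid, roomid, sensorid and metric.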
HW and SW configuration
The source database is a Timescale database (PostgreSQL-based) hosted on the Aiven cloud service. It has 8 CPUs, 16 GB RAM and 512 GB of disk space. The server runs Postgres version 14.10, and both databases use Timescale version 2.11.2.
The server where Grafana is running is an EC2 instance in AWS running Linux 22.04. It is an m5a.xlarge instance, which has 4 vCPUs, 16 GB RAM and network bandwidth of up to 10 Gbps. The server runs Grafana 9.5.1 (free version).
I access the EC2 Linux machine that hosts Grafana through Remote Desktop Connection from Windows 10 Pro (22H2), over a connection with more than 27 Mbps download and more than 5 Mbps upload.
Test results (source: production database)
During the tests I got the following results from monitoring the two servers (using the HW configuration described above):
- Timescale server
- CPU usage: 50%
- IOPS read: 1,600
- memory usage: 11%
- EC2 machine
- CPU usage: 38%
- Network in (bytes) - 5 minutes: 5M
- Network out (bytes) - 5 minutes: 210M
Test variants that have been tried
To decrease the loading time, I have tried different HW configurations, always obtaining loading times in the same range. The HW configuration described above is the most capable one among those I have tested.
I have also tried to minimize the number of separate SQL queries the dashboard performs by using common panels, but that has not improved the performance at all.
The question
Any hints as to why my Grafana dashboard becomes so slow to load (more than half an hour) when reading from tables containing thousands of millions of entries?
Thanks a lot in advance,
Bernardo Di Chiara