<TL;DR>
How many bytes are being received in total when a SELECT *
command is issued for a (small) table?
</TL;DR>
I have a PostgreSQL table - very simple (see fiddle here):
CREATE TABLE test
(
a INT NOT NULL,
b TEXT NOT NULL
);
Populated:
INSERT INTO test VALUES
(1, 'afasdfasdf'),
(2, 'afasdfasdafasdfasf'),
(3, 'afasdfasdfasdfasdfsdf'),
(4, 'af'),
(5, 'afasdfa'),
(6, '我爱你');
And running this query
SELECT
SUM(PG_COLUMN_SIZE(temp.a)) AS int_sz,
SUM(CHARACTER_LENGTH(temp.b)) AS char_len,
SUM(OCTET_LENGTH(temp.b)) AS oct_len,
SUM(OCTET_LENGTH(temp.*::TEXT)) AS oct_len_total
FROM
(SELECT * FROM test) AS temp;
gives the result:
int_sz char_len oct_len oct_len_total
24 61 67 91
So, we can see that char_len
!= oct_len
(Chinese characters take 3 bytes each - UTF8) - so the total length in bytes is 91 - fine.
I then use this article to set up the wireshark client to monitor psql
(the PostgreSQL command line client) traffic. I first start up the database and login so that this traffic (handshaking, db startup messages &c.) won't appear in the wireshark log. I then run (as root in a different terminal window):
tshark -i any -f 'tcp port 5432' -d tcp.port==5432,pgsql -O pgsql
and in the psql
window, issue the command (I should mention at this point that I'm not sure if the Frame
s are coming from the client or the server, but for my purposes, it's not that important - I just wish to know the total amount of data transmitted):
SELECT * FROM test;
and the following appears in the wireshark log:
Frame 1: 93 bytes on wire (744 bits), 93 bytes captured (744 bits) on interface any, id 0
Linux cooked capture v1
Internet Protocol Version 4, Src: 192.168.234.75, Dst: 192.168.234.75
Transmission Control Protocol, Src Port: 30050, Dst Port: 5432, Seq: 1, Ack: 1, Len: 25
PostgreSQL
Type: Simple query
Length: 24
Query: SELECT * FROM test;
Now the query string itself is 19 bytes long. However, I have:
- 93 bytes on wire
- Len 25
- Length 24
What is the total amount of data (I'm pretty sure - transmitted by the client) in this case - what is the meaning of the different numbers?
After Frame 1
, we have Frame 2
in the wireshark log:
Frame 2
is composed of 2 parts, a Header and then row data for the individual rows (fairly sure this is received by the client and transmitted by the server).
Header:
Frame 2: 298 bytes on wire (2384 bits), 298 bytes captured (2384 bits) on interface any, id 0
Linux cooked capture v1
Internet Protocol Version 4, Src: 192.168.234.75, Dst: 192.168.234.75
Transmission Control Protocol, Src Port: 5432, Dst Port: 30050, Seq: 1, Ack: 26, Len: 230
PostgreSQL
Type: Row description
Length: 46
Field count: 2
Column name: a
Table OID: 24583
Column index: 1
Type OID: 23
Column length: 4 -- The length of an INT - 4 bytes.
Type modifier: -1
Format: Text (0)
Column name: b
Table OID: 24583
Column index: 2
Type OID: 25
Column length: -1 -- means it's a TEXT field with no specified length
Type modifier: -1
Format: Text (0)
So, the header has 298 bytes on wire
, and Len: 230
- so, this appears to be a summary of the total data received by the client from the server. Is this correct?
And is the total equal to 298 or 230 - as compared to the 91 in the table itself? An explanation of the numbers would be appreciated.
Two rows of data:
First row:
PostgreSQL
Type: Data row
Length: 25
Field count: 2
Column length: 1 -- an INT (i.e. test.a) is 4 bytes - why 1?
Data: 31
Column length: 10 -- corresponds exactly to the size of the TEXT column (i.e. test.b in fiddle) in bytes
Data: 61666173646661736466
Length:
25, Data:
31 and Column Length:
1, the Length of the second Data:
is 20 characters?
What do all these numbers mean?
Second row:
PostgreSQL
Type: Data row
Length: 33
Field count: 2
Column length: 1
Data: 32
Column length: 18 -- corresponds exactly to the size of the column (t.b) in bytes
Data: 616661736466617364616661736466617366
...
... snipped for brevity - see bottom of post for the other 4 records.
...
Same question as for row 1 - what, exactly, is being received?
At the end of Frame 2, we have the following (cleaning up/handshaking?):
PostgreSQL
Type: Command completion
Length: 13
Tag: SELECT 6
Server telling client, it's done transmitting data? And also:
PostgreSQL
Type: Ready for query
Length: 5
Status: Idle (73)
Server telling client, it's ready to do more work? and this Frame 3
section appears after every command:
Frame 3: 68 bytes on wire (544 bits), 68 bytes captured (544 bits) on interface any, id 0
Linux cooked capture v1
Internet Protocol Version 4, Src: 192.168.234.75, Dst: 192.168.234.75
Transmission Control Protocol, Src Port: 30050, Dst Port: 5432, Seq: 26, Ack: 231, Len: 0
So, is the total amount of data received by the client for running the SELECT
command - is it the sum of the bytes on wire
for the 3 Frames? Or is it a different figure?
Any explanations, pointers to resources, URL's, reference &c. appreciated.
====================== 4 records not included in wireshark log extract above ===========
PostgreSQL
Type: Data row
Length: 36
Field count: 2
Column length: 1
Data: 33
Column length: 21
Data: 616661736466617364666173646661736466736466
PostgreSQL
Type: Data row
Length: 17
Field count: 2
Column length: 1
Data: 34
Column length: 2
Data: 6166
PostgreSQL
Type: Data row
Length: 22
Field count: 2
Column length: 1
Data: 35
Column length: 7
Data: 61666173646661
PostgreSQL
Type: Data row
Length: 24
Field count: 2
Column length: 1
Data: 36
Column length: 9 -- 3 Chinese character = 9 bytes, i.e. 3 per character
Data: e68891e788b1e4bda0