1

<TL;DR> How many bytes are being received in total when a SELECT * command is issued for a (small) table?
</TL;DR>

I have a PostgreSQL table - very simple (see fiddle here):

CREATE TABLE test
(
  a INT NOT NULL,
  b TEXT NOT NULL
);

Populated:

INSERT INTO test VALUES
(1, 'afasdfasdf'),
(2, 'afasdfasdafasdfasf'),
(3, 'afasdfasdfasdfasdfsdf'),
(4, 'af'),
(5, 'afasdfa'),
(6, '我爱你');

And running this query

SELECT
  SUM(PG_COLUMN_SIZE(temp.a))     AS int_sz,
  SUM(CHARACTER_LENGTH(temp.b))   AS char_len,
  SUM(OCTET_LENGTH(temp.b))       AS oct_len,
  SUM(OCTET_LENGTH(temp.*::TEXT)) AS oct_len_total
FROM
  (SELECT * FROM test) AS temp;

gives the result:

int_sz  char_len  oct_len  oct_len_total
    24        61       67             91

So, we can see that char_len != oct_len (Chinese characters take 3 bytes each - UTF8) - so the total length in bytes is 91 - fine.

I then use this article to set up the wireshark client to monitor psql (the PostgreSQL command line client) traffic. I first start up the database and login so that this traffic (handshaking, db startup messages &c.) won't appear in the wireshark log. I then run (as root in a different terminal window):

tshark -i any -f 'tcp port 5432' -d tcp.port==5432,pgsql -O pgsql

and in the psql window, issue the command (I should mention at this point that I'm not sure if the Frames are coming from the client or the server, but for my purposes, it's not that important - I just wish to know the total amount of data transmitted):

SELECT * FROM test;

and the following appears in the wireshark log:

Frame 1: 93 bytes on wire (744 bits), 93 bytes captured (744 bits) on interface any, id 0
Linux cooked capture v1
Internet Protocol Version 4, Src: 192.168.234.75, Dst: 192.168.234.75
Transmission Control Protocol, Src Port: 30050, Dst Port: 5432, Seq: 1, Ack: 1, Len: 25
PostgreSQL
    Type: Simple query
    Length: 24
    Query: SELECT * FROM test;

Now the query string itself is 19 bytes long. However, I have:

  • 93 bytes on wire
  • Len 25
  • Length 24

What is the total amount of data (I'm pretty sure - transmitted by the client) in this case - what is the meaning of the different numbers?

After Frame 1, we have Frame 2 in the wireshark log:

Frame 2 is composed of 2 parts, a Header and then row data for the individual rows (fairly sure this is received by the client and transmitted by the server).

Header:

Frame 2: 298 bytes on wire (2384 bits), 298 bytes captured (2384 bits) on interface any, id 0
Linux cooked capture v1
Internet Protocol Version 4, Src: 192.168.234.75, Dst: 192.168.234.75
Transmission Control Protocol, Src Port: 5432, Dst Port: 30050, Seq: 1, Ack: 26, Len: 230
PostgreSQL
    Type: Row description
    Length: 46
    Field count: 2
        Column name: a
            Table OID: 24583
            Column index: 1
            Type OID: 23
            Column length: 4      -- The length of an INT - 4 bytes.
            Type modifier: -1
            Format: Text (0)
        Column name: b
            Table OID: 24583
            Column index: 2
            Type OID: 25
            Column length: -1  -- means it's a TEXT field with no specified length
            Type modifier: -1
            Format: Text (0)

So, the header has 298 bytes on wire, and Len: 230 - so, this appears to be a summary of the total data received by the client from the server. Is this correct?

And is the total equal to 298 or 230 - as compared to the 91 in the table itself? An explanation of the numbers would be appreciated.

Two rows of data:

First row:

PostgreSQL
    Type: Data row
    Length: 25
    Field count: 2
        Column length: 1   -- an INT (i.e. test.a) is 4 bytes - why 1?
        Data: 31
        Column length: 10   -- corresponds exactly to the size of the TEXT column (i.e. test.b in fiddle) in bytes
        Data: 61666173646661736466

Length: 25, Data: 31 and Column Length: 1, the Length of the second Data: is 20 characters?

What do all these numbers mean?

Second row:

PostgreSQL
    Type: Data row
    Length: 33
    Field count: 2
        Column length: 1
        Data: 32
        Column length: 18   -- corresponds exactly to the size of the column (t.b) in bytes
        Data: 616661736466617364616661736466617366
...
... snipped for brevity - see bottom of post for the other 4 records.
...

Same question as for row 1 - what, exactly, is being received?

At the end of Frame 2, we have the following (cleaning up/handshaking?):

PostgreSQL
    Type: Command completion
    Length: 13
    Tag: SELECT 6

Server telling client, it's done transmitting data? And also:

PostgreSQL
    Type: Ready for query
    Length: 5
    Status: Idle (73)

Server telling client, it's ready to do more work? and this Frame 3 section appears after every command:

Frame 3: 68 bytes on wire (544 bits), 68 bytes captured (544 bits) on interface any, id 0
Linux cooked capture v1
Internet Protocol Version 4, Src: 192.168.234.75, Dst: 192.168.234.75
Transmission Control Protocol, Src Port: 30050, Dst Port: 5432, Seq: 26, Ack: 231, Len: 0

So, is the total amount of data received by the client for running the SELECT command - is it the sum of the bytes on wire for the 3 Frames? Or is it a different figure?

Any explanations, pointers to resources, URL's, reference &c. appreciated.

====================== 4 records not included in wireshark log extract above ===========

PostgreSQL
    Type: Data row
    Length: 36
    Field count: 2
        Column length: 1
        Data: 33
        Column length: 21
        Data: 616661736466617364666173646661736466736466
PostgreSQL
    Type: Data row
    Length: 17
    Field count: 2
        Column length: 1
        Data: 34
        Column length: 2
        Data: 6166
PostgreSQL
    Type: Data row
    Length: 22
    Field count: 2
        Column length: 1
        Data: 35 
        Column length: 7
        Data: 61666173646661
PostgreSQL
    Type: Data row
    Length: 24
    Field count: 2
        Column length: 1
        Data: 36
        Column length: 9   -- 3 Chinese character = 9 bytes, i.e. 3 per character
        Data: e68891e788b1e4bda0
4
  • Any chance the person who downvoted this question could be kind enough to explain why they downvoted it? Thanks...
    – Vérace
    Commented Mar 5 at 2:53
  • superuser.com/help/dont-ask "there is no actual problem to be solved" and there are 13 questionmarks in your question. I'll vote to close "This question currently includes multiple questions in one. It should focus on one problem only."
    – Gantendo
    Commented Mar 5 at 3:33
  • @Gantendo - thanks for your input and, yes, you are correct. I'll rephrase the question to include only one question mark! Well, two - one in the <TL;DR> section and one in the body of the question proper! Appreciate you taking the trouble to point out the error of my ways - and it's funny, because I've frequently VtC for the same reason (I rarely, if ever, downvote - I think 3 times in over a decade on the SE sites).
    – Vérace
    Commented Mar 5 at 3:39
  • Pistol shrimp > Mantis shrimp
    – Gantendo
    Commented Mar 5 at 3:50

1 Answer 1

1

What is the total amount of data (I'm pretty sure - transmitted by the client) in this case - what is the meaning of the different numbers?

The meaning of different numbers is the payload size at each layer. You have a Postgres protocol packet indicating its own payload length (a tag/length/payload format with apparently a 1-byte "type" tag header that's not counted), and several of those packets go inside a TCP segment which indicates its own payload length, and that TCP segment (with its header) goes inside an IP packet that indicates its own length, and that IP packet goes inside an Ethernet frame that Wireshark indicates the full length of.

Ethernet frame [93]
 ├─ Ethernet header [always 14]
 └─ Payload: IPv4 packet [79]
     ├─ IPv4 header [usually 20]
     └─ Payload: TCP segment [59]
         ├─ TCP segment header [34? (20 + options)]
         └─ Payload: Postgres data [25]
             ├─ Postgres packet [25]
             │   ├─ Tag field (packet type) [1]
             │   ├─ Length field       ┐ (range of the
             │   └─ Data field         ┘  "Length" field)
             ├─ (same TCP segment could have than one Postgres packet)
             └─ (etc)

and in the psql window, issue the command (I should mention at this point that I'm not sure if the Frames are coming from the client or the server, but for my purposes, it's not that important - I just wish to know the total amount of data transmitted):

Technically it is 'frame length' that tells you the amount of bytes ultimately received by the client computer, but frame overhead varies hop-by-hop as the same IP packet travels across different link types (e.g. a Wi-Fi-connected client will produce a different-length frame than an Ethernet-connected server will receive).

Only the frame payload – the IP packet – stays constant along the entire path, so it's usually the latter that makes more practical sense to measure.

Wireshark does not report the IP packet (frame payload) sizes in its summaries, and it doesn't seem to have a field that would give you that directly, so you have to specifically ask it for the ip.len and ipv6.plen fields. (These two have an important difference, however; ip.len for IPv4 includes the IP header itself while ipv6.plen for IPv6 does not, so you must add a fixed 40 bytes to the latter.)

For example, to get individual fields (ip.version included to allow calculating the full length for IPv6):

tshark -i wlan0 -T fields -e ip.version -e ip.len -e ipv6.plen

Column length: 1 -- an INT (i.e. test.a) is 4 bytes - why 1?

The packet format is not required to be the same as the storage format. If the value fits in 1 byte, Postgres will apparently transfer it as 1 byte – it's still the same integer either way, whether you include the leading zeros or not.

    Column length: 10   -- corresponds exactly to the size of the TEXT column (i.e. test.b in fiddle) in bytes
   Data: 61666173646661736466

Length: 25, Data: 31 and Column Length: 1, the Length of the second Data: is 20 characters?

Wireshark is showing you 20 hexadecimal digits, not 20 characters. One hex digit represents four bits, so each two digits represent a single byte, giving you 10 bytes of actual data.

The "packet tree" output of Wireshark is meant for readability, not for precision. I would suggest at this point loading the capture into the graphical Wireshark, so that you could click on each of the fields in the packet tree and see its corresponding bytes highlighted in the "Raw bytes" pane – this would tell you whether e.g. the "Column length" itself takes 1 byte or 2 or 4, or whether the "Data" field is ASCII or hex bytes or an integer or something else.

8
  • Thank you so much for your input - looks great and merits a good bit of consideration - I have to go out now, but I will look at your post later. Just one supplementary question - I'm totally reworking the question to answer @Gantendo's point about there being too many questions in one. If you could comment on the fiddle here (last snippet) - is 459 the total number of bytes that went in or out of the client network card for this particular SELECT?
    – Vérace
    Commented Mar 5 at 6:23
  • I performed a further query (SELECT test.* FROM GENERATE_SERIES(1, 10), test;) - I'm trying to answer the point about the network protocol not being negligible made by mustaccio in that question's thread! The bytes_on_wire was proportionately reduced by ~ half, which I would expect for a protocol. I expect that for a 1000 records, the proportion would be reduced still further?
    – Vérace
    Commented Mar 5 at 6:24
  • "Bytes on wire" is the total number of bytes that went in or out of the network card at the specific interface the capture was made. (Which, again, is variable at different points of the path, which is why the IP packet size is often what's actually measured or accounted, disregarding the frame-level overhead.) Yes, the smaller your payload gets, the higher the relative overhead becomes, as the header sizes usually stay fixed for a given connection. Given a full-size IP packet (Internet MTU is 1500 bytes), the generally expected TCP/IP overhead is around 8% or 10%. Commented Mar 5 at 6:40
  • Note though that at least some of your captures in the initial post did not have the actual network frames – if you run the capture on the any interface, what you get is a "Linux cooked mode" pseudo-header that does not represent the underlying network header at all (as the older pcap format couldn't represent different types in a single capture), in which case the "bytes on wire" will be slightly off. Commented Mar 5 at 6:45
  • Oh, Gawd... I had no idea this was all so complex! I guess that's why network engineers get the big bucks? So, if it's Windows or *BSD, the answsers could be different? Final question, by what percentage approx. +/-? I'll study your answer later, but I'm going to mark it as correct - you've given me more than enough to fill in the gaps in my answer on dba.se. I kinda thought data_2 was hex, because if you look at record 6 - the Chinese characters, there are letters, none of which go above 'e' - anyway, thanks again and I'll upvote later...
    – Vérace
    Commented Mar 5 at 8:31

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .