From the Xterm Control Sequences doc:
Normal tracking mode sends an escape sequence on both button press and release. Modifier key (shift, ctrl, meta) information is also sent. It is enabled by specifying parameter 1000 to DECSET. On button press or release, xterm sends CSI M CbCxCy.
The low two bits of Cb encode button information: 0=MB1 pressed, 1=MB2 pressed, 2=MB3 pressed, 3=release.
The next three bits encode the modifiers which were down when the button was pressed and are added together: 4=Shift, 8=Meta, 16=Control. Note however that the shift and control bits are normally unavailable because xterm uses the control modifier with mouse for popup menus, and the shift modifier is used in the default translations for button events. The Meta modifier recognized by xterm is the mod1 mask, and is not necessarily the "Meta" key (see xmodmap(1)).
Cx and Cy are the x and y coordinates of the mouse event, encoded as in X10 mode.
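The field layout quoted above can be sketched as a small decoder. This is only a minimal sketch, not xterm's own code: the name `decode_normal_tracking` and the returned dict are hypothetical, and it assumes that "encoded as in X10 mode" means each of Cb, Cx and Cy is sent as value + 32, with 1-based coordinates.

```python
def decode_normal_tracking(seq: bytes) -> dict:
    """Decode one 6-byte report ESC [ M Cb Cx Cy (hypothetical helper)."""
    if len(seq) != 6 or seq[:3] != b"\x1b[M":
        raise ValueError("not a normal-tracking mouse report")
    # Assumption: each of Cb, Cx, Cy is transmitted as value + 32.
    cb, cx, cy = (b - 32 for b in seq[3:])
    low = cb & 0b11  # low two bits: 0=MB1, 1=MB2, 2=MB3, 3=release
    return {
        "button": None if low == 3 else low + 1,  # unknown on release
        "release": low == 3,
        "shift": bool(cb & 4),
        "meta": bool(cb & 8),
        "control": bool(cb & 16),
        "x": cx,
        "y": cy,
    }
    # Bits above 16 are ignored here; this sketch covers only
    # the fields described in the quoted text.
```

For example, `decode_normal_tracking(b"\x1b[M$!!")` describes an MB1 press with Shift at (1, 1).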
What happens when, for example, the left button is pressed with Shift held at (1, 1), the right button is pressed with Control held at (2, 2), the left button is released at (3, 3), and the right button is released at (4, 4)?
Wouldn't you get
ESC [ M 0000100 ! !
ESC [ M 0010010 " "
ESC [ M 0000011 # #
ESC [ M 0000011 $ $
on your stdin? Both releases carry the same button code (3), so how should the client program handle this? How could it even tell which button was released?
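To make the scenario concrete, the four reports can be built byte by byte. This is only a sketch based on my reading of the documentation quoted above: `encode_report` is a hypothetical helper, the + 32 offset on Cb, Cx and Cy is assumed from the X10 encoding, and it assumes no modifier is still held when the buttons are released.

```python
SHIFT, META, CONTROL = 4, 8, 16
RELEASE = 3  # low two bits of Cb on any button release


def encode_report(button_code: int, modifiers: int, x: int, y: int) -> bytes:
    """Build ESC [ M Cb Cx Cy (hypothetical helper; assumes value + 32)."""
    cb = button_code | modifiers
    return b"\x1b[M" + bytes([cb + 32, x + 32, y + 32])


events = [
    encode_report(0, SHIFT, 1, 1),    # MB1 (left) pressed with Shift
    encode_report(2, CONTROL, 2, 2),  # MB3 (right) pressed with Control
    encode_report(RELEASE, 0, 3, 3),  # left button released
    encode_report(RELEASE, 0, 4, 4),  # right button released
]

for event in events:
    print(event)
```

Under these assumptions the last two reports have an identical Cb byte and differ only in their coordinates, so the report alone does not say which button was released.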
The control sequences documentation is incredibly hard to read, which makes this unnecessarily difficult.