Regarding external ESD protection: it depends on your requirements, and since you don't state them, we can't give a definite answer. The human body model (HBM) covers discharges from a person touching the pins, for example someone sticking their fingers in a connector. There are other kinds of transients as well, though.
In very demanding environments (some automotive, aerospace, military, etc.) it is customary to add external TVS diodes on the CAN lines. For general industrial applications that is probably overkill, but it also depends on what the transceiver itself can handle.
Regarding the CAN transceiver you propose, it is old and shouldn't be used in new designs.
The quality of these parts can often be judged by how large a voltage span they tolerate on the CANH/CANL pins. The MAX3051 specifies -7.5V to +12.5V absolute maximum. Compare this with a modern part such as the MCP2562FD, which is rated for -58V to +58V DC on the same pins and for transients of -150V to +100V. That's a huge difference!
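To make that comparison concrete, here is a minimal Python sketch that checks a sustained DC fault voltage against the absolute-maximum ranges quoted above. The 24 V fault (a bus wire shorted to a supply rail) is an assumed, illustrative value, and the function and variable names are made up for this example. It only looks at DC ratings, not at transients or ESD.

```python
# DC absolute-maximum ratings on CANH/CANL in volts, as quoted in this answer.
BUS_PIN_ABS_MAX = {
    "MAX3051": (-7.5, 12.5),
    "MCP2562FD": (-58.0, 58.0),
}


def survives_dc_fault(part, fault_voltage):
    """Return True if a sustained DC voltage on a bus pin stays
    within the part's absolute-maximum range."""
    low, high = BUS_PIN_ABS_MAX[part]
    return low <= fault_voltage <= high


# Assumption for illustration only: a bus wire shorted to a 24 V supply rail.
ASSUMED_FAULT_V = 24.0

for part in BUS_PIN_ABS_MAX:
    verdict = "within" if survives_dc_fault(part, ASSUMED_FAULT_V) else "outside"
    print(f"{part}: {ASSUMED_FAULT_V:g} V is {verdict} the absolute-maximum range")
```

Under that assumption the MAX3051 is outside its rating while the MCP2562FD still has plenty of margin. Substitute the worst-case voltage your own wiring can actually see.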
ESD-wise they are about the same: the MCP2562FD specifies +/-14kV on the CAN lines and lower on the other pins, whereas the Maxim datasheet doesn't say which pins its rating applies to (all of them, or just the CAN lines?).
Why pay double the price for old, bad Maxim parts when you can get a modern part that plays in a different league entirely?