48

I'm working on a PHP web application that depends on a few 3rd-party services. These services are well documented and provided by fairly large organisations.

I feel paranoid when working with responses from these APIs, which leads me to write validation code that checks that the responses match the structure and data types specified in the documentation. This mainly comes from the fact that the data is out of my control: if I blindly trust that it will be correct and it isn't (maybe someone changes the JSON structure by accident), it could lead to unexpected behaviour in my application.
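
For a concrete example (the endpoint, field names and values here are made up), the kind of check I write looks roughly like this:

```php
<?php
// Hypothetical endpoint documented to return {"id": <int>, "email": <string>}.
$rawBody = '{"id": 123, "email": "someone@example.com"}';

$data = json_decode($rawBody, true, 512, JSON_THROW_ON_ERROR);

// Expected structure and types, taken from the (made-up) documentation.
$expected = [
    'id'    => 'is_int',
    'email' => 'is_string',
];

foreach ($expected as $field => $typeCheck) {
    if (!array_key_exists($field, $data)) {
        throw new UnexpectedValueException("Response is missing documented field: $field");
    }
    if (!$typeCheck($data[$field])) {
        throw new UnexpectedValueException("Response field has an unexpected type: $field");
    }
}
```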

My question is, do you think this is overkill? How does everyone else handle this situation?

12
  • 19
    In strongly typed languages such as Java, this is just the normal way of doing things! Commented May 18, 2020 at 2:00
  • 1
    A similar question was asked on a project I was working on. The context is somewhat different from the situation you describe, but might be interesting nonetheless. Commented May 18, 2020 at 4:16
  • 12
    I've personally been bitten twice by Facebook API not returning what I expected: once was a bug Facebook later admitted and another time Facebook changed the API and both I and my sysadmin missed the announcement. So yes, validate and loudly log errors to tell you what went wrong
    – slebetman
    Commented May 18, 2020 at 5:18
  • 23
    “depends on a few party services” — PHP in the front, party services in the back! Commented May 18, 2020 at 9:36
  • 3
    You might want to wrap the external API and maybe perform the validations on your wrapper
    – jmm
    Commented May 18, 2020 at 14:07

8 Answers

68

Absolutely. For starters, you never know whether somebody has hacked into your connection and the reply you receive doesn't come from the API at all.

And some time in the last two weeks, I think, Facebook changed an API without notice, which caused lots of iOS apps to crash. If those apps had verified the reply, the API call would still have failed, but without crashing the app.

(A very nice case I heard of where validation was needed: a server provided information about goods a customer could buy. For dresses, it included the UK dress size as an integer, usually 36 to 52. Except for one dress, where the size was the string “40-42”. Without validation, that could easily be a crash.)
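
To make that concrete in PHP (the variable is invented, purely for illustration): a naive cast would have silently swallowed the bad value, whereas an explicit check fails loudly.

```php
<?php
$size = "40-42";          // what the server actually sent for that one dress

var_dump((int) $size);    // int(40) - a naive cast silently hides the problem

if (!is_int($size)) {     // an explicit check fails loudly instead
    throw new UnexpectedValueException(
        'Expected dress size to be an integer, got: ' . var_export($size, true)
    );
}
```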

7
  • 19
    I'm not sure that hacking is relevant here. The question seems to be about validating the structure and/or constraints of the third party responses, and presumably a hacker capable of a man-in-the-middle attack would be able to provide a valid response (either based on the intercepted response, or just bogus data). Your second point about API changes is more relevant. Commented May 18, 2020 at 15:25
  • 4
    @ChrisBouchard I think the point is that you should not only consider the idea that the feed is incorrect by accident, but that it may even be maliciously incorrect, and you should treat it as untrustworthy Commented May 18, 2020 at 16:22
  • 19
    I think hacking is definitely relevant here. All data originating from outside the trust boundary should be validated, whether it comes from the user interface or from an API interface. Unvalidated input creates an attack surface for things like SQL injection, XSS, and insecure direct object references.
    – John Wu
    Commented May 18, 2020 at 16:39
  • 3
    It's yet one more hassle the hacker has to go through to penetrate the system successfully, and even if they know certain, specific fields the program is using, they may not have access to documentation describing the full correct format. In that case, this will help the system be more secure, and it's relevant to the OP's question. Commented May 18, 2020 at 18:40
  • 1
    @ChrisBouchard -- Hacking is definitely relevant. Any interface to the outside world allows hackers to affect the four C's -- company, customers, clients, and careers. I list customers twice because they're so important, and I end with the career it affects -- yours, if this kind of thinking facilitates an exploit. You speculate that a hacker would produce a valid response, but in the book "Writing Secure Code", it becomes very clear that a lot of hacking is about crafting the invalid request or response! Commented May 18, 2020 at 19:50
43

Somebody else's API is your external interface. You shouldn't blindly trust anything that crosses that boundary. Your future debuggers will thank you for not propagating the other system's errors into yours.

17

Is your API-boundary also a trust-boundary?

As you are communicating with a remote system, that's nearly a certainty. Even if the remote system itself might be trusted, the medium might not be.

Failure to consistently verify all untrusted data may result in anything from a crash in the best case to a silent hostile takeover in the worst.

Is the API stable?

Even a trusted API might not be stable, in which case extra verification is needed, along with a plan for backing out, up to and including denying service until it is fixed.

Is the implementation behind the API well-tested, mature and reliable?

It doesn't matter whether the API is stable if the implementation fails to live up to it.

Always remember there is a tradeoff

More tests mean more code, which might itself contain bugs and will rarely, if ever, be exercised.

This code must be written, maintained, and debugged, all of which drains effort needed elsewhere too.

Also, comprehensively testing the failure cases is somewhere between hard and impossible without mocking the complete API, which likely leaves bugs undiscovered and lets more accumulate over time.

Thus, some APIs are simply relied on to work, while others are (or at least should be) verified on each call to at least some extent.
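
As a rough sketch of what "verified on each call to at least some extent" can look like in PHP (the class name, URL and field names are invented): a thin wrapper checks only the fields the application actually relies on and turns anything else into one well-defined failure the caller can back out from.

```php
<?php
// Hypothetical thin wrapper around a remote price API. It does not re-check
// the whole documented schema, only the fields this application uses.
final class PriceApiClient
{
    /** @var string */
    private $baseUrl;

    public function __construct(string $baseUrl)
    {
        $this->baseUrl = $baseUrl;
    }

    public function currentPriceInCents(string $sku): int
    {
        $raw = file_get_contents($this->baseUrl . '/prices/' . urlencode($sku));
        if ($raw === false) {
            throw new RuntimeException("Price API unreachable for $sku");
        }

        $data = json_decode($raw, true, 512, JSON_THROW_ON_ERROR);

        if (!isset($data['price_cents']) || !is_int($data['price_cents']) || $data['price_cents'] < 0) {
            // Fail at the boundary; the caller can back out or deny service until it is fixed.
            throw new UnexpectedValueException("Price API returned an unusable response for $sku");
        }

        return $data['price_cents'];
    }
}
```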

1
  • 1
    Services with remote APIs sometimes have a debugging area, to test the API with real calls. The calls may do absolutely nothing and just receive an error or a success. But those are very close to the live system. For example, Paypal has their sandbox system for this. Commented May 20, 2020 at 16:19
3

Paranoid or not depends on how robust your software must be.

I think that if your checks have minimal extra implementation cost, then they are OK.

Example:

  • if you communicate with services through XML, the structural verification can be done with an XSD schema.
  • in Java/C# you can have guard statements that throw an exception if the API contract is broken
    • Example: if you get a birthday from an external service, then the guard statement assert(birthday > '1900-01-01' and birthday < '2050-01-01') will throw an exception if birthday has an implausible value (a PHP sketch of both ideas follows this list)
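
Translated to PHP, both ideas might look roughly like this (the XML body, the schema file name and the element name are invented; the 1900/2050 bounds are simply the ones from the example above):

```php
<?php
// Structural verification of an XML response against an XSD schema.
$xmlResponse = '<customer><birthday>1980-05-18</birthday></customer>'; // raw body from the service
$doc = new DOMDocument();
$doc->loadXML($xmlResponse);
if (!$doc->schemaValidate('customer-response.xsd')) {   // hypothetical schema file
    throw new UnexpectedValueException('Response does not match the documented schema');
}

// Guard statement: a plausibility check on a single field, using the bounds from the example above.
$node = $doc->getElementsByTagName('birthday')->item(0);
if ($node === null) {
    throw new UnexpectedValueException('Response contains no birthday element');
}
$birthday = new DateTimeImmutable($node->nodeValue);
if ($birthday <= new DateTimeImmutable('1900-01-01') || $birthday >= new DateTimeImmutable('2050-01-01')) {
    throw new UnexpectedValueException('Implausible birthday: ' . $birthday->format('Y-m-d'));
}
```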
5
  • 43
    Somebody is going to curse you in 2050. Commented May 18, 2020 at 7:52
  • 7
    And even now, this rule only includes the birthday of the oldest living person by 3 years and one day, according to wikipedia that is: en.wikipedia.org/wiki/List_of_the_verified_oldest_people
    – Hulk
    Commented May 18, 2020 at 10:46
  • 4
    And who says we only need birth dates of living people? Sometime in 2049, some doctor will want to record that the estimated birth date of some baby is Jan. 2050. Actually, it is possible that the birth date of a newborn baby is the day after tomorrow.
    – gnasher729
    Commented May 18, 2020 at 12:15
  • 9
    I've found that you should never put fixed upper bounds on dates in your system, purely because you don't know how far in the future your software will still be used in its current version. A retrocomputing enthusiast might be running your 2010s software in 2156 as a curiosity project.
    – Nzall
    Commented May 18, 2020 at 13:14
  • This is the common sense answer. Checking API results is additional work. You do that if it's worth it (not always). Also, many things cannot be checked at all and must fundamentally be trusted.
    – usr
    Commented May 18, 2020 at 18:25
3

Yes, but in most cases that should not be your personal concern.

For most languages there are parsers that turn a native JSON (or whatever your transfer format is) response into your internal objects. They come with all the options to handle different writing styles, corner cases, escape characters, special character encodings, etc. Their validation code is used by thousands of other applications. You should use one of these parsers if possible and rely on their validation methods instead of validating the syntax yourself, i.e. they should throw exceptions, return error codes or otherwise complain if the input doesn't match what you specified (missing fields, strings not matching your defined pattern, etc.).

The only validation you might want to do yourself in your own code is that the response makes sense for your business logic. Don't try to reimplement the other service, though; it does not make sense to fully validate that their response is correct: if you could do that locally, you wouldn't need to call them. (Unless you deal with very sensitive things or hard problems; then you could call multiple services and combine their results.) What you can do, if you want to protect against some extreme level of disaster from malicious responses, is detect answers that are way out of bounds, i.e. block a transaction in your bike rental service if the bill calculated externally for a single customer goes beyond $1000 or so. But be careful: one easily overlooks corner cases that are valid (e.g. a "virtual" customer that pays for his whole company's rentals for a year).
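
For instance, in PHP you can leave the syntactic work to the built-in parser and keep only a business-level bound check for yourself (the response body and field names are invented; the 1000 limit mirrors the bike-rental example):

```php
<?php
$responseBody = '{"customer_id": 17, "total": 42.5}';   // raw reply from the external billing service

// The parser deals with syntax, escaping and encodings, and throws on malformed input.
$bill = json_decode($responseBody, true, 512, JSON_THROW_ON_ERROR);

// Business-level sanity check only: an externally calculated bill that is way
// out of bounds for a single customer gets blocked instead of being processed.
if (!isset($bill['total']) || !is_numeric($bill['total']) || $bill['total'] > 1000) {
    throw new DomainException('Externally calculated bill looks implausible: ' . $responseBody);
}
```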

1
  • 3
    Exactly. The third-party APIs being used are an implementation detail, which is hopefully hidden below your own service layer. So your number one priority is to ensure that your own services' pre- and post-conditions hold given the responses from the third party. Beyond that, you can validate, sanity check, etc. for your own uses, just keep the level of paranoia proportional to the value of the service, and inversely proportional to your level of trust in the third party. Commented May 18, 2020 at 15:32
3

Your validation shouldn't be too restrictive. There is the "tolerant reader" pattern: be as tolerant as possible when consuming data from other services. On the other side, there is the "Magnanimous Writer" pattern. Together, they help to produce more robust communication between systems.

For example, in a JSON-based interface you should probably allow unknown properties. This allows the other side to add new properties without breaking your side.
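
In PHP, a tolerant reader mostly amounts to picking out only the properties you actually use and ignoring everything else (the property names here are made up):

```php
<?php
$body = '{"id": 7, "status": "shipped", "field_added_later_by_the_other_side": true}';
$payload = json_decode($body, true, 512, JSON_THROW_ON_ERROR);

// Tolerant reader: take only what we need, ignore any unknown properties
// the other side may have added since this code was written.
$order = [
    'id'     => $payload['id'],
    'status' => $payload['status'],
];
// Deliberately no check that $payload contains *only* these keys.
```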

3
  • Seems you are talking about Postel's law. Naturally, there are problems with it. Commented May 19, 2020 at 1:20
  • 1
    I am actually referring to Martin Fowler (martinfowler.com/bliki/TolerantReader.html), but yes, he references Postel's law.
    – user355880
    Commented May 19, 2020 at 7:12
  • 1
    I worked on a mobile app where the iOS developer added code to check the structure of every API response, crashing the app if there was any extra item he didn't know about. Envisage how that worked in the wild when there was an API update. Telling every user they need to get the new version of the app before it'll work at all ... Commented May 19, 2020 at 14:27
3

Absolutely. We have been caught out by this with Microsoft APIs, for example, and we were not even set up to log that in our Azure function application. So all we saw was that requests to our endpoints failed. It changed without any warning between hand testing / UAT and actual live use of our application.

Our unit tests still worked of course, because they used the schema from the Microsoft documentation (which had not been updated). I only knew because some other kind developer commented on the Microsoft documentation!

Make sure to log what you actually receive from external APIs, both requests to your endpoints and responses to your own calls, and throw meaningful errors (as appropriate) in your application.
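
In PHP that can be as little as logging the raw body before touching it and wrapping any parse or validation failure in a meaningful error (the response body and the specific check here are invented):

```php
<?php
$rawResponse = '{"value": []}';   // body returned by the external call

try {
    $data = json_decode($rawResponse, true, 512, JSON_THROW_ON_ERROR);
    if (!isset($data['value']) || !is_array($data['value'])) {   // whatever your own checks are
        throw new UnexpectedValueException('Missing "value" array');
    }
} catch (Throwable $e) {
    // Log what the third party actually sent, not just that "something failed".
    error_log('External API response failed validation: ' . $rawResponse);
    throw new RuntimeException('External API response failed validation', 0, $e);
}
```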

This actually gives me the willies with our current project, which relies on many external APIs - we have monitoring functions and E2E tests with Cypress for vital functions running every so often, so at least we know when it happens. We are still working on how to reliably know in advance...

2
  • 1
    Careful with just always logging full responses, though: it costs I/O bandwidth and can be a legal issue with regard to data protection and privacy laws, as well as a security issue. But it's always nice if you can flip the logging on when you need it - if necessary with sensitive information anonymized. Commented May 19, 2020 at 11:41
  • yup, those are sensible caveats that need to be taken into consideration, that I really should have mentioned given I am currently doing a pile of security related cleanups on another app!
    – kpollock
    Commented May 20, 2020 at 12:53
-3

Yes, it's overkill in most scenarios when writing a web application.

I think of a third-party API as similar to libraries from a package manager. Do you write tests for each of the libraries you use that aren't built into PHP?

Normally, APIs are versioned and documented and shouldn't have surprise changes, just like packages.

6
  • 4
    Deployed code that automatically recompiles itself with the latest builds of all related packages is rather rare... so I don't think your comparison is useful. "Normally, APIs are versioned and documented and shouldn't have surprise changes, just like packages" - not really sure where you code... I'd love to move to that world :) Commented May 19, 2020 at 2:31
    haha - I disagree, given most APIs I've worked with are versioned and should be dependable as such - therefore negating "automatically recompiles itself with the latest builds...". Given your different experience, I understand your point though.
    – aaaaaa
    Commented May 19, 2020 at 18:35
  • 1
    Perhaps if the API is no longer under active development, but this is frequently not the case. For example, Amazon, Ebay, Royal Mail or DPD all have APIs which are constantly changing and evolving. It's not unusual for those to have surprise changes, outdated documentation, server errors, etc. No disrespect to the people who develop and maintain those APIs; but any platform which is under active development is prone to breaking. Developers are merely human and just as likely to make mistakes on these as a developer working on any other growing/evolving software platform. Commented May 20, 2020 at 17:09
    couldn't the same be said about accidental semver breakages in packages?
    – aaaaaa
    Commented May 20, 2020 at 17:29
  • No, that isn't going to happen with any half-decent package manager, since it would only use the precise version of an installed package and will not just automatically choose a newer version. A developer has to be actively and consciously updating to a newer version of the package, which means they know about the fact that the package has changed, and are also putting the whole system through the CI/CD pipeline again, as well as its standard QA process Commented May 25, 2020 at 11:31
