Is it common to use real data of the customer to perform testing? What policies do companies apply regarding using real data of the customer for testing purposes? Is there any legislation regarding such issues?

  • Related: sqa.stackexchange.com/questions/5737/…
    – dvniel
    Commented May 20, 2019 at 9:18
  • Every single answer so far has misread "data of the customer" to mean "customer data". There is a huge difference. Maybe OP really means "customer data", but "data of the customer" could, for example, mean that OP is writing software for a customer that has a lot of data about their manufacturing plant's temperature variations, seasonal forest growth, etc.
    – pipe
    Commented May 22, 2019 at 9:34

9 Answers

It depends on your definition of testing. Anonymized data is widely used by Microsoft and others for monitoring and testing in production; it is the basis for A/B testing or monitoring, for example.

In Europe the GDPR does not allow usage of private data, but the GDPR does not apply to anonymised information and anonymised data can be used without consent. Anonymised data is defined as “data rendered anonymous in such a way that the data subject is not or no longer identifiable.”

Be careful, though: pay close attention to how the data is anonymised and make sure the anonymisation is truly irreversible.
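One common way to get irreversibility wrong is hashing a low-entropy field. A minimal sketch, assuming a hypothetical test copy in which dates of birth were replaced by their unsalted SHA-256 hash: there are only about 44,000 possible dates between 1900 and 2019, so every candidate can be hashed and compared, and the original value comes straight back. The field, date range and function names are illustrative only.

```python
import hashlib
from datetime import date, timedelta
from typing import Optional


def naive_hash(value: str) -> str:
    """Unsalted SHA-256 hex digest: a common but weak 'anonymisation' step."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def recover_birth_date(digest: str,
                       start: date = date(1900, 1, 1),
                       end: date = date(2019, 12, 31)) -> Optional[str]:
    """Hash every ISO date in [start, end] until one matches `digest`."""
    day = start
    while day <= end:
        candidate = day.isoformat()
        if naive_hash(candidate) == digest:
            return candidate
        day += timedelta(days=1)
    return None  # the value was outside the searched range


if __name__ == "__main__":
    stored = naive_hash("1984-07-23")   # what the "anonymised" test copy holds
    print(recover_birth_date(stored))   # prints 1984-07-23
```

Keyed hashing or tokenisation with a secret key stops this particular attack, but as long as someone holds the key the data can be re-linked, so it is generally treated as pseudonymised data (still personal data under the GDPR) rather than anonymised data.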

  • Why wouldn't the GDPR allow production customer data to be used for testing? They gave consent to use it in production. I can't think of a reason you can't use it for testing, as long as you have appropriate measures in place to protect their privacy in the testing environment. What those measures are depends on the context and risks. A bank might take more measures, like truly anonymising the data (datalumen.eu/…), while testers of a smaller content management system might not anonymise the email account names of its users. Commented May 21, 2019 at 8:25
  • "In Europe the GDPR does not allow usage of private data": it does, with the right consent. So technically you could also ask a sample (or all) of your users to consent to the use of their data for testing. Commented May 21, 2019 at 8:26
  • Good point. I guess explicit consent to use the data in testing should work, but it will burden the company and testers with paperwork, restrictions, and procedures to keep the data private and secure. Off the top of my head, I would guess that access to this data will be on a need-to-use basis only, it will have to be as secure as the production environment even against internal access, and you will need to remove it if the client asks you to.
    – Rsf
    Commented May 21, 2019 at 8:49
  • Yes, but testing might be a valid need-to-use case, for example for reproducing defects. Companies probably have a "legitimate interest" in using data for testing purposes; you might have to document a legitimate interest assessment. I would also have measures like clear, documented (shorter) retention limits for testing data based on real data. Commented May 21, 2019 at 8:55
  • @NielsvanReijmersdal I think that you are wrong. GDPR requires privacy by default. I don't think that having a "legitimate interest" means nothing to GDPR. If you search for Test Data Management + GDPR you find tons of articles, and none suggests that simply adding a checkbox when filling in the privacy forms would fix the issue, so I believe your reasoning is quite naive in this regard. IMHO, from the point of view of a user, pseudonymization does not pose an "undue burden" on the company; it's quite a cheap price to pay for a small amount of privacy.
    – Bakuriu
    Commented May 22, 2019 at 19:42

I wouldn't say it is common to use real data in testing, although the customer might provide a subset of "real" data in order to facilitate the process.

Apart from the privacy and business issues, there are also legal ones. For example, the General Data Protection Regulation (GDPR) has been enforceable in Europe since 25 May 2018 (but I think every company dealing with a company from the EU should take notice).

The GDPR takes a very strict approach, backed by fines, to personal data, and personal data is practically any data concerning a person, which is a very broad definition. So, at least in this context, it is better to just use test data.

  • It's more common in financial tech: everywhere I've been, we have had to use real customer data to test and reproduce customer outcomes. For example, if Mr. Jones received the wrong interest rate in production, we'll use his details in a test environment to see whether they trigger the same (incorrect) interest rate. You'll probably find in the Terms and Conditions that you're agreeing to the company using your details in this way :)
    – dvniel
    Commented May 20, 2019 at 8:36
  • I see that it is hard to fake that kind of data. Also, if you are working with multiple institutions (say a credit card company, an internet banking company, and a bank), I guess it would be really hard to agree on example data of decent quality.
    – Mate Mrše
    Commented May 20, 2019 at 8:41

It depends

In some industries it's not feasible to test without customer data. Sometimes it's not possible to properly anonymize that data. I test software that uses the US Social Security number for a large number of lookups. That means any anonymization method must ensure that a given person's Social Security number produces the same result across multiple tables, while retaining the standard Social Security number format and not violating US regulations on what constitutes a valid Social Security number. That's just one example of a case where not using real data is somewhere between difficult and infeasible.
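A minimal sketch of the kind of mapping that paragraph asks for, assuming keyed hashing (HMAC-SHA256) is acceptable in your setting. The same real number always maps to the same substitute, so lookups across tables still join, and the substitute keeps the NNN-NN-NNNN format while avoiding ranges never issued as SSNs (area 000, 666 and 900-999, group 00, serial 0000). The key and function name are hypothetical. Caveats: two inputs can occasionally collide on the same output, a well-formed substitute may still belong to a real person, and whoever holds the key can re-link the data, so this is pseudonymisation rather than anonymisation. It also does nothing for the sync problem described next.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be kept outside the test environment.
SECRET_KEY = b"replace-with-a-key-kept-outside-the-test-environment"


def pseudonymise_ssn(ssn: str, key: bytes = SECRET_KEY) -> str:
    """Map a real SSN to a stable, well-formed substitute (NNN-NN-NNNN)."""
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"not a 9-digit SSN: {ssn!r}")
    # Deterministic: the same input and key always give the same digest.
    digest = hmac.new(key, digits.encode("ascii"), hashlib.sha256).digest()
    n = int.from_bytes(digest, "big")
    # Area: 001-899, skipping 666 (898 allowed values).
    area = 1 + n % 898
    if area >= 666:
        area += 1
    n //= 898
    group = 1 + n % 99       # 01-99, never 00
    n //= 99
    serial = 1 + n % 9999    # 0001-9999, never 0000
    return f"{area:03d}-{group:02d}-{serial:04d}"
```

Because the output depends only on the input digits and the key, running the same function over every table that holds the number keeps foreign-key style lookups consistent.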

Another example - from the same software (which is also legacy software dating from 2002) - is a situation where data syncs between multiple systems. If your application is not the system of record (that is, the master system), you may not be able to anonymize data without losing the ability to sync to the master system. The software I test is a web application that sends updated data to a mainframe and receives updated information from the mainframe. If the data used by the web application differs too much from the data on the mainframe, the mainframe can't work with the data sent by the web application, and the entire sync breaks down.

Another situation where customer data may be necessary is when the customer's configuration is unusual enough that it's not feasible to mimic it in order to reproduce a reported bug. This happened to me in a previous job: four separate bugs produced the same outcome as the customer's actual bug, and I found all of them before I was able to use the customer's data to reproduce the real one.

Scalability issues can be hard to reproduce without a database the size of the customer's: if the test database is of modest size while the customer's data set is over a terabyte, it may be impossible to reproduce that customer's problems with the test database. I've seen this happen, and I had to sign the non-disclosure agreements that came with the customer mailing us a copy of their database to work with. In those cases, we agreed to use their data only for as long as we needed it to reproduce and fix their problem. That was acceptable under the PCI compliance rules at the time; I don't know whether it would be acceptable now, since I'm not currently working with software that requires that standard.
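An agreement to keep a customer's database only as long as needed is easier to honour if each copy carries an explicit deletion date that can be checked. A small sketch under that assumption; the dataset names, retention periods and data model are hypothetical, and the real periods would come from the NDA or data-handling agreement.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class CustomerDataset:
    """A customer-supplied database copy held to reproduce a defect."""
    name: str
    received: date
    retention_days: int  # agreed with the customer (hypothetical field)

    @property
    def delete_by(self) -> date:
        return self.received + timedelta(days=self.retention_days)


def overdue(datasets: list[CustomerDataset], today: date) -> list[str]:
    """Names of datasets whose agreed retention period has run out."""
    return [d.name for d in datasets if today >= d.delete_by]


held = [
    CustomerDataset("acme-ticket-1234", date(2019, 3, 1), 30),
    CustomerDataset("globex-perf-issue", date(2019, 5, 1), 60),
]
print(overdue(held, date(2019, 5, 20)))  # prints ['acme-ticket-1234']
```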

That said, whenever we take a copy of the production database to use for testing, we always change the email addresses, usually to one of our internal email addresses so we can test the emails that should be sent to customers, and we purge the database of real users.
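The answer doesn't say how the addresses are rewritten; one assumed approach is plus-addressing into a single QA mailbox, giving each account its own tag so testers can still tell which account an email was meant for. In this sketch the rows are plain dicts with hypothetical `id` and `email` keys, `qa-inbox@example.com` stands in for your internal mailbox, and the mail server is assumed to deliver `qa-inbox+anything@...` to `qa-inbox@...`.

```python
# Hypothetical internal QA mailbox; assumes the mail server supports
# plus-addressing (qa-inbox+tag@... is delivered to qa-inbox@...).
QA_LOCAL_PART = "qa-inbox"
QA_DOMAIN = "example.com"


def scrub_emails(rows: list[dict]) -> list[dict]:
    """Return copies of user rows with every email pointed at the QA mailbox."""
    return [
        {**row, "email": f"{QA_LOCAL_PART}+user{row['id']}@{QA_DOMAIN}"}
        for row in rows
    ]


def assert_no_real_addresses(rows: list[dict]) -> None:
    """Fail loudly if any address outside the QA domain survived the scrub."""
    leftovers = [r["email"] for r in rows if not r["email"].endswith("@" + QA_DOMAIN)]
    if leftovers:
        raise AssertionError(f"{len(leftovers)} real address(es) left in test data")


users = [{"id": 1, "email": "jane@customer.org"}, {"id": 2, "email": "raj@customer.org"}]
safe = scrub_emails(users)
assert_no_real_addresses(safe)
print(safe[0]["email"])  # prints qa-inbox+user1@example.com
```

The check is kept separate from the scrub so that a copy can be verified before it is loaded into a test environment.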

  • Feasibility has nothing to do with legality; it's your problem if you can't test something without real data, not your customers'. In Europe, under the GDPR, it's common to anonymize data, not just email addresses; see this
    – Rsf
    Commented May 23, 2019 at 8:23
  • @Rsf - No argument from me - unfortunately I didn't build the system in question - which is not in any way GDPR-compliant and probably can't be made that way.
    – Kate Paulk
    Commented May 23, 2019 at 11:09

Principles related to the processing of personal data

Personal data shall be obtained only for one or more specified and lawful purposes, and shall not be further processed in any manner incompatible with that purpose or those purposes.

So, under applicable data protection law, production data can't be processed for purposes other than those for which it was originally obtained without explicit permission from the data subject, which is an unrealistic scenario.

I would not recommend using real customer data for testing. One argument was already mentioned: unlawful use of customer data.

Another problem that can arise is leaking confidential business information. Imagine this scenario: due to a bug or misconfiguration, your test system suddenly sends out real email. Real people now receive a newsletter containing critical details about a new product your company is developing.

Better to be safe: use dedicated test data with addresses, emails, and mobile phone numbers reserved for your testing only.
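Some values are reserved for exactly this purpose: `example.com` is a reserved documentation domain (RFC 2606), and in North America the range 555-0100 to 555-0199 is set aside for fictional use. A minimal sketch of generating contacts from those ranges; the class, function and area code are hypothetical choices, and other countries have their own reserved number ranges.

```python
from dataclasses import dataclass

# Reserved values: example.com (RFC 2606) and the North American
# 555-0100..555-0199 range set aside for fictional use.
TEST_DOMAIN = "example.com"
TEST_AREA_CODE = "202"  # arbitrary choice; the 555-01xx range is reserved across area codes


@dataclass(frozen=True)
class FakeContact:
    name: str
    email: str
    phone: str


def make_test_contacts(count: int) -> list[FakeContact]:
    """Generate up to 100 distinct contacts built only from reserved values."""
    if not 0 <= count <= 100:
        raise ValueError("the 555-01xx range only has 100 numbers")
    return [
        FakeContact(
            name=f"Test User {i:02d}",
            email=f"test.user{i:02d}@{TEST_DOMAIN}",
            phone=f"+1-{TEST_AREA_CODE}-555-01{i:02d}",
        )
        for i in range(count)
    ]


print(make_test_contacts(3)[2].phone)  # prints +1-202-555-0102
```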

  • You'll probably find a clause buried in the Terms and Conditions somewhere which gives the company your permission to use your personal data for testing purposes; if you're giving them your permission, then it's not unlawful.
    – dvniel
    Commented May 20, 2019 at 12:24
  • See @Nitin Rastogi's answer
    – dvniel
    Commented May 20, 2019 at 12:24

There are situations that are almost impossible to fully test without live data of some kind. However, we may not need to use customer data. I have tested with my live credit and debit cards when I needed to do an "End to End" test of a system. There are bugs that show up only in such a test. A small amount used in a test can be written off based on how much the customer is paying for quality software (but talk to accounting first).

  • This is where having and knowing test credit/debit cards and card numbers can be helpful. Also where you can connect to the provider's test service rather than use the real one for testing. There are still times where their test service will have bugs their live service doesn't, so it will depend on your situation.
    – Kate Paulk
    Commented May 22, 2019 at 19:36
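Card forms typically reject numbers that fail the Luhn (mod 10) checksum, so a made-up 16-digit string usually won't even get past validation, whereas commonly published test numbers such as 4111 1111 1111 1111 pass it. A minimal sketch of that check for sanity-checking a test fixture list; which test numbers a given provider's sandbox actually accepts is provider-specific and has to come from that provider's documentation.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the card number passes the Luhn (mod 10) checksum.

    Spaces and hyphens are ignored; any other non-digit makes it invalid.
    """
    cleaned = number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(cleaned)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # prints True
print(luhn_valid("4111 1111 1111 1112"))  # prints False
```

Passing the checksum says nothing about whether a number is live; it only keeps test fixtures from being rejected by basic validation.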

Others have mentioned the GDPR, but there are also industry specific rules and constraints.

If you’re dealing with payment data, such as credit cards and cardholder data, the PCI DSS has an explicit prohibition on using actual customer data for testing.

Also, the US government has developed draft guidelines on this exact topic: DPIAC Privacy Recommendations on the Use of Live Data in Research, Testing, or Training (PDF). They are not official yet, but you can consider them a good starting point.

As others have highlighted, it's definitely a very grey area!

Most companies I've worked for do use real customer data (with or without the customers' knowledge!), mainly because with complex systems certain issues sometimes can't be reproduced with test data. On numerous occasions I have found issues that we were unable to reproduce without using real data.

We should always at least try to obfuscate/anonymise the data as best we can.

Is it common to use real data of the customer to perform testing?

Yes, in smaller engineering teams this is very common: they just blindly copy production data into a testing environment. Larger corporations hopefully have pragmatic internal policies to guide this, certainly if the data is of a sensitive personal nature.

What policies do companies apply regarding using real data of the customer for testing purposes?

  • A thorough ISO 27001/27002 implementation would probably cover how data is made available to testers. This article covers the subject in depth and reaches two conclusions:
    1. Development, testing, and change management require clear written information security policies.
    2. The organization must enforce the policies in all projects and have evidence.

Is there any legislation regarding such issues?

  • General Data Protection Regulation: https://gdpr-info.eu/
  • The medical and financial industries also have many acts and regulations that might affect this. There is too much to sum up here, as each industry has its own standards.

Data could also be owned by someone other than the owner of the software. For example, you might need formal sign-off from a client to use their real data in a test environment.

I would suggest getting legal counsel from someone who understands the domain you are building and testing software in. Together, research the risks, the contracts, and the law.
