Should you create or copy your test data?

Nov 4, 2022

Software testing is a complex and varied field in which different testing types depend on different requirements, environments, audiences and so on. The need for test data, however, is a constant across every testing type.

Users have two options for obtaining test data: creating new data or using sanitised production data. Each method has its advantages and disadvantages at different test stages and in different scenarios.

Let’s dig a bit deeper to identify the best approach for obtaining test data.

What is meant by ‘creating test data’?

Creating test data is the process of generating a data set tailored for the specific test case or scenario.

The testers themselves make this data according to the needs of the test.

Its complexity and variety will change depending on the exact requirements.
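As a minimal sketch of what creating tailored data can look like, the snippet below generates synthetic customer records for one test case. The `Customer` fields, the `make_customers` helper and the "under-18 accounts" scenario are all hypothetical and only serve as an illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical record shape, chosen only for this example.
    customer_id: int
    name: str
    age: int
    country: str


def make_customers(count, min_age=18, max_age=90, seed=0):
    """Generate `count` synthetic customers with ages in [min_age, max_age]."""
    rng = random.Random(seed)  # a fixed seed keeps test runs reproducible
    countries = ["AU", "NZ", "GB"]
    return [
        Customer(
            customer_id=i,
            name=f"Test User {i}",
            age=rng.randint(min_age, max_age),  # both bounds are inclusive
            country=rng.choice(countries),
        )
        for i in range(1, count + 1)
    ]


# Data tailored to a hypothetical "under-18 accounts" test case.
minors = make_customers(5, min_age=13, max_age=17)
```

Because every value is invented, nothing here needs sanitising, and changing the requirement only means changing the arguments.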

Advantages of creating test data:

+ Can facilitate data sets that exactly match the testing requirements.
+ It's a relatively straightforward process to create test data regardless of the complexity of the requirements.
+ Can be easily modified and extended when needed.
+ There is little to no security or compliance risk, as this data does not contain any sensitive information (e.g. PII).
+ No need to implement a data sanitisation process before using this data for testing.
+ Test data created in-house reduces test prep time.

What is meant by ‘copying test data’?

This approach uses existing data as the core data set for tests.

Typically the data comes from a production environment, so using it can risk breaching privacy legislation if Personally Identifiable Information (PII) is not hidden or masked.

PII, such as personal details, addresses and dates of birth, must be anonymised. This process is referred to as sanitisation.
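A rough sketch of sanitising a single record might look like the following. The field names, the `sanitise` and `pseudonym` helpers and the salt value are assumptions made for illustration. Note also that a salted hash is a form of pseudonymisation, which may or may not be enough to meet the anonymisation requirements that apply to you.

```python
import hashlib


def pseudonym(value, salt="test-env-salt"):
    """Return a short, stable token derived from a salted SHA-256 hash."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def sanitise(record):
    """Return a copy of `record` with the assumed PII fields masked."""
    clean = dict(record)  # never modify the source record in place
    clean["name"] = "user_" + pseudonym(record["name"])
    clean["address"] = "REDACTED"
    # Keep only the birth year so age-related tests still behave sensibly.
    clean["date_of_birth"] = record["date_of_birth"][:4] + "-01-01"
    return clean


# Hypothetical production row; non-PII fields such as `plan` pass through.
production_row = {
    "id": 42,
    "name": "Jane Citizen",
    "address": "1 Example Street, Sydney",
    "date_of_birth": "1985-07-23",
    "plan": "premium",
}
test_row = sanitise(production_row)
```

The same input always maps to the same token, so relationships between records that share a name survive sanitisation while the name itself does not.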

Advantages of copying test data:

+ Real-world production data that can help to simulate exact production use cases.
+ Can help with identifying bugs that may only occur in production.
+ Reusable across multiple test cases.
+ Invaluable in solution optimisation and stress testing.

Creating data vs. copying data?

Now that we have a basic idea about each method for obtaining test data, let’s compare them to figure out the best approach.

This comes down to two primary factors: how the data will be used and the ease of obtaining data.

1. How will the test data be used?

Creating test data is the more straightforward approach and is a valuable option if you need to test a new feature or product. New data can be created that caters to the specific test case.

This gives more flexibility to the testers as they can produce data that is tailored specifically to the requirement.

It is unlikely that any production data will match the requirements for testing a new feature or product. Production data is better suited to test cases involving improvements or extensions to existing functionality.

Production data is the way to go when the tester wants the best chance of confirming behaviour in production, or when retesting a production bug.

On the other hand, copying existing production data may be restrictive as the data is not tailored specifically to the test cases.

In this situation, where the test data is not a good match, it must be modified before it can be used in these test cases.
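One simple way to adapt copied data, sketched here under the assumption that the rows are already sanitised and held as dictionaries, is to override only the fields the test case cares about. The `adapt` helper and the `status` field are hypothetical.

```python
def adapt(rows, **overrides):
    """Return copies of `rows` with the given fields overridden."""
    return [{**row, **overrides} for row in rows]


# Hypothetical, already sanitised rows copied from production.
copied_rows = [
    {"id": 1, "name": "user_1", "plan": "basic", "status": "active"},
    {"id": 2, "name": "user_2", "plan": "premium", "status": "active"},
]

# The test case needs suspended accounts, which the copy does not contain.
suspended = adapt(copied_rows, status="suspended")
```

The copied rows are left untouched, so the same baseline can be adapted differently for other test cases.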

2. How easy is it to obtain test data?

Again, creating test data from scratch is relatively easy and is the more straightforward method, especially if there are no constraints on the data needed.

Copying data is a more involved process.

It requires users to implement a proper method to copy data safely and efficiently from a production environment to a test environment. Furthermore, they need to sanitise this data so that no user-identifying or system-identifying information is exposed, even to internal users.

The complexity and length of time to copy the data can be mitigated by automating the copy and sanitisation process. For example, an organisation can maintain a replica of a production database in a test environment, with a job that automatically copies and sanitises the data on a predetermined schedule.

This allows users to have an up-to-date test dataset aligned with their production data.
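The refresh step of such a job could be sketched roughly as below. In-memory SQLite databases stand in for the real production and test databases so the example runs on its own. The `customers` table, its columns and the `refresh_test_db` function are assumptions, and the scheduling itself (for instance a cron job or CI pipeline calling this function) is left out.

```python
import sqlite3

SCHEMA = "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, plan TEXT)"


def refresh_test_db(prod, test):
    """Rebuild the test `customers` table from production, masking PII."""
    test.execute("DROP TABLE IF EXISTS customers")
    test.execute(SCHEMA)
    # Read only non-PII columns, so real names and emails never leave production.
    rows = prod.execute("SELECT id, plan FROM customers").fetchall()
    test.executemany(
        "INSERT INTO customers (id, name, email, plan) VALUES (?, ?, ?, ?)",
        [(cid, f"user_{cid}", f"user_{cid}@example.test", plan) for cid, plan in rows],
    )
    test.commit()
    return len(rows)


# Stand-in "production" database with two hypothetical customers.
prod = sqlite3.connect(":memory:")
prod.execute(SCHEMA)
prod.execute("INSERT INTO customers VALUES (1, 'Jane Citizen', 'jane@example.com', 'premium')")
prod.execute("INSERT INTO customers VALUES (2, 'John Smith', 'john@example.com', 'basic')")
prod.commit()

# Stand-in "test" database, refreshed from production.
test = sqlite3.connect(":memory:")
copied = refresh_test_db(prod, test)
```

Dropping and rebuilding the table on each run keeps the sketch simple. A real job would more likely copy incrementally and cover many tables.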

Automation significantly reduces the time to obtain (copy) data. As the data is readily available, testers can directly modify the test data to suit their specific needs or use it as is.

This process will significantly reduce the workload of the testers in the long term, even if it requires a higher initial investment to implement.

It can even reduce or eliminate the need to create test data in most use cases as the data becomes directly available in the test environment.

Conclusion

Both approaches bring tangible benefits to the testing process and are invaluable in different use cases.

So, the best strategy is not to select one over the other but to use a combination of both approaches, which can lead to significant benefits in the delivery pipeline.

You can create new test data for a new feature while using copied test data for production bug fixes, improvements to existing features, etc.

This enables users to combine the best aspects of both approaches, such as the flexibility of creating data and the ability to simulate real-world conditions using copied production data.

The choice of test data approach should be re-evaluated in every phase of a multi-phase product development cycle.

Reach out to the luvo Testing team via quality@luvo.com.au for any support!