As the data and analytics space matures and the role of data in business grows, many businesses now require a ‘safe space’ to connect business data with the wider world.
‘Data sandboxes’ are nothing new. However, they are back in vogue as data scientists look for ways to explore and innovate with data in a privacy-compliant way.
With data now an increasingly important (and competitive) area for businesses, data sandboxes offer an attractive solution: they increase the agility and effectiveness of data use while reducing the risk associated with handling that data.
The theory behind a data sandbox is relatively simple. Just as a real-world sandbox is designed to prevent sand from mixing with neighbouring dirt, a data sandbox is designed to contain data in an isolated environment. And like a real-world sandbox, a data sandbox allows data analysts to ‘play’ and experiment with their data.
Recent changes in the industry have created something of a perfect storm for data sandboxes.
Businesses now know that their company data has inherent value, and they also know that this value can increase substantially when the data is combined with external sources.
However, connecting internal data with the outside world is not so simple. With the regulatory environment around data sharing developing rapidly, businesses must adhere to privacy laws or risk significant fines.
For some, this means the risk of connecting internal data with third-party data might outweigh the potential reward.
A data sandbox, however, typically contains no personal information and includes safeguards against re-identification. This helps businesses to mitigate the risk and enables a level of experimentation and innovation in how they handle data.
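One way to picture such safeguards is a preparation step that runs before any record enters the sandbox. The following is a minimal Python sketch under stated assumptions, not a description of any particular product: the field names (`name`, `email`, `phone`, `customer_id`, `age`), the function `pseudonymise` and the secret key are all hypothetical. It drops direct identifiers, replaces the customer ID with a keyed hash (HMAC-SHA256) and coarsens an exact age into a ten-year band.

```python
import hashlib
import hmac

# Hypothetical field names, chosen for illustration only.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymise(record: dict, secret_key: bytes, id_field: str = "customer_id") -> dict:
    """Return a copy of `record` with direct identifiers removed or masked."""
    # Drop fields that identify a person directly.
    safe = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Replace the stable ID with a keyed hash so records can still be joined
    # inside the sandbox without exposing the original ID.
    digest = hmac.new(secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256)
    safe[id_field] = digest.hexdigest()[:16]

    # Coarsen a quasi-identifier: an exact integer age becomes a ten-year band.
    if "age" in safe:
        lower = (safe["age"] // 10) * 10
        safe["age"] = f"{lower}-{lower + 9}"
    return safe


if __name__ == "__main__":
    example = {"customer_id": 1042, "name": "Jane Citizen",
               "email": "jane@example.com", "age": 37, "spend": 120.5}
    # The key is a placeholder; a real key would be kept outside the sandbox.
    print(pseudonymise(example, secret_key=b"placeholder-key"))
```

Because the hash is keyed, the same customer maps to the same token across tables, so joins still work inside the sandbox while the key stays outside it. Steps like these reduce re-identification risk but do not remove it on their own, since combinations of the remaining fields can still single people out.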
For this reason, data sandboxes are often used to provide a controlled environment for ‘hackathons’, in which software engineers and data scientists access specific data sets and prototype analytics solutions.
To work out how a data sandbox could help your business, first assess the data that is available and outline your goals. Data sandboxes can be a great starting point for major data and analytics projects. They can provide a ‘proof-of-concept’ environment and minimise the risk of a major (and expensive) error later down the track.
As mentioned above, data sandboxes enable more agile data use while reducing the risk that comes with it. With this in mind, businesses should consider just how ‘risky’ handling this data might be, while also weighing up how much room there is to innovate and develop powerful insights with it.
In some cases, real customer data isn’t needed. The Australian Payments Council, for example, has facilitated hackathons that aim to improve the daily lives of Australians. With a data sandbox provided by the Open Bank Project, participants were given access to APIs that simulate data covering accounts, ATMs, transactions, transaction metadata and customer meetings. This meant the data was useful, relevant and secure, but was not real customer data.
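To make the idea of simulated data concrete, here is a minimal Python sketch of a generator that produces realistic-looking but entirely fictional transactions. It is an illustration only and does not reproduce the Open Bank Project APIs: the function `simulate_transactions`, the field names, the categories and the value ranges are all assumptions made for this example.

```python
import random
from datetime import date, timedelta


def simulate_transactions(n_accounts: int, n_transactions: int, seed: int = 0) -> list[dict]:
    """Generate fictional transactions; no real customer data is involved."""
    rng = random.Random(seed)  # seeded so every participant sees the same data
    account_ids = [f"ACC-{i:04d}" for i in range(1, n_accounts + 1)]
    categories = ["groceries", "transport", "utilities", "dining"]  # assumed
    start = date(2024, 1, 1)  # arbitrary start of the simulated year

    transactions = []
    for i in range(n_transactions):
        transactions.append({
            "transaction_id": f"TX-{i:06d}",
            "account_id": rng.choice(account_ids),
            "date": (start + timedelta(days=rng.randrange(365))).isoformat(),
            "amount": round(rng.uniform(1.0, 500.0), 2),
            "category": rng.choice(categories),
        })
    return transactions


if __name__ == "__main__":
    for tx in simulate_transactions(n_accounts=3, n_transactions=5):
        print(tx)
```

Seeding the generator keeps the data set reproducible, which is useful when several hackathon teams need to compare results on the same inputs.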
Once the data sandbox has been established, the goal becomes maximising results. As with anything data-related, clean and organised data is essential if the data is to be used and visualised effectively, even in the safe sandbox environment. This might require the use of a third-party vendor.
Although data is all about specific values and numbers, trial and error does play a role, particularly when it comes to building things like machine learning models.
At smrtr, we know that one of the best ways to innovate with data is to connect internal business data with third-party data. That’s why we offer our partners the opportunity to combine their data with our expansive data universe.
By Boris Guennewig, Co Founder & CTO at smrtr