Embrace the cloud, but manage the costs

The online sports betting business relied on on-prem data management, which was beginning to creak.

For over two decades, bet365 has developed much of its tech stack in-house and hosted it on-premises. It’s an approach that has given us the control and agility needed to achieve and maintain market leadership. That changed in 2023, however, when we ran our first experiments in the public cloud.

After several successful small- and large-scale cloud-based projects, my data engineering team at Hillside Technology (bet365’s technology innovation business) set in motion a major initiative to transform data management and provision by consolidating it into a single data lake and warehouse. Eighteen months later, the programme is in full swing and has already made great strides.

The data lake and warehouse project reimagines our approach to data, both in the technology we use and in the way data is stored, accessed and consumed. The aim is to create a single source of truth for all of bet365’s untransformed data. That data will then be transformed within the warehouse for reporting and analytics, and ultimately made available to the organisation in real time.

The challenge

As an online sports betting business serving millions of customers in real time, bet365 has always had data at the heart of its operations. As we’ve become more data-savvy, the need for timely data has increased across the organisation.

The on-prem data management capability served us well for many years. However, as the load increased, it began to creak and slow, making it clear that a replacement was necessary to maintain business as usual and to improve the speed and quality of the information we provide to our teams and customers.

This led to a comprehensive rethink of our approach from both a technical and a process point of view. Prior to the project, our on-prem warehouse pulled data from different platforms around the organisation. As our datasets expanded and the needs of the organisation grew in complexity, we struggled to scale the technology to meet the volume of data being generated.

In addition, there was a lot of duplicated data in the system, which had a negative impact on both its usefulness and the cost of storing and managing it. We were also very aware of one of the essential rules of data management: if you have data in two places, it will be wrong in at least one of those places.

Why the cloud?

We’d conducted several successful small-scale experiments in the cloud to test its reliability and security, but the data lake and warehouse project provided a good opportunity to assess the large-scale potential of cloud-native technologies.

However, while going with the cloud was an option, it wasn’t a certainty. We looked at both on-prem and cloud-based systems. From a list of around 10, we whittled the options down to two for an initial proof-of-concept. As it turned out, both were cloud-based systems.

Ultimately, we chose BigQuery because of the amount of control the system gave us over functionality and cost, the quality of the support we got from Google, and the close fit it had with other technologies we’d selected. We anticipated that the separate management of storage and compute would be a big benefit to us.

At that point, the commitment to the cloud was a no-brainer. We wanted to leverage all the benefits of the cloud’s modern architecture and the applications it makes available to us. That requires everything being hosted in the cloud, not a hybrid setup that draws from both modern and legacy architectures.

We’ve achieved a lot in a short amount of time. We have elements of the data lake set up, and the lift-and-shift of the on-prem warehouse that stores all of bet365’s current data took eight months. Now that’s done, the next step is for SQL and JSON data to be pushed into the lake in real time using Kafka queues.
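
As a rough sketch of what such a feed could look like, the snippet below pushes a JSON-encoded change record onto a Kafka topic bound for the lake. The broker addresses, topic name and record shape are illustrative assumptions, not our actual configuration.

```python
# Minimal sketch of a real-time lake feed: a producer pushing change
# records as JSON onto a Kafka topic. Names are hypothetical.
import json
from datetime import datetime, timezone

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers=["broker-1:9092", "broker-2:9092"],  # hypothetical brokers
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),
    acks="all",  # wait for full replication before confirming a write
)

# A hypothetical change event captured from an upstream SQL system.
event = {
    "source": "sportsbook_db",
    "table": "bets",
    "op": "insert",
    "captured_at": datetime.now(timezone.utc).isoformat(),
    "payload": {"bet_id": 12345, "stake": 10.0, "currency": "GBP"},
}

producer.send("datalake.raw.bets", value=event)  # hypothetical topic
producer.flush()  # block until the record is acknowledged
```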

We’ve started to remove the technical debt and are continuously improving the warehouse to make the most of what BigQuery has to offer. The brain of the data warehouse is the transformation step, the ‘T’ in ETL, which we’ve implemented with Ab Initio. We’ve already shaved hours off the time it takes to create reports.
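
The transform itself runs in Ab Initio, so the following is only a hedged illustration of the kind of work a ‘T’ step performs, expressed here as a BigQuery MERGE run from Python: deduplicating raw lake records and upserting them into a reporting table. All project, dataset and table names are hypothetical.

```python
# Illustrative 'T' step: dedupe raw lake rows, upsert into a reporting table.
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client(project="example-project")  # hypothetical project

merge_sql = """
MERGE `example-project.warehouse.bets` AS target
USING (
  -- keep only the latest version of each bet seen in the raw lake feed
  SELECT * EXCEPT(row_num) FROM (
    SELECT *, ROW_NUMBER() OVER (
      PARTITION BY bet_id ORDER BY captured_at DESC
    ) AS row_num
    FROM `example-project.lake.raw_bets`
  )
  WHERE row_num = 1
) AS source
ON target.bet_id = source.bet_id
WHEN MATCHED THEN
  UPDATE SET stake = source.stake, captured_at = source.captured_at
WHEN NOT MATCHED THEN
  INSERT (bet_id, stake, captured_at)
  VALUES (source.bet_id, source.stake, source.captured_at)
"""

client.query(merge_sql).result()  # wait for the transform to complete
```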

Currently, reports are still generated once a day (though much earlier in the day than with the on-premises warehouse), but we expect to see real-time report generation within a few months.

Key considerations

For a company like ours, operating in a highly regulated market, it was critical to ensure that everything is secure and meets compliance requirements. Infosec is involved in every cloud migration and has final approval on everything we do.

We’re conscious that several companies have moved back from the cloud to on-premises, mostly because of a lack of control over cost. So this was a key concern when deciding on a solution.
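
BigQuery does expose concrete levers here. As one hedged example, with hypothetical names and an illustrative threshold, a per-query byte cap makes a runaway query fail fast rather than run up a bill:

```python
# Cost guardrail sketch: cap the bytes a single query may scan.
from google.cloud import bigquery

client = bigquery.Client(project="example-project")  # hypothetical project

job_config = bigquery.QueryJobConfig(
    maximum_bytes_billed=100 * 1024**3,  # reject queries scanning over 100 GiB
)

query = "SELECT market, SUM(stake) AS staked FROM `warehouse.bets` GROUP BY market"
rows = client.query(query, job_config=job_config).result()
for row in rows:
    print(row.market, row.staked)
```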

We also made sure to have a business continuity system in place in another region, so a complete outage of our primary cloud region will not affect us. In addition, updates are rolled out separately in each region to mitigate potential software issues.
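
A minimal sketch of the continuity idea, under assumed dataset names and regions: the same query is tried against the primary region’s dataset first, and falls back to the replica in the second region if the primary is unavailable.

```python
# Failover sketch: query the primary dataset, fall back to the replica.
from google.api_core.exceptions import GoogleAPICallError
from google.cloud import bigquery

client = bigquery.Client(project="example-project")  # hypothetical project

DATASETS = ["warehouse_eu", "warehouse_us"]  # hypothetical primary, then replica

def run_with_failover(sql_template: str):
    last_error = None
    for dataset in DATASETS:
        try:
            return client.query(sql_template.format(dataset=dataset)).result()
        except GoogleAPICallError as err:  # regional failure: try the replica
            last_error = err
    raise last_error

rows = run_with_failover(
    "SELECT COUNT(*) AS n FROM `example-project.{dataset}.bets`"
)
```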

Next steps

As we learn more about BigQuery’s functionality, we will continue to optimise the system for our use. The next big project for us is to improve the cataloguing and lineage system for the data, so that it’s easy for users to navigate and for us to investigate any issues with the data.
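
Until that fuller catalogue exists, one simple interim pattern, sketched below with hypothetical names, is to record provenance on BigQuery tables as key/value labels; a production lineage system would be considerably richer than this.

```python
# Interim provenance sketch: attach source and producer labels to a table.
from google.cloud import bigquery

client = bigquery.Client(project="example-project")  # hypothetical project

table = client.get_table("example-project.warehouse.bets")  # hypothetical table
table.labels = {
    "source_system": "sportsbook_db",  # where the raw data originated
    "produced_by": "bets_merge_job",   # the transform that built the table
}
client.update_table(table, ["labels"])  # persist only the labels field
```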

The next major phase is the migration of a second data warehouse, which holds all of bet365’s legacy data. We are currently migrating it to BigQuery and combining it with the existing cloud warehouse: a massive task. The on-premises warehouse is over two decades old and encompasses 20 years of code modernisation, a complex business logic layer, and myriad connections into diverse systems.

To aid in our work, we are leveraging a new GenAI tool developed by the in-house Platform Innovation team. The tool allows us to visualise the design of the legacy SQL database, including all its procedures and interdependencies.
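
The in-house tool is proprietary, but the core idea of mapping interdependencies can be sketched in miniature: scan each procedure’s body for the tables it reads and build a dependency graph. The procedures below are toy examples; real legacy SQL would need a proper parser rather than a regex.

```python
# Toy dependency mapper: extract FROM/JOIN targets from procedure bodies.
import re
from collections import defaultdict

# Hypothetical procedure bodies standing in for real legacy SQL.
procedures = {
    "usp_settle_bets": "SELECT * FROM bets JOIN results ON bets.id = results.bet_id",
    "usp_daily_report": "SELECT * FROM settled_bets JOIN customers ON 1 = 1",
}

# Crude extraction; misses CTEs, subqueries and dynamic SQL.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)

graph = defaultdict(set)
for name, body in procedures.items():
    graph[name].update(TABLE_REF.findall(body))

for proc, tables in graph.items():
    print(f"{proc} reads: {', '.join(sorted(tables))}")
```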

We are also using GenAI to accelerate the data architecture and design of the new combined data warehouse itself. We feed it the legacy SQL procedures and the existing data warehouse; it does the legwork of producing an initial design that our data architects then review. This enables a quick feedback loop in which AI and experts work in concert.

This process ensures a seamless migration of data from the on-premises system to create a unified, pristine and scalable platform for all bet365’s data.

We’re also running an organisation-wide roadshow to introduce the new system to the business and to work with each department to understand their data requirements. Ultimately, we want to make the system as self-service as possible. What would that look like, and how would it work? These are the critical questions we must now answer.