Demand for new data centres is growing rapidly worldwide, driven primarily by the explosion of data from artificial intelligence (AI), the widespread shift to cloud computing and the increasing push for digitalisation across all industries. At the same time, data centres face multiple threats, ranging from cyberattacks to physical risks such as natural disasters, power and cooling failures, and physical security breaches.
In response, companies (in particular tech and data firms) are addressing the risk of service interruption to their IT operations by building redundancy into their network and deploying comprehensive physical security protocols to enhance resilience.
Interconnection of redundant data centres duplicates critical components and ensures systems and data are continually operating and available in the event of a failure.
Cluster events
A data company may have assessed risks at each of its individual data centres, but not at the cluster level to account for interdependency – or ‘shared fate’ – and understand the impact across all locations. For example, widespread events like earthquakes or hurricanes, interruption of power supply or political events in a particular city can affect many locations at the same time. In the UK, data centres are primarily clustered near fibre optic and power infrastructure, with around 80% of them concentrated around the M25 surrounding Greater London.
More frequent drought or heat stress conditions may also increase the operational expenditure needed to cool equipment during these periods. If such conditions are followed by severe flash floods caused by storms, the result could be physical damage or business interruption at individual assets, or even at several assets within a cluster.
Prepare for the worst
To protect data centres from property damage and business interruption, five specific actions based on best practice can help to ensure a multi-layered and analytical approach to natural catastrophe and climate risks.
1. Risk and resilience assessments
A holistic data centre risk and resilience assessment should analyse not just the risks and impacts of natural catastrophes and utility outages, but also other threats like proximity to high-risk locations or geopolitical risk.
It is then possible to rank vulnerabilities for individual data centres or for whole clusters, informing estimations of potential downtime. Any vulnerability assessment will also need to be underpinned by a data centre building’s characteristics, existing redundancies, mitigation measures and utilities dependencies.
With multiple data centre locations, assessing vulnerability scores and downtime estimates can help prioritise risk management efforts. By focusing on the locations most vulnerable to downtime, adaptation and risk mitigation resources can be allocated more efficiently. And by having a combined risk score for multiple perils and multiple data centres within a cluster, cumulative impacts can be better understood and the associated risks managed more effectively.
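As a rough illustration of how a combined score might be built, the Python sketch below weights per-peril vulnerability scores into one figure per site, ranks the sites and averages the result per cluster. The peril weights, site names and scores are hypothetical and chosen only to show the arithmetic; a real assessment would draw on hazard models and site surveys.

```python
from dataclasses import dataclass

# Hypothetical peril weights, for illustration only (not taken from any standard).
PERIL_WEIGHTS = {"flood": 0.4, "heat": 0.3, "wind": 0.2, "earthquake": 0.1}


@dataclass
class Site:
    name: str
    cluster: str
    peril_scores: dict  # peril name -> vulnerability score between 0 and 1


def site_score(site, weights=PERIL_WEIGHTS):
    """Weighted average of a site's per-peril vulnerability scores."""
    total_weight = sum(weights.values())
    weighted = sum(w * site.peril_scores.get(p, 0.0) for p, w in weights.items())
    return weighted / total_weight


def rank_sites(sites):
    """Sites ordered from most to least vulnerable."""
    return sorted(sites, key=site_score, reverse=True)


def cluster_scores(sites):
    """Mean combined score of the sites in each cluster."""
    grouped = {}
    for site in sites:
        grouped.setdefault(site.cluster, []).append(site_score(site))
    return {cluster: sum(v) / len(v) for cluster, v in grouped.items()}


# Made-up example portfolio.
sites = [
    Site("A", "london", {"flood": 0.8, "heat": 0.5, "wind": 0.2, "earthquake": 0.0}),
    Site("B", "london", {"flood": 0.4, "heat": 0.5, "wind": 0.2, "earthquake": 0.0}),
    Site("C", "north", {"flood": 0.1, "heat": 0.2, "wind": 0.6, "earthquake": 0.0}),
]

for site in rank_sites(sites):
    print(site.name, round(site_score(site), 2))
print(cluster_scores(sites))
```

A simple mean per cluster is only one possible aggregation; a firm more concerned with worst-case exposure might take the maximum site score in each cluster instead.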
2. Withstanding major downtime risks
Conducting a comprehensive risk and resilience assessment enables more effective enhancement of a data centre’s physical infrastructure to better withstand significant natural disasters and climate-related risks.
Depending on the location, threats and likelihood of impact, this may mean using earthquake-resistant building materials, protecting IT equipment with shock-absorbing mounts and seismic-rated racks, installing flood barriers and water detection devices, or building a data centre to withstand high winds. Assessing these risks as early as possible means you can consider climate-resilience during the design stage and before construction. Installing fire suppression systems and maintaining defensible space with fire-resistant landscaping around the facility can also help defend sites.
Structural measures can be supplemented with advanced monitoring systems to detect and respond to threats as they happen. These systems can provide early warnings of potential issues, enabling additional steps to minimise downtime. For example, if a severe storm is approaching, real-time monitoring and alerts could give more time to shut down non-critical systems and ensure backup systems are ready to go.
3. Shared fate risk management
There are many ways data centres can achieve redundancy that ensures if one part of a network fails, another can take over without causing significant downtime. For example, a company’s power sources can be diversified by using a combination of grid power, on-site generators and renewable energy sources like solar or wind, reducing the risk of a single point of failure.
To achieve redundancy and reliability, an organisation can choose to build multiple data centres within a 100km radius near major internet exchange points. If one data centre experiences an outage, the others can take over, ensuring business continuity. While clustering might enhance redundancy, it can also increase the shared fate risk of a single catastrophic event impacting multiple data centres at once.
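One simple way to put a number on shared fate is to count how many sites fall inside the footprint of a single event. The Python sketch below assumes a circular footprint and uses the haversine great-circle distance; the site names, coordinates and 50km event radius are hypothetical, and real hazard footprints are rarely circular.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sites_in_footprint(sites, event_lat, event_lon, radius_km):
    """Names of the sites lying within radius_km of the event centre."""
    return [
        name
        for name, (lat, lon) in sites.items()
        if haversine_km(lat, lon, event_lat, event_lon) <= radius_km
    ]


# Hypothetical sites (latitude, longitude); two sit close together, one far away.
sites = {
    "site_west": (51.51, -0.59),
    "site_east": (51.51, -0.01),
    "site_north": (53.48, -2.24),
}

# Hypothetical event centred between the two nearby sites, with a 50km footprint.
affected = sites_in_footprint(sites, 51.51, -0.13, radius_km=50)
shared_fate_share = len(affected) / len(sites)
print(affected, round(shared_fate_share, 2))
```

Running the same check for several event locations and radii gives a first indication of which clusters would lose most of their redundancy in a single event.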
Redundant cooling systems can help maintain optimal temperatures in a data centre, which is crucial for performance and longevity. Advanced analytics can help not just with site selection, but also with understanding the intensity of heat, drought and other environmental hazards, now and in the future. These hazards can then be linked to a company’s operational thresholds, allowing informed and efficient decision-making on cooling capacity, again helping to reduce potential downtime.
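Linking a hazard to an operational threshold can be as simple as counting how often the threshold is exceeded under current and projected conditions. The Python sketch below does this for daily maximum temperatures; the 32°C threshold and both temperature series are made up for illustration and do not come from any dataset.

```python
def days_above(temps_c, threshold_c):
    """Number of days on which the temperature exceeds the threshold."""
    return sum(1 for t in temps_c if t > threshold_c)


def extra_cooling_days(current, projected, threshold_c):
    """Additional threshold-exceeding days under the projected series."""
    return days_above(projected, threshold_c) - days_above(current, threshold_c)


# Made-up daily maximum temperatures in degrees Celsius.
current_temps = [28, 31, 33, 36, 30]
projected_temps = [30, 33, 35, 38, 32]

# Hypothetical operating threshold above which extra cooling capacity is needed.
THRESHOLD_C = 32

print(days_above(current_temps, THRESHOLD_C))
print(days_above(projected_temps, THRESHOLD_C))
print(extra_cooling_days(current_temps, projected_temps, THRESHOLD_C))
```

In practice the series would span years of observed and modelled data, and the threshold would come from the cooling system's design specification.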
4. Business continuity and disaster recovery planning
Implementing robust business continuity and disaster recovery plans is crucial for minimising downtime and ensuring data centres can recover from natural disasters quickly. Plans should cover all aspects of a data centre’s operations, from the physical infrastructure to the data and applications. In particular, business continuity planning should outline the steps a tech firm needs to take to keep its data centre operational during a disaster. One example is establishing a secondary data centre in a different location that isn’t climate-exposed and can take over if the primary data centre is affected.
Disaster recovery planning should include establishing a detailed inventory of all equipment and systems, along with a plan for replacing or repairing any damaged components. It should also feature procedures for restoring data and applications, ensuring a company’s clients can access their information as quickly as possible.
5. Improve inter-location communication
Enhanced inter-location communication can better protect an organisation’s operations from natural catastrophe and climate events, in particular those impacting multiple locations within a cluster.
This means a company establishing a clear line of communication between all of its data centres and a system for sharing information in real time. For example, it could use a centralised communication platform that all of its data centres can access, or set up a dedicated emergency response team to coordinate communication across multiple locations.
Regular drills and simulations can test communication and coordination plans between data centres, ensuring they are effective and that all teams are well prepared for any eventuality.
To read the latest IPE Real Assets magazine click here.