To cope with aging power infrastructure, data centers are turning to technologies such as microgrids to supply electricity more reliably. A microgrid reduces the impact of power outages and can restore service faster when outages do happen.
What is power resiliency?
Resiliency is the ability of a server, network, storage system, or entire data center to recover quickly and continue operating after an equipment failure, power outage, or other disruption. Data center resiliency is part of the facility’s architecture and is usually associated with disaster recovery planning and other data center considerations such as data protection. The adjective resilient describes something that is able to spring back.
The importance of power resiliency in data centers
Data centers depend on power, and downtime caused by malfunctioning hardware or end-of-life replacement can cost thousands or even millions of dollars per hour, so backup power is crucial in data center design. Backup power infrastructure includes uninterruptible power supply (UPS) systems, standby generators, switchboards, and power lines. Data center resiliency is achieved through redundant components, systems, and facilities. When one element fails or a disruption occurs, the redundant element takes over so that service to the user base continues. Business continuity, incident response, and emergency response are all factors in an organization’s overall resilience. The goal of resiliency is to minimize downtime. Ideally, users of a resilient system never know that a disruption has occurred.
Data center resiliency examples
Below are some ways that power resiliency is incorporated into data centers.
Server redundancy
If a server’s power supply fails, the server fails too, and all the workloads on that server may be unavailable until it is repaired or the workloads are restarted on another suitable server. To guard against this, servers often incorporate a redundant power supply. The backup supply takes over automatically when the primary supply fails and keeps the server running until the failed supply is replaced. Techniques such as server clustering go further by supporting redundant workloads on more than one server. When one server in the cluster fails, another node takes over its workloads.
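The cluster failover described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the `fail_over` function, the node names, and the "fewest workloads first" placement rule are all hypothetical and do not come from any particular clustering product.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One server in a hypothetical cluster."""
    name: str
    healthy: bool = True
    workloads: list = field(default_factory=list)


def fail_over(nodes, failed_name):
    """Mark a node as failed and move its workloads to healthy nodes.

    Each workload goes to whichever healthy node currently carries the
    fewest workloads (an assumed placement rule, chosen for simplicity).
    Returns a dict mapping each moved workload to its new node name.
    """
    failed = next(n for n in nodes if n.name == failed_name)
    failed.healthy = False
    survivors = [n for n in nodes if n.healthy]
    if not survivors:
        raise RuntimeError("no healthy node left to take the workloads")

    moved = {}
    for workload in failed.workloads:
        target = min(survivors, key=lambda n: len(n.workloads))
        target.workloads.append(workload)
        moved[workload] = target.name
    failed.workloads = []
    return moved


# Illustrative three-node cluster; names and workloads are made up.
cluster = [
    Node("node-a", workloads=["web", "db"]),
    Node("node-b", workloads=["cache"]),
    Node("node-c"),
]
moved = fail_over(cluster, "node-a")
print(moved)
```

A real cluster manager would also detect the failure itself, typically through heartbeats, and would check that the target node has enough capacity. The sketch leaves both out to keep the redistribution step visible.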
Data center redundancy
Data center redundancy applies the same idea at the facility level. For example, an organization might power its data center with two utility feeds from different utility providers, so that the backup provider is available if the first one fails.
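To get a rough feel for what a second feed buys, the sketch below estimates the combined availability of redundant power sources. It rests on two assumptions that are not from the article: the 99.9% availability figure is purely illustrative, and the feeds are treated as failing independently, which is optimistic if both providers share upstream equipment or are exposed to the same regional event. The function names are hypothetical.

```python
HOURS_PER_YEAR = 8760


def combined_availability(*availabilities):
    """Availability of redundant sources when any single one is enough.

    Assumes failures are independent: the combination is down only
    when every source is down at the same time.
    """
    p_all_down = 1.0
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        p_all_down *= (1.0 - a)
    return 1.0 - p_all_down


def expected_downtime_hours(availability):
    """Expected hours of outage per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


single_feed = 0.999  # assumed figure, for illustration only
dual_feed = combined_availability(single_feed, single_feed)

print(f"one feed:  {expected_downtime_hours(single_feed):.2f} h/year")
print(f"two feeds: {expected_downtime_hours(dual_feed):.4f} h/year")
```

Under these assumptions the second feed cuts expected downtime by roughly a factor of a thousand. Any correlation between the two feeds erodes that gain, which is one reason facilities also keep UPS systems and standby generators.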
Colocation
An organization may maintain hot sites that can be used for data center colocation. With this approach, data center managers can move an entire operation from one facility to another in response to a local disruption or regional disaster.
Critical services
The redundancy techniques used in a data center vary with the workloads it runs, and redundancy is a main factor in the resiliency plan. Organizations with mission-critical or high-availability workloads use more resilient techniques in their data centers because the cost of losing critical computing services is so high.
For example, critical business services such as online transaction processing and database systems may be designed with data center resiliency in mind, including clustering, snapshots, and off-site redundancy. Non-essential workloads can tolerate some disruption and may receive little resiliency.
Resiliency vs. redundancy
The easiest way to differentiate between resiliency and redundancy is this: resilience requires redundancy, but redundancy alone does not guarantee resilience. Redundancy is relatively easy to achieve by adding backups for primary data center components, but that does not mean the data center is resilient. Data center managers can determine whether the data center is resilient in one of the following ways:
- They can shut off the power to the data center and see what happens. CIOs and their managers rarely attempt this experiment, even when redundant resources are in place, because the risk is very high, especially during daily production.
- They can stage a shutdown on a weekend or holiday, when operations are slower or less critical. The result shows the organization’s ability to bounce back from a disruption of its IT infrastructure. It also identifies areas where additional resources are needed.
Data center industry vendors, consultants, and researchers provide assessment services to help businesses better understand how resilient their data centers are.
How they work
To develop a resiliency plan, data center operations teams evaluate the existing IT infrastructure and decide which elements are mission-critical. Then they determine the level of resilience each element needs, considering both business and technical factors. Resiliency is costly: the more resilience required, the more investment is needed.
The diagram above introduces the concept of N+ redundancy. A data center with no redundancy is an N facility. Redundant components are added until a backup is available for any single component that fails; at that point the data center has N+1 redundancy. Some organizations add further layers of redundancy, such as a second corporate data center, a colocated data center, or a cloud-based replicated data center. These approaches bring organizations closer to real resiliency, or N+X resiliency.
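The N+X idea can be made concrete with a small calculation. The sketch below takes N to be the minimum number of identical units needed to carry the load, which is a common convention but an assumption here, and reports how many spare units are installed beyond that. The function names, the UPS module example, and the kW figures are all hypothetical.

```python
import math


def required_units(load_kw, unit_capacity_kw):
    """N: the minimum number of identical units needed to carry the load."""
    if load_kw < 0 or unit_capacity_kw <= 0:
        raise ValueError("load must be >= 0 and unit capacity must be > 0")
    return math.ceil(load_kw / unit_capacity_kw)


def redundancy_level(installed_units, load_kw, unit_capacity_kw):
    """X in 'N+X': how many units can fail while the load is still carried.

    A negative result means the installation cannot carry the load even
    with every unit working.
    """
    return installed_units - required_units(load_kw, unit_capacity_kw)


def describe(installed_units, load_kw, unit_capacity_kw):
    """Label an installation as 'N', 'N+1', 'N+2', ... or 'under capacity'."""
    x = redundancy_level(installed_units, load_kw, unit_capacity_kw)
    if x < 0:
        return "under capacity"
    if x == 0:
        return "N"
    return f"N+{x}"


# Illustrative case: a 900 kW load served by 250 kW UPS modules.
for installed in (3, 4, 5, 6):
    print(installed, "modules:", describe(installed, 900, 250))
```

The calculation covers only one class of component. A facility is resilient only to the degree of its weakest link, so the same check would have to hold for generators, cooling, and network paths as well.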
The goal of resiliency is to reduce downtime. Ideally, users of a resilient system never know that a disruption has occurred. Business continuity, incident response, and emergency management all contribute to operational resilience.