The disrupted datacenter: Datacenter as a machine
As datacenters become more complex, with tighter software-controlled integration between components, they will increasingly be viewed as complex machines rather than real estate. A growing proportion of next-generation datacenters will be designed to have no full-time on-site staff, and instead will be monitored and maintained externally. Advanced management software, and eventually dexterous robots, will take on most on-site management tasks. Early examples of 'datacenter as a machine' – as we are describing this new unit of IT capacity – are already emerging, and more advanced versions will follow. DC-as-a-machine sites will be data-driven, run hotter, replace inefficient perimeter cooling with close-coupled cooling, and even use lowered oxygen levels to reduce fire risks. Initial deployments are likely to be in space-constrained urban datacenters or difficult-to-access industrial locations, but the design and operational practices will eventually be more widely adopted. DC as a machine is one of more than a dozen technologies that we are evaluating as part of our upcoming Disruptive Technologies in the Datacenter report, a follow-up to our widely read and referenced 2013 report.
The 451 Take
DC as a machine is closely aligned with a number of the other technologies examined in our upcoming disruptive report. It will make use of remote monitoring and control enabled by cloud-based datacenter management software. Distributed resiliency will reduce the need for generators and UPSs that require regular manual maintenance and testing. Silicon photonics will enable redistribution of storage, memory and compute analogous to the datacenter being one giant server. Freed from the need to keep on-site staff comfortable, operators will be able to push the limits of chiller-free cooling, including direct liquid cooling, enabling very high densities and reducing maintenance space. However, DC as a machine won't be suitable for all locations and workloads. Risk-averse operators and higher capital costs (at least in the short term) are likely to limit take-up to specific use cases.
Technology and context
The idea that the need for on-site datacenter staff could be reduced or, if possible, eliminated altogether is well established. The development of 'lights out' – or 'dark' – facilities has been discussed for more than three decades. Definitions differ, but sites are typically considered to be dark if they have minimal dedicated support staff (usually IT, but also facilities) and have some form of remote network and operations management in place (e.g., via a remote network operations center). For example, some of the sites built and operated by multi-tenant datacenter companies, such as EdgeConneX or Etix Everywhere, are built out in second-tier locations where qualified staff may not be available, and streamlined or lights-out operation can help to reduce operating costs.
Features of lights-out datacenters include:
- Highly standardized (but not necessarily prefabricated modular) designs to reduce deployment times and build costs.
- Datacenter management software (including datacenter infrastructure management) built in as standard to enable remote monitoring and management.
- Innovative and automated physical security equipment and software to reduce the need for on-site security staff.
New classes of datacenter will emerge over the next 5-10 years that take the lights-out approach to the next level. These sites could be described as 'unmanned by design,' but we prefer the term 'DC as a machine' to reflect the idea that these datacenters will increasingly be viewed as discrete (albeit networked) units of IT capacity (and to differentiate from the current lights-out datacenters). This view aligns more closely with the traditional description by IT practitioners, who usually define datacenters in terms of servers, networking and storage rather than the buildings in which they are housed.
The specific design – and operation – of DC as a machine will vary depending on some of the same factors that govern traditional datacenter builds. But it is possible to outline a number of key characteristics:
- Integrated and agile. One way to think of how DC as a machine will differ from a traditional site is to compare a conventional aircraft to a drone. Replacing the pilot with a remote operator enables the agility, risk profile (danger to pilot), compactness and efficiency to be drastically altered. (There is some loss of real-time, in situ reactivity, but over time AI will help to compensate.)
- Space-efficient. Form factors would vary, but would approximate, for example, the interior of a nuclear submarine, where every inch of space and physical component must be justified in terms of function and purpose. Entry for human operators would be limited to specific access hatches, and in some cases may be little more than (robot-friendly) crawl spaces. Micro-modular datacenters (MMDCs) – small, discrete units of self-contained whitespace – could also be considered as examples of DC as a machine, but we believe the highly integrated design approach used in MMDCs could be scaled up to much larger sites.
- Urban and industrial. The compact footprint achievable by eliminating access aisles, using significantly taller equipment racks (60U-plus) and adopting highly efficient close-coupled cooling would make DC-as-a-machine sites more likely to be deployed initially in space-constrained urban areas. However, they may also be deployed in remote or difficult/dangerous-to-access locations, such as industrial sites (e.g., oil rigs). Multi-story sites would also be common.
Specific features and technologies could include some of the following (some of which will also feature in the upcoming disruptive report):
- Advanced datacenter management software, including datacenter management as a service (DMaaS), for features such as predictive maintenance. The majority of IT and facilities-side management tasks would be automated and self-regulating, based on the use of AI/machine learning.
- Distributed resiliency would reduce and isolate the impact of specific IT or facilities-equipment failures by spreading workloads across multiple sites using networks, data replication and traffic switching. Use of generators, UPSs, etc. (which require regular testing and maintenance) would be reduced.
- Higher operating temperatures – into allowable ranges defined by the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE), and probably higher as machines become more resilient to heat.
- Hypoxic fire suppression (reduced oxygen levels to decrease the likelihood of combustion).
- Prefabricated modular (PFM) components to speed deployment times and improve equipment integration (reducing maintenance requirements).
- Dexterous robots. IBM and others have already experimented with the use of robots for datacenter environmental monitoring. Some data storage sites (tape libraries) also make use of robot arms. Datacenter as a machine would eventually rely on dexterous robots for basic adds, moves and changes, and some break/fix tasks. (Robots are expected to become much smaller, lighter and nimbler in the next decade.) Virtual reality could also enable remote operators to guide robots or inspect inaccessible parts of the site.
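The distributed resiliency item above describes spreading workloads across multiple sites and switching traffic when equipment fails. A minimal sketch of that idea follows, assuming a simple model in which each site has a fixed capacity and a current load. The `Site` class, the `redistribute` function and the capacity units are hypothetical and purely illustrative; they do not describe any particular product or operator's method.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    healthy: bool
    capacity: int  # hypothetical units of workload the site can host
    load: int = 0  # units of workload currently placed on the site


def redistribute(sites, failed_name):
    """Mark one site as failed and move its load to the remaining healthy sites.

    Returns the amount of load that could not be placed anywhere.
    """
    failed = next(s for s in sites if s.name == failed_name)
    failed.healthy = False
    remaining = failed.load
    failed.load = 0

    # Fill the healthy sites with the most spare capacity first.
    survivors = sorted(
        (s for s in sites if s.healthy),
        key=lambda s: s.capacity - s.load,
        reverse=True,
    )
    for site in survivors:
        if remaining == 0:
            break
        moved = min(site.capacity - site.load, remaining)
        site.load += moved
        remaining -= moved
    return remaining
```

In this model, the failed site's workload is fully absorbed as long as the surviving sites have enough combined spare capacity. The design question for an operator is then how much spare capacity to hold across sites, rather than how much backup power to hold at each one.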
Drivers for adoption
We see a number of drivers for DC as a machine:
- Efficiency. DC-as-a-machine sites are likely to have higher capital costs, at least for early adopters, but total cost of ownership will likely be lower than that of a comparable traditional site. Maximizing compute per square foot – and IT utilization – would drive up efficiencies (particularly for urban datacenters with high real-estate costs). Compact designs would also reduce networking and cabling costs, as well as power-distribution losses. Use of close-coupled cooling, including direct liquid immersion, would largely contain cooling to IT hardware; equipment would be kept cool, but the ambient air temperature in the facility would not be regulated for human operators.
- Availability. According to Uptime Institute (an independent division of The 451 Group), one to two qualified operators are required on-site at all times to support operations at Tier III or IV certified datacenters. At the same time, Uptime's research shows that a significant proportion of facilities downtime can be attributed to human error. Limiting staff access to periodic visits from highly trained professionals and automating more maintenance tasks could result in less human-error-related downtime in DC-as-a-machine sites.
- Health and safety. Limiting site access to highly trained professionals could also help reduce safety incidents. For example, in the US, arc flash (a phenomenon in which electricity arcs through the air) causes between one and two fatalities every day (figures for datacenters are not broken out). There would also be savings on insurance costs.
- Security. Reducing the number of staff with access to a site should also help to improve physical security. (Remote monitoring also introduces security risks, but these are now more widely understood and are being protected against.)
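The efficiency argument above rests on a trade-off: higher capital cost up front, offset by lower operating cost over time. As a rough sketch of that reasoning, the following computes total cost of ownership and the breakeven point for two options. All figures are hypothetical placeholders chosen for easy arithmetic, not estimates from our research, and the function names are illustrative.

```python
def total_cost_of_ownership(capex, annual_opex, years):
    """Simple undiscounted TCO: up-front cost plus operating cost over the period."""
    return capex + annual_opex * years


def breakeven_years(capex_a, opex_a, capex_b, opex_b):
    """Years after which option B (lower opex) becomes cheaper overall than option A.

    Returns None if B never catches up, and 0.0 if B is cheaper from day one.
    """
    if opex_b >= opex_a:
        return None
    if capex_b <= capex_a:
        return 0.0
    return (capex_b - capex_a) / (opex_a - opex_b)


# Hypothetical figures in arbitrary currency units (assumptions, not data).
traditional = {"capex": 10.0, "opex": 2.0}
machine = {"capex": 12.0, "opex": 1.5}

years_to_breakeven = breakeven_years(
    traditional["capex"], traditional["opex"], machine["capex"], machine["opex"]
)
```

With these placeholder inputs, the higher-capex option breaks even after four years and is cheaper over a 10-year life (27 versus 30 units). A fuller model would discount future costs and account for real-estate and utilization effects.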
Impediments to adoption
There are a number of obstacles to DC as a machine being widely adopted.
- Risk. The need to ensure high levels of availability makes the datacenter industry inherently conservative. Embracing DC as a machine would carry risks and costs for early adopters that only a small group of progressive operators – potentially hyperscalers – would be prepared to accept.
- Immaturity of technology. DC as a machine would require the adoption and integration of a number of emerging and early-stage disruptive technologies. Dexterous robot technology is still extremely nascent, and will likely continue to be more expensive than using humans in the short to medium term. Server equipment, cabling and other infrastructure would need to be modified to enable it to be manipulated by robots, which could increase costs until the technology is produced at scale.
- Legislation and regulation. There may be specific rules and regulations in some geographic locations that require on-site staff to maintain power and cooling equipment for health and safety or other reasons. Most mechanical and electrical equipment suppliers also require periodic on-site inspections to maintain warranties.
- Loss of tax breaks/incentives. Datacenter operators are often induced to locate new sites in a particular county or state by tax breaks, which are granted in part on the expectation of local employment. However, DC-as-a-machine sites (using PFM) would require few local workers for construction (largely done off-site) and minimal (specialized) staff for operations.
The 451 Group's research into the disruptive impact of DC as a machine is ongoing, and our assessment will be published in mid-2017. We welcome informed input on this and other disruptive datacenter technologies we are evaluating, including silicon photonics, direct liquid cooling, chiller-free, distributed resiliency and microgrids, among others. Please email firstname.lastname@example.org to participate.