The disrupted datacenter: Conditions align for a return to direct liquid cooling
Direct liquid cooling (DLC), also known as on-chip cooling, is a method of heat dissipation where the processor or other components are close to or fully immersed in a liquid. Over the last five years, there has been a steady increase in the number of suppliers offering DLC: All of the major server OEMs now have a product. The hope is that growing requirements around big data, Internet of Things and edge computing will spur wider demand. The majority of the take-up of DLC to date has been in high-performance computing (HPC) facilities, but there are signs that a wider range of operators will be persuaded by the improved server performance, lower cooling costs and higher rack densities that DLC can enable. If DLC became more widely adopted, suppliers of conventional datacenter air-cooling equipment would face the most disruption.
Direct liquid cooling is one of more than a dozen technologies that we are evaluating as part of our upcoming Disruptive Technologies in the Datacenter report, a follow-on to our widely read and referenced 2013 report.
The 451 Take
DLC predates air-based cooling and is an inherently more efficient way to cool IT equipment. Yet compelling thermodynamics alone have not persuaded most datacenter operators to adopt it. The real issue is whether a convincing economic case can be made to counter concerns over maintenance, safety and other areas. Historically, the answer has been 'yes,' but only within the confines of HPC facilities. However, HPC-like applications for big data and machine learning seem set to become more widely deployed in enterprise and hyperscale facilities in the near future. DLC also allows more compute power to be packed into less whitespace in space-constrained urban datacenters, including micro-modular datacenters, but the sector is still nascent.
Technology and context
DLC is not new (it dates back to the mainframe era), but despite its potential benefits and ongoing innovation, adoption remains low compared with air-based cooling. There are a number of approaches to DLC in datacenters. No single technology has dominated thus far, and each has benefits and shortcomings, depending on its application.
- Non-immersive (cold-plate): A number of DLC systems use water as the coolant. The benefit of water is that it is a good conductor of heat, and is inexpensive and plentiful. Cold-plates are also relatively easy to retrofit to conventional servers.
- Immersive: Dielectric fluids, such as NOVEC, developed by 3M, and mineral oils have several advantages over water – the chief being that components can be directly immersed in these liquids, which improves thermal efficiencies and often results in simplified DLC systems with no requirement for additional air-based cooling.
Other approaches to DLC, which use liquid Freon, liquid nitrogen or liquid helium as coolants, remain rare; the cost and volatile nature of these substances mean that they are unlikely to be used at scale in datacenters. IBM and others have also developed processors equipped with microfluidic channels that carry coolants, and possibly even power, directly inside chips, but these remain largely experimental.
However, removing heat from IT components and capturing it in a liquid is only half of the DLC process; a variety of technologies are also used to eject that captured heat from the datacenter. For example, suppliers have shown the potential of warm-water cooling, where the inlet water to the DLC system does not require any mechanical cooling. In these examples, lower-cost, chiller-free technologies such as dry coolers and heat exchangers (liquid-to-liquid or liquid-to-air) are used to lower the temperature of the liquid before it re-enters the IT system.
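The heat-capture step described above follows the standard relation Q = m_dot * c_p * delta_T: the heat a coolant carries away equals its mass flow times its specific heat times its temperature rise. The sketch below is illustrative only. The 30kW rack load and 10 K temperature rise are hypothetical figures, the fluid properties are textbook approximations, and the function name is our own; none of these come from supplier data.

```python
# Illustrative physical constants (textbook approximations, not supplier data).
WATER_CP_J_PER_KG_K = 4186.0   # specific heat of water
WATER_DENSITY_KG_M3 = 997.0    # density of water near room temperature
AIR_CP_J_PER_KG_K = 1005.0     # specific heat of air
AIR_DENSITY_KG_M3 = 1.2        # density of air near sea level


def volumetric_flow_lpm(heat_load_w, delta_t_k, cp, density):
    """Litres per minute of coolant needed to absorb heat_load_w watts
    with a coolant temperature rise of delta_t_k kelvin.

    Derived from Q = m_dot * cp * delta_T, then converted from mass flow
    to volume flow using the fluid density.
    """
    if heat_load_w < 0 or delta_t_k <= 0:
        raise ValueError("heat load must be >= 0 and delta_t_k must be > 0")
    mass_flow_kg_s = heat_load_w / (cp * delta_t_k)
    volume_flow_m3_s = mass_flow_kg_s / density
    return volume_flow_m3_s * 1000.0 * 60.0  # m^3/s -> L/min


# Hypothetical example: a 30 kW rack with a 10 K coolant temperature rise.
water_lpm = volumetric_flow_lpm(30_000, 10, WATER_CP_J_PER_KG_K, WATER_DENSITY_KG_M3)
air_lpm = volumetric_flow_lpm(30_000, 10, AIR_CP_J_PER_KG_K, AIR_DENSITY_KG_M3)

print(f"water: {water_lpm:.1f} L/min")
print(f"air needs roughly {air_lpm / water_lpm:.0f}x the volume flow of water")
```

Under these assumptions, air must move several thousand times the volume that water does to carry the same heat, which is one way to see why liquid is the more efficient heat-transfer medium.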
DLC at the edge
DLC has several benefits for micro-modular datacenters. These compact, self-contained datacenters are designed to be rugged and flexible enough to be deployed outside the conventional datacenter whitespace. The requirement for these systems is expected to grow in the mid to long term, driven in part by an increase in IoT-related data. Using DLC could reduce server failures caused by airborne pollutants or by fan-related faults. The elimination of fans also means micro-modular datacenters would be significantly quieter, which is important if they are to be deployed in or close to office environments.
Some suppliers have also recently begun integrating DLC with Open Compute Project (OCP) server and rack technologies. (OCP and other open datacenter architectures will be examined in depth in a forthcoming Disruptive report.) The main benefit of combining DLC and OCP is overall cost reduction: OCP is designed to reduce system costs compared with systems from conventional OEMs, while DLC adds some direct system costs but reduces the need for expensive mechanical cooling. However, these savings may only be realized when DLC-equipped OCP systems are deployed at scale.
Drivers for adoption
There are a number of reasons why DLC may (eventually) fulfill its promise as a disruptive datacenter technology:
- Improved server performance: The cooling capacity provided by DLC means systems can be pushed to higher levels of performance and improved utilization. For example, Dell partnered with Intel and eBay to develop a specialized 200W CPU (Intel Xeon processor E5 v4) that could make use of the additional cooling provided by eBay's Triton DLC system. The additional cooling capacity provided by the Triton DLC system meant that the custom processor was able to effectively remain in 'turbo mode' for long periods. As a result, search inquiry throughput was increased by 60% compared with an equivalent standard Intel processor.
- Increased availability of DLC servers: Customers now have a wider choice of DLC-equipped servers, including from all of the major server OEMs.
- Increased rack power densities: While average rack densities remain low at below 5kW per rack (according to research from Uptime Institute and others), there are signs that the long-forecast uptick in rack densities may be materializing in some sectors. The need to maximize compute in space-constrained urban datacenters is one driver. Developments in processor design, and demand for compute-intensive applications and services, will also play a part. Some hyperscale operators, such as China's Baidu, are investing heavily in FPGA- and GPU-based systems in order to develop machine-learning and other services. Baidu believes DLC is the only way to efficiently cool these high-density systems.
- TCO improvements: Air-based cooling can account for up to 40% of the capital costs and more than 30% of the operating costs of a traditional facility. DLC (using warm water) can help reduce the total lifecycle costs of the datacenter significantly by eliminating the need for mechanical cooling. Some forms of DLC can reduce cooling-related power costs by 90% or more and enable power usage effectiveness (PUE) ratings of less than 1.02.
- Heat reuse: Several DLC systems suppliers have demonstrated the ability to re-use high-temperature water (either directly or via heat exchangers) from servers in order to heat adjacent buildings. Regulators, especially in Europe, want to encourage datacenters to re-use more of their waste heat energy.
- Maintaining uptime despite cooling failure: DLC can keep IT hardware within thermal limits even if the DLC system (or ancillary cooling systems) fails. Ensuring there is an adequate thermal buffer is one of the reasons that air temperatures are so low in traditional datacenters, which greatly contributes to their high cooling costs (operational and capital).
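To put the PUE figure in the TCO item above in context: PUE is total facility power divided by IT power, so a rating below 1.02 leaves less than 2% of the IT load for cooling and all other overhead combined. The sketch below works through that arithmetic. The 1MW IT load, 400kW air-cooling load and 10kW of other overhead are hypothetical figures chosen for illustration, and the function names are our own.

```python
def pue(it_kw, cooling_kw, other_overhead_kw):
    """Power usage effectiveness: total facility power / IT power."""
    if it_kw <= 0:
        raise ValueError("IT load must be positive")
    return (it_kw + cooling_kw + other_overhead_kw) / it_kw


def max_overhead_kw(it_kw, target_pue):
    """Largest total non-IT load (cooling plus everything else) that
    still meets target_pue for a given IT load."""
    return (target_pue - 1.0) * it_kw


# Hypothetical facility (illustrative numbers only).
IT_KW = 1000.0
AIR_COOLING_KW = 400.0
OTHER_KW = 10.0

baseline = pue(IT_KW, AIR_COOLING_KW, OTHER_KW)
# Apply a 90% reduction in cooling power, the figure cited in the text.
with_dlc = pue(IT_KW, AIR_COOLING_KW * 0.10, OTHER_KW)

print(f"baseline PUE: {baseline:.2f}")
print(f"PUE with 90% less cooling power: {with_dlc:.2f}")
print(f"overhead budget for PUE 1.02: {max_overhead_kw(IT_KW, 1.02):.0f} kW")
```

In this hypothetical case a 90% cut in cooling power is not by itself enough to reach 1.02; the remaining cooling and non-cooling overhead would also have to fit within the 2% budget.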
Impediments to adoption
Despite the efficiency and cost benefits of moving to DLC, a number of significant barriers to more widespread adoption could persist:
- Cost and complexity: DLC-based systems carry higher capex than standard air-cooled systems (up to three times as much in some cases). The additional plumbing required is also a concern to some operators.
- Familiarity: Many operators are reluctant to risk moving to an unfamiliar technology, despite its efficiencies.
- Health and safety: There is an understandable reluctance on the part of operators to introduce liquids of any kind into close proximity to electronic and electrical equipment.
- Maintenance and training: Removing air-cooled servers from a rack is relatively straightforward, but disconnecting a liquid-cooled system is often more complicated and requires specialist training. DLC systems are harder to maintain and may require greater use of external experts for relatively low-level tasks.
TCO remains an issue for widespread adoption of DLC. There are significant savings in terms of capital and operating costs associated with mechanical cooling equipment. But DLC servers are more expensive on average (although combining DLC and OCP could drive server costs down), and there are additional costs around maintenance and training.
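The trade-off described here can be framed as a simple payback calculation: the extra upfront cost of DLC servers is recovered only if the annual cooling savings exceed the added maintenance and training costs. The sketch below is a deliberately simplified illustration, not a full TCO model. All monetary figures are hypothetical, the function name is our own, and it ignores discounting, density gains and avoided capital spending on mechanical cooling plant.

```python
def breakeven_years(extra_capex, annual_cooling_savings, annual_extra_maintenance):
    """Years needed for net annual savings to repay the extra upfront cost.

    Returns None when added maintenance costs cancel out the cooling
    savings, meaning the extra capex is never recovered.
    """
    net_annual_savings = annual_cooling_savings - annual_extra_maintenance
    if net_annual_savings <= 0:
        return None
    return extra_capex / net_annual_savings


# Hypothetical deployment (illustrative numbers only).
payback = breakeven_years(
    extra_capex=500_000,              # premium paid for DLC-equipped servers
    annual_cooling_savings=200_000,   # lower cooling energy and plant costs
    annual_extra_maintenance=50_000,  # specialist training and support
)
print(f"payback period: {payback:.1f} years")

# If maintenance costs outweigh the savings, there is no payback at all.
print(breakeven_years(500_000, 40_000, 50_000))
```

The model makes the sensitivity plain: any reduction in the server premium (for example, through OCP) or in maintenance overhead shortens the payback period directly.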
The 451 Group's research into the disruptive impact of direct liquid cooling is ongoing, and our assessment will be published in early 2017. We welcome informed input on this and the other disruptive datacenter technologies we are evaluating, including silicon photonics, workload-optimized datacenters, clean power, autonomous datacenters, datacenter service optimization software, distributed UPSs, dexterous robots, cloud-level resiliency, and transactive energy and microgrids, among others. Please email firstname.lastname@example.org to participate.