Energy usage within the data center has been a growing point of interest for the entire Internet infrastructure sector for many years now. Keeping all that equipment cool while keeping costs in line is one of the biggest challenges the industry has to face, and it’s only getting bigger. With us today to talk about energy management is Dr. Cliff Federspiel, President and CTO of Vigilent. After starting out at Johnson Controls and an academic research stint at UC Berkeley, Cliff took his experience in dynamic cooling and artificial intelligence and founded Vigilent in 2004.
TR: What services does Vigilent offer to data centers and other mission-critical facilities?
CF: We provide a system that gathers data from very low-power wireless mesh network sensors and combines it with smart software that allows our customers to manage their cooling assets. It saves them a lot of energy, which is a quick, black-and-white return on investment, but it also helps them manage risk and reliability better and plan their capacity.
TR: How do you approach energy management for the complex, dynamic environments we see today in the data center?
CF: We use low-power wireless sensors that measure the temperature on racks near servers, switches, routers, storage, whatever is in them. In most cases we also have wireless controllers. The wireless network is bidirectional, and we can control the air flows dynamically and in real time. In some cases there is already a control network in place, so we can piggyback on that and not install the wireless controls. Then there is the software that takes all the data from the sensors and brings it all together. It has smarts that allow it to predict what will happen when it makes a change to any one of the cooling assets in the room -- turns one off, changes the speed of a fan, etc. And we've built into it a machine learning algorithm, so it's always updating those expectations and predictions to make good control decisions.
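Editor's note: the predict-then-update loop described above can be illustrated with a minimal sketch. This is not Vigilent's algorithm, which the interview does not detail. It assumes a simple linear model in which each cooling unit's fan-speed change shifts each rack temperature by a learned gain, updated with a normalized least-mean-squares rule. The names `InfluenceModel` and `safe_to_change`, the gain values, and the 27 °C limit are all hypothetical.

```python
import numpy as np


class InfluenceModel:
    """Hypothetical linear model: temperature change = gain @ fan-speed change."""

    def __init__(self, n_sensors, n_units, learning_rate=0.5):
        # gain[i, j]: predicted degC change at sensor i per unit speed change of cooler j
        self.gain = np.zeros((n_sensors, n_units))
        self.learning_rate = learning_rate

    def predict(self, delta_speed):
        """Predict the temperature change at every sensor for a proposed change."""
        return self.gain @ delta_speed

    def update(self, delta_speed, observed_delta_temp):
        """Normalized LMS step: nudge the gains toward what was actually observed."""
        norm = delta_speed @ delta_speed
        if norm == 0:
            return  # nothing was changed, so there is nothing to learn from
        error = observed_delta_temp - self.predict(delta_speed)
        self.gain += self.learning_rate * np.outer(error, delta_speed) / norm


def safe_to_change(model, current_temps, delta_speed, limit_c):
    """True if every rack is predicted to stay at or below the limit after the change."""
    predicted = current_temps + model.predict(delta_speed)
    return bool(np.all(predicted <= limit_c))


# Illustration with made-up numbers: two rack sensors, two cooling units.
rng = np.random.default_rng(0)
true_gain = np.array([[-0.20, -0.05], [-0.05, -0.25]])  # assumed room behavior
model = InfluenceModel(n_sensors=2, n_units=2)
for _ in range(200):
    change = rng.normal(size=2)              # small exploratory speed changes
    model.update(change, true_gain @ change)  # learn from the observed response

temps = np.array([24.0, 25.0])
print(safe_to_change(model, temps, np.array([-5.0, 0.0]), limit_c=27.0))
print(safe_to_change(model, temps, np.array([-20.0, 0.0]), limit_c=27.0))
```

The point of the sketch is the shape of the loop: every control action doubles as a training sample, so the predictions stay current as the room changes, and the model needs no knowledge of the room's geometry.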
TR: On-net sensors connected to AI-driven decision making sounds a bit like the Internet of Things, but for the infrastructure of the actual Internet of Things.
CF: That's not a bad description and you are the third person to say that independently in as many weeks. We put in lots of network telemetry, and anybody that has a need to see it can get to it from anywhere – if they have the right access credentials. And it's being used and combined with analytics that help people gain visibility and do better things with their infrastructure.
TR: One of the biggest worries for the Internet of Things is security. Do you have to worry about this as well?
CF: We do focus on security, because it's a big deal for these facilities. Our system sits behind a firewall and is well protected from the general Internet.
TR: When you first launched Vigilent, were you expecting the data center and mission-critical facility space to be the target?
CF: The first problem we solved was actually not for mission-critical facilities. It was energy management in buildings that were older and built at a time when energy codes either didn't exist or were relaxed. They were super energy-inefficient, but it's very hard to retrofit buildings to what codes require today without ripping out all the mechanical equipment and starting over. But we soon realized that wasn't really as attractive a vertical for a growing business, so we pivoted to mission-critical facilities, in part because the utilities in California got interested in data centers. There were some state funds available, and we reworked our software so it met the requirements for mission-critical facilities.
TR: What kind of facilities do you generally work with?
CF: Today we generally address facilities that are 5,000 square feet or bigger. It's a wide range, from big data centers of over 100,000 square feet all the way down to telecom central offices, mobile switching centers, etc. For example, we did a project with the State of California and put our system in nine of their data centers. A couple of them are less than 1,000 square feet, while one is California’s largest at 43,000 square feet. The savings are mostly independent of the size of the facility, but making the ROI attractive for little server closets is challenging.
TR: How much autonomy does a system like this have?
CF: Nobody needs to look at it. It is an automated system that can run completely standalone and unattended. And for telecom customers with a big portfolio of facilities, a lot of them do run unmanned. In bigger facilities that are manned 24/7, it is more common to have our user interface up on a 54-inch screen to use as their command-and-control view.
TR: How much customization does it take to implement this for a particular data center?
CF: There's no customization or field programming required. Most other automation companies do a lot of field programming, but the problem with that is that you're using a live production site to test your field program, and then you have to go back in for more modification whenever anything changes. We spend a lot of time getting the software right before it is ever installed, and don't do any software development in the field. The configuration is simply determining where the sensors go, how many of them there are, and setting IP addresses. It allows us to have a light touch and cause no interruptions. The machine learning ensures that the energy savings are persistent. It doesn't need to know physically where things are or the geometry of the room.
TR: Do you prefer existing facilities or greenfield build-outs?
CF: The biggest market for us is existing buildings, because there are just so many of them already. Our product is really good at addressing those, because conventional wired ways of getting telemetry in a facility and getting it all set up and configured would take nine months or a year to do what we do in days. You'd have to be slow and cautious, because it would basically be a construction project using conventional ways. That said, we do have relationships that we've built up over the years with organizations that build new data centers. For example, NTT Facilities is a distribution partner and resells our product all over Japan and Southeast Asia, and has started looking into incorporating our product into their new construction efforts. But we perceive that as a smaller market, just because there is less of it happening than the huge installed base that already exists, and the process is slower.
TR: Where do you see the most activity coming from going forward?
CF: There is a massive push to the edge of everyone's network, both by the telcos and the content providers. It's the thing that is really giving the operations folks the biggest challenge for capacity planning and expansion: the need to move equipment closer to customers to reduce latency. And it's going to happen all over again when wireless providers move to 5G. The big challenge on the cooling side is that this forces them to put new, power-hungry equipment into facilities that were built a long time ago. Facilities that were historically of middle-tier importance are now becoming very important. Getting telemetry and analytics in there to figure out what they have and whether it can support what they need to do before they make huge capital expenditures is very important to them.
TR: Do you foresee applying the same approach to other problems outside the data center space?
CF: In the short to medium term we have plenty to do in mission-critical cooling. We're just getting started in Europe and there are big geographical areas that have a powerful need for this product, and I expect that will keep us busy for a while. But there are certainly opportunities to move into other areas with a similar need. The generic problem we solve is controlling a distributed field of something that matters. One analogous thing a telco would care about is their wireless network. They want to make sure they have the right signal strength to everyone's device, and they have discrete cell towers that control that signal strength and that use a lot of power. At a high level, that's very similar to the problem we're solving inside their data centers. I don't see us going after that really soon, but the point is that there are a lot of automation problems that are similar to what we do.
TR: So what is next on the product menu for Vigilent?
CF: The system itself produces a lot of data. We're building product features now and combining them with predictive analytics. The idea is to take those analytics and turn them into visible features that help our customers solve operational and planning problems that often have nothing to do with energy. For example, risk management. Most telcos manage risk with processes. When they're going to change one of their facilities they write up a procedure, a bunch of people review it, then they go ahead and execute. What they don't do is use metrics, KPIs, to keep track of things that are known to be associated with risk. We've created KPIs that are correlated with extreme hotspots and that can be tracked in real time or in weekly or monthly reports. You can engage a different kind of process that is data-dependent and metric-based rather than human procedure-based.
Another area is capacity planning. Capacity for anyone running a mission-critical facility comes down to three things: space, power and cooling. Whether or not you have space is easy to figure out by inspection, power is pretty simple because circuits have a known capacity that doesn't change with time, but cooling is the one that is most uncertain. When you have uncertainty and don't have a lot of data, you become very conservative. That conservatism is extremely expensive.
TR: Why has energy management for data centers been getting so much attention recently? Is it the rising costs?
CF: I think that costs have always been there, but what's happening is that the technology for managing the infrastructure can now become information based. All of that information opens up the opportunity to solve what has been a problem forever. The behaviors that telecoms have for managing this infrastructure have been in place for decades. If you don't have much data, you naturally do what is conservative. They're managing 911 switches and all kinds of really important things, so doing it in a bulletproof way is obviously the right thing to do. But now there's the opportunity to have just as much certainty and reliability but not spend as much to do it.
TR: Thank you for talking with Telecom Ramblings!