Today we dive back into the nuts and bolts of internet infrastructure and take a look at an upstart in network monitoring and analysis. Kentik was founded last year with an eye toward leveraging cloud computing and big data to better observe and operate networks themselves. With us today to tell us what they’ve been doing and how it’s been going is Founder and CEO Avi Freedman.
TR: How did Kentik get started?
AF: After selling the ISP I started back in 1992, I ran engineering for the global backbone of AboveNet. I came to Network World, and saw a very interesting customer of ours, Akamai, who actually had better instrumentation about our network than we did. I was fascinated by that, went and talked to them, and wound up joining Akamai in 1999. After I left about 6 years ago, I started looking at the network instrumentation/monitoring space. Everyone that I talked to was unhappy with the tools that they had, and the devices had gotten to the point that they could export the call records of the network but the tools were trapped in the 90s. We started Kentik last January to address that.
TR: Why do you think existing tools have fallen behind?
AF: Generally speaking, in the late 90s big web companies started to understand and make a science out of clustering, and invented what we now call big data systems. They understood that there's a limit to what one computer can do. The vendors in the network monitoring space generally got started before the modern era of scaling big data services. I think that's why in their architecture you tend to see a lot of data aggregation, because ultimately everything has to run on one computer. That becomes a scaling limit, and they don't even try to store data at a detailed level. Secondly, a lot of people assumed that it would be difficult to do what we do as a SaaS offering. But in the last 5-6 years, it’s become a natural assumption that as you build a tool suite for modern operations, it's not only acceptable but preferred to have a SaaS offering. So 80% of the business we do comes from people pointing their operations data to us in the cloud. We started out with customers who didn’t want to deploy 20 servers to get operations visibility, and would much rather have someone else run the infrastructure. That is especially true of SaaS companies themselves, who prefer to use a SaaS offering.
TR: What approach did you take?
AF: We took a big data SaaS approach, so we have our own data platform, and operate a public service, though we also do some on-premises deployments for certain customers. We store all the data at the resolution we get it, and let people look back into it with high resolution. The historical approach taken by our competitors is to take the netflow data, which for a big carrier class router will be 10-50Tb of data over a 90-day retention period, and aggregate it. So when you are actually looking at an event and you want to see what happened in the last few minutes, you can't unless you told the tool in advance that you wanted the data kept that way. Our approach was to remove that limitation and give you an independent lookback engine. So in order to be able to do that, we built a data backend and run that on top of our platform, which is why it's not just a VM someone can download and install.
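To illustrate the distinction Freedman draws here, the following is a minimal Python sketch, not Kentik's implementation. The `Flow` record, its fields, the sample values, and both function names are hypothetical. It contrasts a rollup fixed in advance (totals by source IP only) with a lookback over raw records, which can be grouped by any field for any time window after the fact.

```python
from collections import Counter, namedtuple

# Hypothetical flow record; real NetFlow/IPFIX records carry many more fields.
Flow = namedtuple("Flow", "ts src_ip dst_ip dst_port bytes")

# Made-up sample data (timestamps in seconds).
flows = [
    Flow(100, "10.0.0.1", "192.0.2.10", 443, 5000),
    Flow(130, "10.0.0.2", "192.0.2.10", 443, 1500),
    Flow(160, "10.0.0.1", "192.0.2.20", 53, 200),
    Flow(400, "10.0.0.3", "192.0.2.10", 80, 900),
]

def rollup_by_src(records):
    """Pre-aggregated view: bytes per source IP; every other detail is discarded."""
    totals = Counter()
    for f in records:
        totals[f.src_ip] += f.bytes
    return totals

def lookback(records, start, end, key):
    """Ad hoc view over raw records: bytes grouped by any field in [start, end)."""
    totals = Counter()
    for f in records:
        if start <= f.ts < end:
            totals[getattr(f, key)] += f.bytes
    return totals

# The rollup can only answer the question it was built for.
by_src = rollup_by_src(flows)

# With raw records kept, a new question can be asked after the event.
by_port = lookback(flows, 100, 200, "dst_port")
by_dst = lookback(flows, 100, 200, "dst_ip")
```

Once only `by_src` is stored, a per-port or per-destination breakdown for a given window cannot be recovered; keeping the raw records is what makes the later queries possible, at the cost of storing far more data.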
TR: How much traction have you found so far?
AF: We spent last year building the technology and raising the funding. We started having commercial customers in February, and we launched the platform formally in June of this year. We were 4 people this time last year, we are 29 people now, but we are trying not to go crazy in current market conditions but rather grow prudently. Our customers are network providers from the global backbones to cloud companies to ISPs, as well as companies like Neustar, OpenDNS, Yelp, Box, and to some extent classic enterprises where they've got either a global network to manage or are talking to a CDN. For all of these companies, the network makes revenue flow, so making it flow well is the problem they need to solve.
TR: How does a startup break into this part of the infrastructure market? How do you get the attention of customers?
AF: It's about half connections we have from the past, about a quarter inbound from the community and a quarter from introductions from investors and other folks. We're little, iconoclastic, and a lot of the web companies we are dealing with are moving towards building their own infrastructure where it touches revenue generation. These people are all actively looking at solutions. A lot of customers we work with started building their own because they didn't like the solutions out there. So when we've come along, it's been at a good time in the conversation. We usually don't have to explain the problem.
TR: How does a network operator go about implementing something like this? Is there any CPE involved?
AF: There are a couple of ways to do it. If you go online now, our cloud will give you the config to type into your routers which will send us the metadata and you can see in 5 minutes what's going on with your infrastructure. We have cases where we set up a demo and do a trial on the same day. There are no appliances that we sell on premises, unless the customer says they want this whole solution to be on their network. Then we install a clustered system because it's a scale out architecture.
TR: How does the network data make it to your big data platform?
AF: It can go over the public internet. For people who are concerned about the privacy of the data, we have software agents they can deploy anywhere on their infrastructure that will gather the data locally and send it to us via SSL so it's encrypted in flight to our public cloud. But there's a third way you can do it. We take cross connects at Equinix, and a number of customers do cross connect to us there.
TR: How much traffic does this generate?
AF: We've got a couple of Tier 1 networks on our platform that do low terabits of traffic on their network. Usually the traffic to us winds up being a tenth of one percent of the traffic on their networks, so maybe a gigabit for our biggest customers. By monitoring standards your home network should have as much bandwidth as is needed.
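The arithmetic behind that estimate can be checked with a short Python sketch. The function name and the 1 Tbps and 3 Tbps inputs are illustrative assumptions; only the "tenth of one percent" ratio comes from the interview.

```python
def export_rate_gbps(network_tbps, export_fraction=0.001):
    """Estimate flow-export traffic in Gbps for a network carrying network_tbps.

    export_fraction defaults to 0.001, the "tenth of one percent" quoted above.
    """
    network_gbps = network_tbps * 1000  # 1 Tbps = 1000 Gbps
    return network_gbps * export_fraction

# A network carrying 1 Tbps would export roughly 1 Gbps of flow metadata.
one_tbps = export_rate_gbps(1)
three_tbps = export_rate_gbps(3)
```

For networks in the "low terabits" range this lands at about one to a few gigabits per second, consistent with the "maybe a gigabit" figure.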
TR: Do you use public clouds to do your data processing? Or have you rolled your own cloud infrastructure?
AF: Economics led us to not use the public cloud like Google services or AWS Redshift. We'd lose money if we were using any of those services. We use our own 24 disk servers in Equinix and have our own provisioning system and network with 100G of connectivity. If you assume you're going to have to deal with scale (and we started with a couple of petabytes) then you can get the economics of building it for yourself.
TR: It’s one thing to collect data that’s already available, but is there data you think networks ought to be collecting and leveraging that they aren’t yet?
AF: Absolutely. I did a presentation recently at NANOG about augmented sources of data that can give an augmented view of the network. Some of the network elements that people have can already give enhanced flow data, but for some it is necessary to look outside the network elements and see where to get additional data: from packet brokers, like Ixia and Gigamon, which will be capturing URL, SQL query, DNS query, application and network performance data, to load balancers and weblogs that also have URL and application-network performance data.
TR: What is the most common usage by network operators?
AF: The number one thing is if you look at an SNMP site view, network providers often can see that links are full or congested, which can be an application problem or a need for more capacity. About 90% of the time that there is service impairment on a network, it's actually a configuration change or a misunderstood traffic dynamic of how applications are talking to each other, and not an external effect. A lot of times, they are really flying the network blind without a dashboard and don't know why these things are happening. So the first thing they can see in the tool is a breakdown of what is generating the traffic on their network. It's about efficiency and availability.
TR: What’s on the drawing board for future versions?
AF: Because we are a SaaS company, we have visibility into not just what's happening to you as an individual customer, but also your comparables on the global internet. So another differentiated thing we are working on with customers now is global context. For performance, availability, security, what's happening to you versus what's happening to other folks on the net, which is something that can only be done as a SaaS company. For instance, if your network is getting a lot of traffic from a network in Brazil, we will be able to show you whether that is happening to everybody, and given that, does it appear to be legitimate traffic. Or, on the performance side, if you're having trouble getting to a particular cable company but so is everyone else, then that allows the practitioner to say, "I have an issue, but there's nothing I can do about it, let me move on".
TR: Thank you for talking with Telecom Ramblings!