This Industry Viewpoint was authored by Dan Joe Barry, vice president of marketing, Napatech.
Real-Time Big Data Analytics (RTBDA) has become a dominant topic in big data discussions because it speaks to the key value propositions of big data analytics. Many Internet/Over-The-Top (OTT) companies, such as Amazon and Google, use it as their strategic foundation for making decisions in real time based on analysis of the available information.
These OTT players are a source of inspiration and frustration to telecom carriers that must come to grips with the increasing amount of traffic generated across the network with little or no revenue contribution.
In this article, we take a closer look at RTBDA, particularly in the context of telecom networks. Although the technologies required to implement such a strategy are already in use today, they are not being used as effectively as they could be.
So, What is RTBDA?
In its most basic form, big data analytics is composed of two parts that distinguish it from data warehousing and mining or business intelligence:
- Distributed, parallel processing
- The ability to act in real time
Big data analytics addresses the challenge of processing large, unrelated data sets that typically cannot be accommodated by a single database or server. One solution is distributed, parallel processing, in which a large data set is distributed across multiple servers and each server processes its part of the data set in parallel. Big data analytics can work with both structured and unstructured data. Hadoop with MapReduce, for example, is one implementation of this approach and can be credited as a driving force behind the current interest in big data.
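The split, process-in-parallel, merge pattern described above can be sketched in a few lines of Python. This is a minimal, single-machine illustration assuming hypothetical flow records of the form `(subscriber_id, bytes)`: threads stand in for separate servers, each "map" worker totals bytes per subscriber for its own partition, and a "reduce" step merges the partial results. It is not Hadoop's API, only the same idea in miniature.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical flow record: (subscriber_id, bytes_transferred)
Record = tuple[str, int]

def partition(records: list[Record], n: int) -> list[list[Record]]:
    """Distribute the data set across n workers (round-robin)."""
    return [records[i::n] for i in range(n)]

def map_phase(part: list[Record]) -> Counter:
    """Each worker totals bytes per subscriber for its own partition only."""
    totals: Counter = Counter()
    for subscriber, nbytes in part:
        totals[subscriber] += nbytes
    return totals

def reduce_phase(a: Counter, b: Counter) -> Counter:
    """Merge two partial results without mutating either input."""
    merged = Counter(a)
    merged.update(b)
    return merged

def bytes_per_subscriber(records: list[Record], workers: int = 4) -> Counter:
    parts = partition(records, workers)
    # Threads stand in for separate servers in this single-process sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_phase, parts))
    return reduce(reduce_phase, partials, Counter())

records = [("alice", 1200), ("bob", 300), ("alice", 800), ("carol", 50), ("bob", 700)]
totals = bytes_per_subscriber(records, workers=2)
print(totals["alice"], totals["bob"], totals["carol"])  # 2000 1000 50
```

The result does not depend on how the records are partitioned, which is what lets the work be spread across servers. In CPython, threads do not give true parallelism for CPU-bound work; separate processes or machines would be needed for that.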
What distinguishes big data solutions is that the processing of these large data sets is expected to complete within a defined time frame.
RTBDA, though considered new, addresses the need to produce results in real time. It is motivated by the ability of Internet content and service providers to understand what is happening on the network at any given moment.
“Real time” for Telecom
Defining “real time” depends on your goals and the environment in which you are working. For some, seconds or even microseconds may be fast enough, whereas others require real time to be faster still.
From a telecom point of view, this is an interesting question. It exposes a possible problem with current telecom practices that must be tackled if carriers are to overcome the challenges that OTT traffic presents. The current definition of “real time” may no longer be relevant.
Previously, telecom networks were built on connection-oriented technology. Protocols and changes could only be applied centrally, in a highly structured process, and the network did not change much from one minute to the next.
Given these circumstances, it was sufficient to gather information from the network at regular intervals to know what was happening. The protocols used were filled with management information, allowing a great deal of insight to be gathered from just one protocol header. Here, “real time” can be defined in seconds or even minutes, which is why collecting Call Detail Records (CDRs) every 5 to 15 minutes was sufficient.
However, times have changed. With the migration to LTE, telecom carriers have completely transitioned to packet networks based on Ethernet and IP, which function very differently from connection-oriented technologies and protocols.
To start, IP networks are self-sufficient: the network directs traffic and reroutes it depending on congestion and other conditions, allowing it to react to changes swiftly. The downside is that you cannot predict with certainty where traffic will flow.
That challenge is not made easier by the fact that Ethernet and IP, by design, do not contain the same level of management information overhead that connection-oriented protocols provide.
Packet networks are by nature bursty and dynamic. They are designed to support multiple users and services, all sharing the same infrastructure. Over extended periods of time, network utilization can appear quite low, but in reality traffic is transmitted in bursts that can consume the entire available bandwidth. In these situations, the IP network is expected to route traffic through the network in a balanced way. As a result, conditions in the network can change from one IP packet or Ethernet frame to the next.
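A small calculation shows how averaging hides bursts. The sketch below assumes a hypothetical 10 Gbps link observed for one second in 1 ms slots, carrying five 4 ms bursts at full line rate; the numbers are illustrative, not measured.

```python
LINK_GBPS = 10.0   # assumed link speed
SLOTS = 1000       # one second observed in 1 ms slots

def burst_profile(burst_slots: set[int]) -> list[float]:
    """Per-slot utilization: 1.0 (line rate) during a burst, 0.0 otherwise."""
    return [1.0 if s in burst_slots else 0.0 for s in range(SLOTS)]

def average_utilization(util: list[float]) -> float:
    return sum(util) / len(util)

def peak_utilization(util: list[float]) -> float:
    return max(util)

# Hypothetical traffic: five 4 ms bursts at full line rate within one second.
bursts = {start + i for start in (100, 300, 500, 700, 900) for i in range(4)}
util = burst_profile(bursts)

avg, peak = average_utilization(util), peak_utilization(util)
print(f"average over 1 s: {avg:.1%} ({avg * LINK_GBPS:.2f} Gbps)")   # 2.0% (0.20 Gbps)
print(f"peak in any 1 ms: {peak:.1%} ({peak * LINK_GBPS:.2f} Gbps)")  # 100.0% (10.00 Gbps)
```

A monitor that only reports the one-second average would see a lightly loaded link, even though the link was saturated five times within that second.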
The main obstacle in how telecom network management and data analytics are performed today is that both rely on Call Detail Records (CDRs), Event Detail Records (EDRs) and IP Detail Records (IPDRs) to understand what is occurring in real time.
This definition of “real time” is rooted in an era when gathering data every few minutes was adequate. Considering that, on a 10 Gbps network, a new Ethernet frame can arrive as often as every 67 nanoseconds, we begin to understand what “real time” means in a packet network. It is not minutes; it is not even seconds. It is nanoseconds.
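The 67-nanosecond figure follows from the minimum Ethernet frame size. Assuming back-to-back 64-byte frames, each frame also occupies 8 bytes of preamble and start-of-frame delimiter plus a 12-byte minimum inter-frame gap on the wire, i.e. 84 bytes or 672 bits per frame. The sketch below does that arithmetic for 10 and 100 Gbps links.

```python
MIN_FRAME_BYTES = 64     # minimum Ethernet frame, including the FCS
PREAMBLE_SFD_BYTES = 8   # preamble + start-of-frame delimiter
MIN_IFG_BYTES = 12       # minimum inter-frame gap

def min_frame_interval_ns(link_gbps: float) -> float:
    """Time between the starts of consecutive minimum-size frames."""
    bits_on_wire = (MIN_FRAME_BYTES + PREAMBLE_SFD_BYTES + MIN_IFG_BYTES) * 8
    return bits_on_wire / link_gbps   # bits / (Gbit/s) = nanoseconds

def max_frames_per_second(link_gbps: float) -> float:
    return 1e9 / min_frame_interval_ns(link_gbps)

for gbps in (10, 100):
    print(f"{gbps} Gbps: {min_frame_interval_ns(gbps):.1f} ns per frame, "
          f"{max_frames_per_second(gbps) / 1e6:.2f} million frames/s")
# 10 Gbps: 67.2 ns per frame, 14.88 million frames/s
# 100 Gbps: 6.7 ns per frame, 148.81 million frames/s
```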
Reacting in Real-Time
Depending on your goals, using CDRs, EDRs and IPDRs for big data analytics can be a good idea. Big data analytics can be used for two broad categories of decision-making:
- Real-time decision making
- Better planning and optimization of services and networks based on trends and predictive analysis
Using detailed records for better planning and optimization, along with other structured and unstructured data sources, is appropriate and valuable. These records hold a wealth of information from which useful trends and predictions can be derived. However, unless coupled with real-time information from packet networks, they will never provide a complete picture of what happened and when.
Simply put, detailed records cannot be used for real-time decision-making, since they are only collected every 5 to 15 minutes. This is not compatible with our definition of real time in packet networks. For true real-time decision-making, network information must be continuously collected, stored and analyzed. All relevant Ethernet frames and IP packets need to be examined in real time in order to grasp what is happening.
By gathering network information in this way, we can not only analyze and react to it in real time but also use the data as a detailed, reliable record of what happened in the network and when, to complement other big data analytics activities.
Instituting RTBDA in Telecom
A real-time data collection layer can provide a continuous stream of pertinent information for decision-making. Both the TM Forum and the IP Network Monitoring for Quality of Service Intelligent Support (IPNQSIS) project, part of the European CELTIC-Plus program, have researched this need as part of their respective work on customer experience management. Both arrived at a similar conclusion: probes and appliances are necessary to provide dependable, real-time insight into what is happening in the network.
Traditionally, probes are data collectors that deliver information to other management systems. Appliances, on the other hand, use the same technology but also analyze and store the information locally. Typically, appliances focus on specific tasks, such as performance monitoring, test and measurement, or security. However, probes and appliances can also be used more tactically, as sources of real-time data for big data analytics and as implementations of RTBDA strategies. The following steps outline how such an infrastructure could be implemented.
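The difference between a probe and an appliance can be expressed as two roles sharing one collection path. In the sketch below, whose class and field names are hypothetical, a probe only forwards each frame to another system, while an appliance additionally stores the frame and runs a simple local analysis (here just a byte count) before optionally forwarding it as well.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Frame:
    timestamp_ns: int
    length: int

@dataclass
class Probe:
    """Collects frames and delivers them to an external management system."""
    forward: Callable[[Frame], None]

    def on_frame(self, frame: Frame) -> None:
        self.forward(frame)

@dataclass
class Appliance(Probe):
    """Same collection path, but also stores and analyzes frames locally."""
    store: list[Frame] = field(default_factory=list)
    bytes_seen: int = 0

    def on_frame(self, frame: Frame) -> None:
        self.store.append(frame)          # local storage
        self.bytes_seen += frame.length   # local analysis (a simple byte count)
        super().on_frame(frame)           # can still feed other systems, e.g. big data analytics

received: list[Frame] = []
app = Appliance(forward=received.append)
app.on_frame(Frame(timestamp_ns=0, length=64))
app.on_frame(Frame(timestamp_ns=67, length=1500))
```

Keeping the forwarding path in the appliance reflects the tactical use described above: the same device serves its local task and acts as a real-time data source for analytics.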
First, appliances are deployed for data collection. For this to work, every Ethernet frame and IP packet must be captured in real time, at line speed and with zero packet loss, regardless of conditions. This ensures that a reliable stream of information is gathered.
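To make the zero-loss requirement concrete, the toy model below assumes a bounded buffer between the capture port and the software consuming the frames. During a burst, frames arrive faster than the consumer drains them; if the buffer cannot absorb the whole burst, frames are dropped. The names `CaptureBuffer` and `simulate_burst` are hypothetical, and real capture hardware uses its own buffering mechanisms.

```python
from collections import deque
from typing import Optional

class CaptureBuffer:
    """Toy bounded buffer between a capture port and the software consuming it.

    When the buffer is full, an arriving frame cannot be stored and is counted
    as dropped. Zero-loss capture means `dropped` must stay at 0.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.frames: deque = deque()
        self.dropped = 0

    def enqueue(self, frame: object) -> bool:
        if len(self.frames) >= self.capacity:
            self.dropped += 1
            return False
        self.frames.append(frame)
        return True

    def dequeue(self) -> Optional[object]:
        return self.frames.popleft() if self.frames else None

def simulate_burst(capacity: int, burst_len: int) -> int:
    """One frame arrives per tick; the consumer drains only every second tick."""
    buf = CaptureBuffer(capacity)
    for i in range(burst_len):
        buf.enqueue(i)
        if i % 2 == 1:
            buf.dequeue()
    return buf.dropped
```

With a 1,000-frame burst and a consumer running at half the arrival rate, the buffer needs room for about 500 frames; anything smaller loses data, which is exactly what the capture layer must avoid.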
It is imperative that every frame is given a unique time stamp, so that a precise timeline can be established, not only locally on the appliance but also across multiple appliances. The accuracy of these time stamps must be on the order of nanoseconds. For example, since a new Ethernet frame can arrive every 67 nanoseconds on a 10 Gbps network, the time stamp resolution must be better than 67 nanoseconds. Otherwise, two Ethernet frames could receive identical time stamps, making it impossible to tell which came first. This time span shrinks to 6.7 nanoseconds on a 100 Gbps network.
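The collision problem can be checked numerically. The sketch below is a simplification that assumes perfectly spaced arrivals and truncating time stamps: it quantizes the arrival times of back-to-back minimum-size frames at different time stamp resolutions and reports whether every frame still gets a distinct value.

```python
def quantize(t_ns: float, resolution_ns: float) -> int:
    """Time stamp expressed as a tick count at the given resolution (truncating)."""
    return int(t_ns // resolution_ns)

def timestamps_distinct(arrivals_ns: list[float], resolution_ns: float) -> bool:
    """True if no two frames share a time stamp at this resolution."""
    ticks = [quantize(t, resolution_ns) for t in arrivals_ns]
    return len(set(ticks)) == len(ticks)

# Back-to-back minimum-size frames on a 10 Gbps link: one every 67.2 ns.
arrivals_10g = [i * 67.2 for i in range(1000)]
# On a 100 Gbps link the spacing shrinks to 6.72 ns.
arrivals_100g = [i * 6.72 for i in range(1000)]

for res in (1.0, 10.0, 100.0):
    print(f"{res} ns resolution: "
          f"10G distinct={timestamps_distinct(arrivals_10g, res)}, "
          f"100G distinct={timestamps_distinct(arrivals_100g, res)}")
```

Under these assumptions, 1 ns resolution keeps every frame distinct at both speeds, 10 ns suffices at 10 Gbps but not at 100 Gbps, and 100 ns fails at both.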
Zero packet loss capture, combined with nanosecond-precision time stamping, ensures a reliable, accurate stream of information for data analysis.
The second step is to store this information in real time. Several appliances offer capture to disk, which stores real-time data on the appliance’s local hard disk. Alternatively, the data can be forwarded to a Storage Area Network (SAN) or another location. A historical timeline can then be built from the stored data, tracking precisely what has happened in the network.
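The historical timeline can be pictured as an append-only, time-ordered store that supports interval queries. The sketch below keeps captured frames in memory as a stand-in for capture-to-disk or SAN storage; the `CapturedFrame` fields and the `Timeline` class are hypothetical. Because capture delivers frames in time stamp order, a binary search with Python's `bisect` module is enough to retrieve every frame in an arbitrary window.

```python
import bisect
from dataclasses import dataclass

@dataclass(frozen=True)
class CapturedFrame:
    timestamp_ns: int
    src: str
    dst: str
    length: int

class Timeline:
    """Frames kept in time stamp order so any past interval can be replayed."""

    def __init__(self) -> None:
        self._ts: list[int] = []
        self._frames: list[CapturedFrame] = []

    def append(self, frame: CapturedFrame) -> None:
        # Capture delivers frames in time order, so appending keeps both lists sorted.
        if self._ts and frame.timestamp_ns < self._ts[-1]:
            raise ValueError("frames must be appended in time stamp order")
        self._ts.append(frame.timestamp_ns)
        self._frames.append(frame)

    def between(self, start_ns: int, end_ns: int) -> list[CapturedFrame]:
        """All frames with start_ns <= timestamp < end_ns."""
        lo = bisect.bisect_left(self._ts, start_ns)
        hi = bisect.bisect_left(self._ts, end_ns)
        return self._frames[lo:hi]

tl = Timeline()
for ts in (0, 67, 134, 201):
    tl.append(CapturedFrame(ts, "10.0.0.1", "10.0.0.2", 64))
window = tl.between(60, 150)   # frames stamped 67 and 134
```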
This timeline is a source of rich information for data analytics. Such data can provide insight into usage and behavior trends. If the appliance has Deep Packet Inspection (DPI) capabilities, then usage of services, including OTT services, can be tracked and analyzed to provide usage patterns with respect to time, location and type of device.
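Assuming the DPI engine has already labeled each flow with a service name, and that time, location and device type are recorded alongside it (all field names below are hypothetical), extracting usage patterns reduces to grouping and summing. The sketch groups traffic volume by whichever dimensions are requested.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class ClassifiedFlow:
    service: str       # application/service label assigned by DPI
    hour: int          # hour of day, 0-23
    location: str      # e.g. a cell or site identifier
    device_type: str   # e.g. "phone", "tablet"
    nbytes: int

def usage_by(flows: list[ClassifiedFlow], *dims: str) -> Counter:
    """Sum bytes grouped by the named ClassifiedFlow attributes."""
    totals: Counter = Counter()
    for f in flows:
        key = tuple(getattr(f, d) for d in dims)
        totals[key] += f.nbytes
    return totals

flows = [
    ClassifiedFlow("video", 20, "cell-17", "phone", 500),
    ClassifiedFlow("video", 20, "cell-42", "tablet", 300),
    ClassifiedFlow("messaging", 9, "cell-17", "phone", 10),
]
by_time = usage_by(flows, "service", "hour")            # usage pattern over time
by_device = usage_by(flows, "service", "device_type")   # usage pattern by device
```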
This alone is a valuable resource for network and service optimization. New, compelling services can be defined that match users’ preferences. Even more importantly, this information can be used to provide insight to OTT content and service providers, so that carriers can offer attractive services to these potential customers.
Reacting in Real-Time
Finally, real-time and stored data can be combined to support real-time decision-making. The data captured to disk can help create a profile of expected behavior. When this profile is correlated with real-time information on network activity, unexpected events or anomalies can be detected. Such events may indicate a security threat or performance degradation, or present an opportunity to offer a customer a package extension or a complementary service.
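One simple way to compare live data with a stored profile is a per-hour baseline of mean and standard deviation, flagging a live measurement when it strays too far from that baseline (a z-score test). This is a deliberately basic technique chosen for illustration, not a method the article prescribes; the threshold of 3 and the decision to flag hours with no baseline are assumptions.

```python
import statistics
from collections import defaultdict

def build_profile(history: list[tuple[int, float]]) -> dict[int, tuple[float, float]]:
    """history: (hour_of_day, measurement) pairs taken from stored data.
    Returns per-hour (mean, population std dev) as the expected behavior."""
    by_hour: dict[int, list[float]] = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {h: (statistics.mean(v), statistics.pstdev(v)) for h, v in by_hour.items()}

def is_anomalous(profile: dict[int, tuple[float, float]], hour: int,
                 value: float, threshold: float = 3.0) -> bool:
    """Flag a live measurement more than `threshold` standard deviations
    away from what was previously observed at this hour."""
    if hour not in profile:
        return True   # assumption: no baseline at all counts as unexpected
    mean, std = profile[hour]
    if std == 0:
        return value != mean
    return abs(value - mean) / std > threshold

# Hypothetical stored measurements (e.g. sessions per second) for 14:00.
history = [(14, v) for v in (100.0, 102.0, 98.0, 101.0, 99.0)]
profile = build_profile(history)
print(is_anomalous(profile, 14, 101.0))  # False: within the usual range
print(is_anomalous(profile, 14, 150.0))  # True: far outside it
```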
From an RTBDA perspective, this is very close to the capabilities that OTT content and service providers have deployed: the ability to react in real time, based on an understanding of what is happening right now compared with what has happened previously.
Redefining RTBDA in Telecom
Modern telecom networks require a re-evaluation of what “real time” means and of which resources are used for big data analytics. Although telecom carriers already have probe and appliance technology in their networks, they must consider using it in a more strategic way to support RTBDA. Doing so will not only provide an enhanced source of information for planning but also create new opportunities to offer improved services to both end users and OTT service providers, allowing carriers to finally address the issue of monetizing OTT traffic in their networks.
About the Author
Daniel Joseph Barry is VP of Marketing at Napatech and has over 20 years of experience in the IT and Telecom industry. Prior to joining Napatech in 2009, Dan Joe was Marketing Director at TPACK, a leading supplier of transport chip solutions to the Telecom sector. From 2001 to 2005, he was Director of Sales and Business Development at optical component vendor NKT Integration (now Ignis Photonyx), following various positions in product development, business development and product management at Ericsson. Dan Joe joined Ericsson in 1995 from a position in the R&D department of Jutland Telecom (now TDC). He has an MBA and a BSc degree in Electronic Engineering from Trinity College Dublin.