Big Changes for Big Data: Best Practices for Scaling Big Data Storage

August 2nd, 2013

This Industry Viewpoint was authored by Stefan Bernbo, CEO and founder of Compuverde.  

Demand for storage has never been higher. A recent study by IDC projected that data center infrastructure will increase at a compound annual growth rate (CAGR) of 53 percent between 2011 and 2016. The popularity of cloud computing has undeniably contributed to the surge in the call for additional storage options. Consumers’ overwhelming enthusiasm for online services is motivating service providers to reconsider their Big Data storage infrastructure.
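To put that growth rate in perspective, the short calculation below compounds a 53 percent annual rate over the five years from 2011 to 2016. It is a minimal sketch of the standard compound-growth formula; the function name is illustrative and the only inputs taken from the article are the rate and the date range.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Return the total growth multiple implied by a compound annual growth rate.

    cagr is a fraction (0.53 for 53 percent); years is the number of
    compounding periods.
    """
    return (1.0 + cagr) ** years


if __name__ == "__main__":
    # 53 percent per year, compounded over 2011 -> 2016 (five periods).
    multiple = growth_multiple(0.53, 2016 - 2011)
    print(f"Total growth over the period: about {multiple:.1f}x")
```

At that rate, capacity at the end of the period is roughly eight times what it was at the start.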

The demand for storage has grown exponentially, giving birth to new challenges that service providers must confront. Communications service providers are challenged with the task of maintaining the low-cost structure that consumers are accustomed to. The ever-increasing appeal of cloud-based online services has left service providers looking for scalable, cost-effective and performance-enhancing alternatives. Without these adjustments, users will be burdened with storage restrictions while service providers will be plagued by higher costs and energy consumption.

Do the Benefits Outweigh the Costs?

In response to the growing demand for storage, communication service providers are transitioning their data centers to a centralized environment. This not only allows users to access their data remotely; it also eliminates the need for excessive equipment and personnel. Additional benefits of a single, large data center include enhanced Internet connections, improved performance and reliability.

Although there are benefits to centralizing data centers, there are also challenges that come with it. Notably, scalability becomes difficult and costly. Improving data center performance requires purchasing additional high-performance, specialized equipment, which increases expenses and energy consumption, both of which are difficult to control at scale.

Challenges of the Cloud

Service providers are burdened with managing significantly more users and greater performance demands than the average enterprise. This makes solving performance problems such as data bottlenecks a significant concern. Although the average user of an enterprise system demands high performance, these systems host comparatively fewer users, many of whom are able to access their files directly through the network. Furthermore, enterprise system users are largely accessing, sending and saving relatively small files that use less storage capacity and impose a negligible performance load.

However, the same does not hold true for cloud users outside the enterprise. Outside the internal network environment, the service provider’s cloud servers are simultaneously being accessed by vast numbers of users, which turns the Internet itself into a performance bottleneck. The cloud provider’s storage system must be able to sustain performance levels across all users while scaling to each additional user. Adding to these challenges, the average cloud user is accessing and saving much larger files than the average enterprise user, such as music or video files.

Communication service providers face significant business implications due to these storage demands. In an effort to keep up with the growing demands for increased data storage, service providers must be able to scale rapidly. In order to remain competitive, service providers need storage solutions that enhance performance, scalability and cost-effectiveness.

The Ideal Method

To achieve the optimal storage solution, service providers should consider the following best practices:

  • Use commodity components:

Low-energy hardware can make good business sense. Commodity-component servers are not only economical but also energy-efficient, which reduces both setup and operating costs.

  • Prevent bottlenecking:

A single point of entry can easily create a performance bottleneck, especially with the demands of cloud computing on Big Data storage. Adding caches to mitigate the bottleneck, as most service providers presently do, increases the cost and complexity of a system. Alternatively, a horizontally scalable system that distributes data among all nodes makes it possible to choose low-cost, lower-energy hardware.

  • Distributed storage:

Despite the trend towards data center centralization, distributed storage is the leading way to build at scale. There are now ways to upgrade performance at the software level that improve upon the performance benefit of a centralized data storage approach.
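One way to picture a horizontally scalable system that distributes data among all nodes is hash-based placement. The sketch below uses consistent hashing, a common technique chosen here purely for illustration; the article does not say which placement scheme any particular product uses. The class name, node names, object keys and the number of virtual nodes are all hypothetical.

```python
import bisect
import hashlib
from collections import Counter


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Illustrative consistent-hash ring (not any vendor's implementation)."""

    def __init__(self, nodes, vnodes=100):
        # Each node gets several virtual points so data spreads more evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the node that owns the first ring point after the key's hash."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


def placement_counts(nodes, keys, vnodes=100):
    """Count how many of the given keys land on each node."""
    ring = HashRing(nodes, vnodes)
    return Counter(ring.node_for(key) for key in keys)


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical servers
    keys = [f"object-{i}" for i in range(10_000)]     # hypothetical object IDs
    for node, count in sorted(placement_counts(nodes, keys).items()):
        print(node, count)
```

Because any client can compute the owning node from the key alone, no single entry point has to route every request. With this scheme, adding a node moves only the keys that the new node takes over, which is one reason the approach suits incremental scaling on commodity hardware.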

Conclusion

Current Big Data storage infrastructures consist mainly of high-performance, vertically scalable storage systems that are quite costly and can scale only to a single petabyte. Because of these limitations, this is not a sustainable solution. By adhering to the best practices above, service providers can seamlessly transition to a horizontally scaled data storage model that evenly distributes data onto low-energy hardware. The new system will reduce costs while addressing the performance challenges of the current infrastructures. Upgrading cloud storage using these methods will allow service providers to improve the performance, scalability and efficiency of their data centers, and will help them keep up with increasing demand.

About the Author:

Stefan Bernbo is the founder and CEO of Compuverde. For 20 years, Stefan has designed and built numerous enterprise-scale data storage solutions designed to be cost-effective for storing huge data sets. From 2004 to 2010 Stefan worked within this field for Storegate, the wide-reaching Internet-based storage solution for consumer and business markets, with the highest possible availability and scalability requirements. Previously, Stefan worked on system and software architecture for several projects with Swedish giant Ericsson, the world-leading provider of telecommunications equipment and services to mobile and fixed network operators.

Categories: Cloud Computing · Datacenter · Industry Viewpoint

1 Comment


  • DataH says:

    Nice view on Big Data Stephan. We are seeing an increase in businesses seeking specialized skills to help address challenges that arose with the era of big data. The HPCC Systems platform from LexisNexis helps to fill this gap by allowing data analysts themselves to own the complete data lifecycle. Designed by data scientists, ECL is a declarative programming language used to express data algorithms across the entire HPCC platform. Their built-in analytics libraries for Machine Learning and BI integration provide a complete integrated solution from data ingestion and data processing to data delivery. More at http://hpccsystems.com
