The Advent of Big Data: A Revolution?

by Pierrick Bouffaron, Basile Bouquet, Thomas Deschamps – Consulate of France in San Francisco

Introduction

It is difficult to deny the complexity of modern society: globalization, the continued growth of a population whose needs are constantly increasing, and an overall improvement in standards of living and education. In this evolving landscape, social organizations (institutions, communities, businesses) are forced to rely on innovative data analysis methods to stay responsive, adaptive, robust and accurate in carrying out their tasks. The amount of data these actors produce and process has increased continuously. This is not new in itself, but the proliferation of interconnected devices and round-the-clock internet access makes the analysis of this heterogeneous information very complex. A company must be able to react quickly to shifting market needs, misleading signals, market evolution and crises. The billions of data points transmitted by consumers are a rich source of information for targeting customer segments and identifying industry trends (e.g., telling a passing buzz from a real market trend in the near term), and also for monitoring competitors. In another context, governments are now under heavy pressure to provide quality information to as many people as possible, while leveraging the services made possible by the advent of “Information and Communication Technologies” (ICT). Welcome to the era of Big Data!

The Concept of Big Data: The Buzz in Silicon Valley

Definition

Big Data. The expression was undoubtedly the buzzword of 2011-2012 in Silicon Valley and will be again in 2013. It was introduced by the research firm Gartner in 2008. It refers to the unprecedented increase in the volume of data exchanged in our societies and to the heterogeneity of its nature and sources. In this context, the capture, storage, retrieval, sharing, analysis and visualization of data need to be rethought. Since the explosion of telecommunications in the late 1990s, the amount of information exchanged has never stopped growing, exceeding all expectations. The massive development of the web (two billion Internet users worldwide in 2012) has contributed. The more recent advent of the “Internet of Things” (devices connected to telecommunication networks, such as smartphones, computers, tablets and sensors of all kinds) will greatly accentuate this trend. Today, the merger of telecommunication networks with physical energy networks (electricity in particular), known as “smart grids”, opens the door to many applications: development of electric vehicles, connected electric appliances capable of demand response, renewable energy integration, customer interfaces and service, usage analysis, etc.

What challenges?

The challenges of Big Data can be summarized in four V’s: Volume, Variety, Velocity and Visualization.

1. Volume: The challenge is daunting because the trend toward “more and more data” has accelerated sharply, driven by the free fall in the cost of generic storage over the past several years and by the tremendous boom in information and communication technologies. Facebook hosts 40 billion photos, while Walmart handles more than one million customer transactions per hour in the United States, feeding databases estimated at more than 2.5 petabytes (a petabyte is roughly 2^50 bytes). By 2013 the annual amount of data transferred over the web is expected to reach 667 exabytes (an exabyte is roughly 2^60 bytes). [1] But where does Big Data start? Opinions differ. According to Mike Driscoll, CEO of Metamarkets, “if your data fits into an Excel spreadsheet, you have Small Data. If a MySQL database is sufficient, it is called Medium Data. However, if your data is spread over several servers or multiple machines, you’re there: the issues to be addressed fall within Big Data” [2]. Meanwhile, Cisco raises a major problem: the volume of data exchanged is increasing much faster than the capacity of the networks that carry it.

New online data storage solutions are starting to emerge. Companies like Amazon, AT&T, IBM, Google, Yahoo and AppNexus are multiplying their cloud computing offerings, relieving companies of capacity constraints while offering them a range of associated services. Pike Research expects cloud computing revenue to keep growing at a rate of about 30% per year, with market value rising from 46 billion in 2009 to 210 billion by 2015. [3]
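As a quick consistency check on the figures quoted above, the sketch below computes the compound annual growth rate implied by the two market values. The function name `cagr` is illustrative, and the calculation assumes steady compounding over the six years from 2009 to 2015; the source does not say how Pike Research derived its rate, nor which currency the figures are in.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text, in billions (currency not stated in the source).
market_2009 = 46
market_2015 = 210

implied_rate = cagr(market_2009, market_2015, 2015 - 2009)
print(f"Implied annual growth: {implied_rate:.1%}")
```

Under these assumptions the implied rate comes out a little under 29% per year, which is in line with the "about 30%" quoted.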

2. Variety: With ever more data sources (Internet, connected devices, sensors, etc.), the data collected is highly heterogeneous. Structuring this information is essential.

3. Velocity: The ability to make prompt and appropriate decisions is a key to success (and to economic survival!). Decision-support tools that exploit data must be as efficient and effective as possible.

4. Visualization: However enormous the amount of input information, the output of its processing must be clear and concise, otherwise it may not be used.

Two complementary challenges concern the accessibility and availability of information. Data is often scattered across silos and can be difficult to bring together into a clear, reliable and centralized view. However good the analysis algorithms are, a shortfall in the quantity or quality of data can lead to incorrect or incomplete results.

What added value?

Organizations of all kinds have become aware of the value of the data they hold and of how its use can differentiate them, by deepening their knowledge or improving their structural efficiency. With increasingly sophisticated analysis tools, the central idea is to process billions of data points to find the relevant information that allows the right decision to be made. For example, the data generated by a consumer is a valuable marketing resource for services, which can then tailor their advertising.

The reason for analyzing this data is not confined to the business world, where the focus on performance, competitiveness and market rank is the main driver. The data collected by government agencies is also at the heart of the phenomenon. To be transparent with their citizens, governments are encouraged to release the information they hold. State agencies have a strong interest in this sharing, which stimulates both creativity and innovation [4]. Citizens and businesses can use the available data to create new services (mobile and web applications).

Some examples:

– The digitization of medicine could eventually help doctors diagnose and treat patients while optimizing costs. [5]

– Institutional and public data are increasingly used to improve the functional efficiency of cities: Cisco announced in early December a partnership with the startup Streetline to manage parking in San Francisco in real time. [6]

– Among industrial companies, General Electric announced in early December more than a billion in medium-term investment to provide Big Data solutions to its customers.

– Imagine the amount of data needed for the construction of an Airbus A380 or a gas turbine power plant. [7]

– Service-oriented thinking is a blossoming paradigm of information technology, in conjunction with many other disciplines such as operations, accounting and finance. [8]

What market?

Because information is a strategic issue of the first order, the collection and processing of data are attracting growing investment. The market for data management and analysis is currently estimated at more than $100 billion and is growing at nearly 10% per year, about twice as fast as the global software market. According to Gartner Research [9], Big Data will be responsible for creating 4.4 million jobs in the ICT industry worldwide by 2015, including more than 1.9 million in the United States. The theme is often cited as a priority by investors in Silicon Valley. [10] This economic potential has sparked an unprecedented race toward the most innovative and effective data management algorithms: as crucibles of innovation and knowledge (in mathematics and computer science in particular), California universities such as UC Berkeley [11] and Stanford are taking on these challenges. In terms of data ownership and investment capacity, particularly in research, major players such as Amazon, Google and Facebook are logically the main actors in Big Data. In parallel, startups are flourishing by riding this wave with software, platforms and services for data management, such as Platfora, Continuuity and Metamarkets, all based in Silicon Valley. [12]

Cloud Computing

Cloud computing and Big Data are inseparable. Today, IBM, Google, Yahoo and AppNexus all offer cloud computing services. NIST (the National Institute of Standards and Technology) defines cloud computing as a model for enabling convenient, on-demand network access to a shared pool of computing resources (e.g., servers, networks, storage, applications) that can be rapidly provisioned and released with minimal management effort or interaction with the service provider [13,14]. Cloud computing enables, among other things, the development of management and optimization models, service delivery models (such as “pay-as-you-go”), storage solutions and dynamic offerings. Resource availability is highly elastic, since the available computing power and storage space are theoretically unlimited. No cloud without data in transit, no Big Data without cloud computing.

Conclusion

Even though Big Data is often presented as a powerful tool, several missteps have raised concerns among citizens and public authorities: difficulty in protecting the privacy of personal information, and issues of financial and strategic data security. Cybersecurity concerns accompany the development of the concept, and public, academic and industrial debates are multiplying. Pike Research, for example, predicts that the computer security market, estimated at $370 million in 2012, will reach $610 million by 2020. [15] These challenges aside, the potential offered by the availability and accessibility of this wealth of data, at a time when the world is going digital at high speed, remains a tremendous asset to the modern world.

Sources

– [1] The Economist (27/02/2010). Data, data everywhere, a special report on managing information. Available at: http://redirectix.bulletins-electroniques.com/dBjgB
– [2] J. Ebbert (12/2012). Define It – What Is Big Data?, Adexchanger. http://www.adexchanger.com/online-advertising/big-data/
– [3] Pike Research (2012). Cloud Computing Energy Efficiency. http://redirectix.bulletins-electroniques.com/dsBx0
– [4] Bulletin Electronique Etats-Unis #309 (20/11/2012). Code for America et SF Open Data, l’innovation au service des pouvoirs publics. Available at: http://www.bulletins-electroniques.com/actualites/71490.htm
– [5] B. Chaiken (2011). Mine Big Data to Advance Clinical Research Support, Docs Network. Available at: http://www.docsnetwork.com/articles/BPC12209.pdf
– [6] J. St. John (12/2012). Cisco, Streetline Team Up on Smart, Networked Parking, Greentech Media. http://redirectix.bulletins-electroniques.com/8qQE0
– [7] A. Stratigos (12/2012). General Electric to Leverage Big Data, Outsell. http://redirectix.bulletins-electroniques.com/8nqNF
– [8] H. Demirkan, R.J. Kauffman, J.A. Vayghan, H.-G. Fill, D. Karagiannis, P.P. Maglio (2008). Service-oriented technology and management: perspectives on research and practice for the coming decade, Electronic Commerce Research and Applications 7(4), 356-376.
– [9] C. Pettey (21-25/10/2012). Gartner Forecasts Global Business Intelligence Market to Grow 9.7 Percent in 2011, Gartner Research, Gartner Symposium/ITxpo 2012, Orlando.
– [10] Talk by Vinod Khosla, “Money Matters!”, AIMS Stanford, November 11, 2012.
– [11] S. Yang (03/2012). Big grant for Big Data: NSF awards $10 million to harness vast quantities of data, UC Berkeley News Center. http://newscenter.berkeley.edu/2012/03/29/nsf-big-data-grant/
– [12] B. Koehler (02/2012). 10 hot big data startups to watch this year, Beautiful Data. http://redirectix.bulletins-electroniques.com/vyywt
– [13] P. Mell, T. Grance (2011). The NIST definition of cloud computing. Available at: http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
– [14] F. Lohier (28/10/2011). Le NIST livre enfin sa définition finale du Cloud Computing. BE Etats-Unis no. 264. Available at: http://www.bulletins-electroniques.com/actualites/68056.htm
– [15] Pike Research (2012). Industrial Control Systems Security. Available at: http://redirectix.bulletins-electroniques.com/eDaUB