What is all this additional data that is being collected and transmitted? What does this new term “Big Data” mean? What is HADOOP? How do we move Big Data and process it? How else can we help you?
Let's Start With What Big Data Is
- Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone.
- Gartner defines Big Data as high volume, velocity and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
- According to IBM, 80% of data captured today is unstructured, from sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. All of this unstructured data is Big Data.
What Do We Do With Big Data?
We are discovering that we can make sophisticated predictions by sorting and analyzing Big Data. But if 80% of it is unstructured, we must first organize it before analysis. One widely used way to process it at that scale is HADOOP. Some ways to get Big Data to create value are:
(1) Make information transparent so you can use it at a higher frequency.
(2) Try to obtain more accurate and detailed data on everything relating to your project.
(3) Use Big Data to segment customers, products and services.
(4) Realize that sophisticated analytics will improve decision-making.
Correlation versus causation versus “what’s good enough for the job”
One of the biggest complaints (or, in some cases, proposed facts) about Big Data is that it relies more on correlation than causation. If you’re disappointed with Big Data, you’re not paying attention.
Honestly, for song or product recommendations, who really cares? But in areas like medicine, finance and even marketing, people are becoming much more concerned with finding out “why” once they’ve found out “what.” If you’re a retail store, knowing that Mac users who visit your site tend to buy more-expensive products might make you want to show them more-expensive products. Some deeper digging — perhaps even via direct questions — would show they’re really concerned with craftsmanship. The more you learn beyond what a clustering algorithm can tell you, the better you can connect with customers.
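To make the "what" versus "why" point concrete, here is a minimal Python sketch of the kind of segment comparison described above. The order values and the `ORDERS` and `average_order_by_segment` names are invented purely for illustration; they are not taken from any real store.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical orders: (visitor platform, order value in dollars).
# These numbers are made up for illustration only.
ORDERS = [
    ("mac", 120.0),
    ("mac", 200.0),
    ("windows", 80.0),
    ("windows", 100.0),
    ("mac", 160.0),
]


def average_order_by_segment(orders):
    """Group order values by segment and return the mean for each segment."""
    by_segment = defaultdict(list)
    for segment, value in orders:
        by_segment[segment].append(value)
    return {segment: mean(values) for segment, values in by_segment.items()}


if __name__ == "__main__":
    for segment, average in average_order_by_segment(ORDERS).items():
        print(f"{segment}: average order ${average:.2f}")
```

A summary like this only shows that one group spends more on average. It says nothing about the reason, which is why the deeper digging still matters.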
What is HADOOP?
HADOOP is an open source project from the Apache Software Foundation that provides a software framework for running distributed applications on clusters of servers. Designed to handle huge amounts of data and inspired by Google's MapReduce programming model and the Google File System, HADOOP was originally written for the Nutch search engine project. The name comes from a stuffed toy elephant that belonged to the son of developer Doug Cutting. But don't confuse it with HADOPI, the French law that regulates downloading.
Technologies like HADOOP, for example, aren’t designed to write better models for you; they’re designed to process a lot more data a lot faster. If your models still work, HADOOP should help you run them against a much larger dataset. That might lead to more accurate models and faster answers, but it won’t necessarily lead to a “world-shaking” breakthrough.
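For readers curious what the MapReduce programming model looks like, here is a rough single-machine sketch in plain Python. It does not use HADOOP or any HADOOP API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are our own, chosen only to mirror the three steps of the model. On a real cluster the framework would spread these steps across many servers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    In a real cluster the framework does this between map and reduce.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine every value for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data moves fast"]
    print(word_count(docs))
```

The point of the model is that each map call and each reduce call is independent of the others, so the same logic can be run on more machines as the data grows. That is the "a lot more data a lot faster" part; the logic itself is no smarter than it was.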
So how does all this fit into EDI?
Recently we spoke with Rob Guerriere of DataTrans Solutions. Many of us assume that EDI's definition is tied directly to "structured data" and that B2B is a broader term that could mean anything transacted between two businesses, which may include EDI. B2B is used so loosely in business, and reaches so far outside our industry and space, that to me it is a high-level marketing term. But the supply chain teams, Six Sigma practitioners, and advanced-degree leaders at today's top corporations use the term EDI in a way that does not translate directly to ANSI X12. The old "EDI is dead" claim has been turned on its head: EDI has been resurrected by the same internet technology that was going to kill it. EDI has evolved. I might go as far as to say that MFT (Managed File Transfer) is just one simple part of EDI. MFT is simply a transmission pipe.
Rob mentioned, "They are wrapping a lot more under the term EDI," and then defined EDI this way: "Today, EDI encompasses all forms of B2B data communication, conversion, integration, collaboration, validation, translation, forecasting, government compliance, security, and the business processes that utilize these transactions."
Big data: The next frontier for innovation, competition, and productivity
The amount of data in our world has been exploding, and analyzing large data sets (so-called Big Data) will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus, according to research by the McKinsey Global Institute (MGI) and McKinsey's Business Technology Office. Leaders in every sector will have to grapple with the implications of Big Data, not just a few data-oriented managers. The increasing volume and detail of information captured by enterprises, together with the rise of multimedia, social media, and the Internet of Things, will fuel exponential growth in data for the foreseeable future.
Finally, for those of you in career-hunting mode, there will be a shortage of the talent organizations need to take advantage of Big Data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts with the know-how to use the analysis of Big Data to make effective decisions.