#BigData Terms You Need to Know

Jen Cohen Crompton

Understanding the world of Big Data sounds like a big challenge – and it can be if you aren’t “in the know” about some key terms used in the industry. Below are some top terms that will help you understand the big data conversation and come to a better understanding of how the terms contribute to the big [data] picture.

Top Big Data Terms to Understand the Concept

1. MapReduce: A programming paradigm that supports distributed computing on large datasets.
MapReduce is a programming model that was created (and used) by Google in the early 2000s to process massive amounts of data. Its name comes from the map and reduce functions common in functional programming, although MapReduce uses them somewhat differently from their traditional definitions. It is important to understand the concept because MapReduce technologies spread data processing, typically on top of distributed storage, across many machines to increase the speed and reliability of dealing with large data sets. A popular free implementation is Apache Hadoop.
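To make the model more concrete, here is a minimal single-machine sketch of the classic word-count example. It only imitates the map, shuffle, and reduce phases; a framework such as Hadoop would run each phase in parallel across many machines and handle failures. The function names are illustrative assumptions, not part of any MapReduce API.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every emitted value by its key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    # On a cluster, each document (or data block) would be mapped on a different node.
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(mapped)
    # Likewise, each key could be reduced on a different node.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data about data"]))
```

Because `map_phase` sees one document at a time and `reduce_phase` sees one key at a time, neither needs the whole data set in memory, which is the property that lets a real framework spread the work out.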

2. NoSQL Databases: Databases, such as document-oriented and key/value stores, that classify and organize data without SQL or a relational database management system (RDBMS); created to manage volumes of data that do not have a fixed schema.
NoSQL gained popularity as major companies adopted it to cope with volumes of data that traditional RDBMS solutions could not handle. NoSQL databases provide quick, efficient performance because each piece of captured data is stored under a single identifying key, which lets them store a large number of transactions quickly.
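As a rough illustration of the key/value idea, the toy store below saves each document under a single key and retrieves it with one lookup. The class and its methods are hypothetical and do not mirror any particular NoSQL product; real systems add persistence, replication, and partitioning across many servers.

```python
from __future__ import annotations

import json
from typing import Any


class ToyDocumentStore:
    """A toy in-memory key/value store: each document lives under one key."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, key: str, document: dict[str, Any]) -> None:
        # Documents are stored as JSON strings; no schema is enforced,
        # so two documents may have completely different fields.
        self._data[key] = json.dumps(document)

    def get(self, key: str) -> dict[str, Any] | None:
        # A read is a single hash-table lookup by key, with no joins.
        raw = self._data.get(key)
        return json.loads(raw) if raw is not None else None


if __name__ == "__main__":
    store = ToyDocumentStore()
    store.put("user:1", {"name": "Ada", "email": "ada@example.com"})
    store.put("user:2", {"name": "Lin", "tags": ["vip"], "visits": 12})
    print(store.get("user:2"))
```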

3. Storage: The technologies that hold the distributed data.
Data can be stored in data centers, which may rely on any number of cloud technologies.

4. Servers: Technologies available for renting computing power on remote machines; a program to “serve” the requests of programs.
In big data, servers offer support for data storage and management.

5. Processing: The action of extracting valuable information from large datasets.
Processing allows the user to sort through massive amounts of data and produce information for analysis. During processing, data can be sorted and grouped by algorithms, but it’s important to understand the limitations of such automated processing when no human judgment is applied.
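A minimal sketch of the sort-and-group step, assuming hypothetical order records with `region` and `amount` fields: the code groups records, sums each group, and ranks the results. Note what it cannot do: it reports that one region sells more, but deciding why, or whether that matters, still takes human judgment.

```python
from __future__ import annotations

from collections import defaultdict


def total_by_region(orders: list[dict]) -> list[tuple[str, float]]:
    """Group order records by region, sum their amounts, and sort descending."""
    totals: dict[str, float] = defaultdict(float)
    for order in orders:
        totals[order["region"]] += order["amount"]
    # Sort so the regions with the largest totals come first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    orders = [
        {"region": "EMEA", "amount": 120.0},
        {"region": "APJ", "amount": 80.0},
        {"region": "EMEA", "amount": 30.0},
        {"region": "Americas", "amount": 200.0},
    ]
    print(total_by_region(orders))
```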

6. Natural Language Processing: Extracting information from human-created text.
This type of processing requires sorting through data that is created by humans, not necessarily data generated by their “actions.” For instance, if you are analyzing Twitter data from the previous six months, you might be looking for keywords and sentiment, which would require Natural Language Processing.
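The sketch below shows the two tasks mentioned, keyword tracking and sentiment, in their simplest form. It assumes tweets are plain strings and uses a tiny, made-up word list for sentiment; production NLP relies on much larger lexicons or trained models, so treat this as an illustration of the idea rather than a usable analyzer.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical, tiny sentiment lexicon for illustration only.
POSITIVE = {"love", "great", "fast", "happy"}
NEGATIVE = {"hate", "slow", "broken", "angry"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word and hashtag tokens."""
    return re.findall(r"#?\w+", text.lower())


def analyze(tweets: list[str], keyword: str) -> dict[str, int]:
    """Count tweets that mention a keyword and tally their naive sentiment."""
    result: Counter = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        if keyword not in tokens and f"#{keyword}" not in tokens:
            continue  # tweet does not mention the keyword
        result["mentions"] += 1
        score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        result[label] += 1
    return dict(result)
```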

7. Visualization: Viewing meaningful data represented graphically.
As data is collected, stored, and then analyzed, it needs to be presented in a way that can be understood and digested. Programs able to analyze big data can sometimes interpret the data and represent it in a visual display for easier consumption and/or to show results.
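Visualization tools range from dashboards to full plotting libraries. As a dependency-free stand-in, the sketch below renders counts as a plain-text bar chart, scaling every bar against the largest value; the function is hypothetical and only meant to show the basic idea of mapping numbers to visual length.

```python
def text_bar_chart(data: dict[str, int], width: int = 20) -> str:
    """Render counts as horizontal bars; the largest bar is `width` characters."""
    if not data:
        return ""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Scale each bar relative to the largest value (avoid dividing by zero).
        bar = "#" * round(value / largest * width) if largest else ""
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(text_bar_chart({"positive": 40, "neutral": 25, "negative": 10}))
```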

8. Acquisition: Techniques for cleaning up messy public data sources.
As data is collected, it is not always in its purest or most usable form. Various tools help take this data and turn it into something that can be processed.
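A small sketch of typical clean-up steps, assuming records arrive as dictionaries of strings with hypothetical `city` and `population` fields (the sample values are made up): trim and normalize text, strip thousands separators, drop rows that are missing or non-numeric, and remove duplicates that only differ in formatting.

```python
from __future__ import annotations


def clean_records(raw: list[dict[str, str]]) -> list[dict[str, object]]:
    """Normalize messy records, drop unusable ones, and remove duplicates."""
    cleaned: list[dict[str, object]] = []
    seen: set[tuple[str, str]] = set()
    for record in raw:
        city = (record.get("city") or "").strip().title()
        population = (record.get("population") or "").replace(",", "").strip()
        # Skip rows missing a required field or holding a non-numeric value.
        if not city or not population.isdigit():
            continue
        key = (city, population)
        if key in seen:  # duplicate once formatting differences are removed
            continue
        seen.add(key)
        cleaned.append({"city": city, "population": int(population)})
    return cleaned


if __name__ == "__main__":
    messy = [
        {"city": "  springfield ", "population": "1,200"},
        {"city": "Springfield", "population": "1200"},
        {"city": "", "population": "300"},
        {"city": "Shelbyville", "population": "n/a"},
    ]
    print(clean_records(messy))
```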

9. Serialization: Converting a data structure or object state into a format that can be stored or transmitted.
Serialization happens after the data is collected, while it is being processed. As the data is sorted and passed between systems, it may need to be stored. During these steps, the data must be serialized, and the format used depends on the languages and APIs involved.
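JSON is one common serialization format (big data pipelines also use binary formats such as Avro or Protocol Buffers). The sketch below turns a hypothetical `SensorReading` object into a JSON string and back, which is the round trip a record makes when one system stores or sends it and another reads it.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class SensorReading:
    sensor_id: str
    timestamp: int  # seconds since the Unix epoch
    values: list[float]


def serialize(reading: SensorReading) -> str:
    """Convert the object into JSON text that other languages can parse."""
    return json.dumps(asdict(reading), sort_keys=True)


def deserialize(payload: str) -> SensorReading:
    """Rebuild the object from its JSON text."""
    return SensorReading(**json.loads(payload))


if __name__ == "__main__":
    text = serialize(SensorReading("s-1", 1700000000, [20.5, 21.0]))
    print(text)
    print(deserialize(text))
```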

10. CPU: Central processing unit; the hardware within a computer that performs the basic operations of the system (comparable to the “brain”).
CPUs are mentioned in reference to crunching the data.

11. Hadoop: An open source implementation of MapReduce (an Apache software product).
Hadoop was developed to enable applications to work with thousands of independent computers and petabytes of data.

12. Petabytes, exabytes, zettabytes: Units of information used to measure data amounts.
Petabyte = one quadrillion (10^15, short scale) bytes, or 1,000 terabytes (the binary counterpart, the pebibyte, is 1,024 tebibytes)
Exabyte = one quintillion (10^18, short scale) bytes
Zettabyte = one sextillion (10^21; one long scale trilliard) bytes
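Because decimal and binary prefixes are often mixed up, a short conversion helper makes the difference visible. This sketch assumes SI steps of 1,000 for kB through ZB and IEC steps of 1,024 for KiB through ZiB; for example, 10^15 bytes is exactly 1 PB, while 2^50 bytes is exactly 1 PiB.

```python
def human_readable(num_bytes: int, binary: bool = False) -> str:
    """Express a byte count in the largest unit that keeps the value below one step.

    Decimal (SI) units step by 1,000; binary (IEC) units step by 1,024.
    """
    base = 1024 if binary else 1000
    units = (["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB"] if binary
             else ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB"])
    value = float(num_bytes)
    for unit in units[:-1]:
        if value < base:
            return f"{value:.2f} {unit}"
        value /= base
    return f"{value:.2f} {units[-1]}"


if __name__ == "__main__":
    print(human_readable(10**15))               # 1.00 PB
    print(human_readable(10**15, binary=True))  # 909.49 TiB
```

The second line shows why the distinction matters: one decimal petabyte is only about 909 binary tebibytes.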

source: wikipedia.org

About Jen Cohen Crompton

Jen Cohen Crompton is a SAP Blogging Correspondent reporting on big data, cloud computing, enterprise mobility, analytics, sports and tech, and anything else innovation-related. When she's not blogging, she can be caught marketing, using social media and/or presenting at conferences around the world. Disclosure: Jen is being compensated by SAP to produce a series of articles on the innovation topics covered on this site. The opinions reflected here are her own.

How Big Data Can Tell You Which Book To Read Next

JP George

If you enjoy reading, but still haven’t found your next book to cozy up with, your smartphone might be able to suggest one. Artificial intelligence (AI) is now able to rank literature to predict the next bestseller – a kind of recommendation system, not based on metadata, but on the patterns and themes found in books.

Publishers around the globe are mining all kinds of data, including what’s in the books themselves, in search of the magic formula for evaluating a book’s market potential. With more informed marketing, publishers hope to better target their customers.

Recommending the popular novel

So, how does AI determine what we want to read? It turns out that certain emotional patterns keep us engaged and interested while reading a novel. Kurt Vonnegut first described the curves of emotional plotlines in 1995. Now, with the help of AI sentiment and emotion analysis, such plotlines can be extracted quantitatively. By combining these plotline curves, researchers from the Stanford Literary Lab claim to be able to detect the next blockbuster novel.
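One simple way to turn a story into a curve is to score each sentence with an emotion lexicon and smooth the scores with a moving average. The lexicon and window size below are made-up assumptions, and this is not the method used by the Stanford Literary Lab; it only shows how a plotline can become a sequence of numbers.

```python
from __future__ import annotations

import re

# Hypothetical mini-lexicon; research systems use large validated lexicons
# or trained emotion classifiers.
LEXICON = {"joy": 1, "love": 1, "hope": 1, "win": 1,
           "death": -1, "loss": -1, "fear": -1, "grief": -1}


def sentence_scores(text: str) -> list[int]:
    """Score each sentence as (# positive words) - (# negative words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z]+", s.lower()))
            for s in sentences]


def emotional_arc(text: str, window: int = 3) -> list[float]:
    """Smooth sentence scores with a trailing moving average to expose the plot's shape."""
    scores = sentence_scores(text)
    arc = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1): i + 1]
        arc.append(sum(chunk) / len(chunk))
    return arc
```

On a real novel the text would be split into hundreds of segments and the window widened, so that the curve shows the overall rise and fall rather than sentence-level noise.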

Machines think from data

Under the hood of such an AI sits Big Data and machine learning (ML). The concept of Big Data doesn’t just mean lots of data, but also that the data comes from many different data sources and types (e.g., audio, video, images, text, etc.) that are often unstructured (unlike traditional databases with well-defined fields). ML involves statistical algorithms that utilize sets of multi-type, unstructured data to predict class membership. This is possible by either knowing ahead of time which classes exist and training the ML algorithm by example (supervised learning) or letting the algorithm discover the underlying patterns (unsupervised learning).
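To make the unsupervised case concrete, here is a minimal k-means sketch: it receives unlabeled 2-D points and a starting guess for the cluster centers, then alternates between assigning points to the nearest center and moving each center to the mean of its points. The starting centers are passed in explicitly to keep the example deterministic; practical implementations usually pick them randomly or with k-means++.

```python
from __future__ import annotations

import math

Point = tuple[float, float]


def kmeans(points: list[Point], centroids: list[Point], iterations: int = 10) -> list[int]:
    """Cluster unlabeled 2-D points; return each point's cluster index."""
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its member points.
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        centroids = new_centroids
    return assignment
```

Real implementations stop as soon as the assignments no longer change; the fixed number of iterations here keeps the sketch short.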

ML methods include embedded vector space techniques (principal component analysis, K-nearest neighbor, and support vector machines), decision-tree-based techniques (classification and regression trees, random forests), gradient- and Bayesian-based methods, artificial neural networks (ANNs), and others. Many tutorials on machine learning methods are available online.
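K-nearest neighbor is one of the simplest supervised methods on that list. In the sketch below, each training example is a hypothetical feature vector (say, suspense and romance scores for a book) with a known label, and a new item gets the majority label of its k closest examples.

```python
from __future__ import annotations

import math
from collections import Counter


def knn_predict(train: list[tuple[tuple[float, ...], str]],
                query: tuple[float, ...], k: int = 3) -> str:
    """Label a query point by majority vote of its k nearest labeled examples."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```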

ANNs were among the first algorithms applied to problems in AI, dating back as far as the 1940s. For many reasons, their use has waxed and waned over the years, yet interest has recently resurged along with the unprecedented advance of deep learning. This growth in deep learning has led to what the New York Times calls the great awakening, exemplified by Google’s ability to translate text into more than 100 languages.

How AI uncovers sentiment and emotions from text

Imagine automatically extracting the sentiment or emotional impact of a literary work. For a computer to understand a text, a task called natural language processing (NLP), AI algorithms first find a mathematical representation that a machine can work with and that contains as much information about the text as possible. A simple representation called “bag-of-words” treats a text, as the name implies, as an unordered collection of the words it contains, from which the frequency of words or word groups can be counted. This may provide enough information for classifying themes, but it would fail miserably at understanding sentences in which word order is important.
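A few lines of Python show both the idea and its weakness. The sketch assumes a simple lowercase-and-match-letters tokenizer; two sentences with opposite meanings produce exactly the same bag of words.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Represent a text only by how often each word occurs; word order is discarded."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


if __name__ == "__main__":
    a = bag_of_words("The plot was good, the acting was not.")
    b = bag_of_words("The plot was not good, the acting was.")
    print(a == b)  # True: identical bags despite opposite meanings
```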

Two representations that capture information from the context in which words appear are Word2Vec and GloVe, which map each word to a vector of numbers. More about NLP representations can be learned from online tutorials, and TensorFlow offers a tutorial on Word2vec.
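Word2Vec and GloVe learn dense vectors with dedicated training procedures, so the sketch below is a simplified, count-based stand-in rather than either algorithm: it represents each word by the words that appear near it in a toy corpus and compares words with cosine similarity. It illustrates the underlying intuition (GloVe itself starts from global co-occurrence counts) that words used in similar contexts end up with similar vectors.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences: list[str], window: int = 2) -> dict[str, Counter]:
    """Count, for every word, which other words appear within `window` positions."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (1.0 means same direction)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```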

Once sentences are converted to a meaningful representation, a language model is needed that discerns positive emotions from negative emotions. One method would be to use a supervised learning procedure with deep neural networks, as has been done to understand movie reviews. Another way is to allow the deep neural network to discover the emotional patterns by itself. This is the true power behind deep learning: its ability to teach itself, and with more Big Data, to learn more.
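The supervised route can be illustrated without a deep neural network. The sketch below swaps in a much simpler classifier, multinomial naive Bayes with add-one smoothing, trained on a handful of made-up labeled reviews; it is meant only to show learning from labeled examples, not the deep-learning approach described above.

```python
from __future__ import annotations

import math
from collections import Counter


def train_naive_bayes(examples: list[tuple[str, str]]):
    """Learn per-label word counts from (text, label) pairs: supervised learning."""
    word_counts: dict[str, Counter] = {}
    label_counts: Counter = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text: str) -> str:
    """Pick the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = "", -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(label_counts[label] / total_docs)  # prior
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```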

Through this process, the ML model can understand a text’s major themes (from the word groupings) and emotions. These factors are the fundamental ingredients for an AI application that will recommend a novel.

From creating Animal Farm summaries to discovering who will be the next Danielle Steel, AI is revolutionizing what and how we will read in the future.

For more on using ML to upend the competition, see Why Machine Learning and Why Now?

About JP George

JP George grew up in a small town in Washington. After receiving a Master's degree in Public Relations, JP has worked in a variety of positions, from agencies to corporations all across the globe. Experience has made JP an expert in topics relating to leadership, talent management, and organizational business.

Machine Learning With Heart: How Sentiment Analysis Can Help Your Customers

Lance Hughes

When you think of artificial intelligence (AI), the word “emotion” doesn’t typically come to mind. But there’s an entire field of research using AI to understand emotional responses to news, product experiences, movies, restaurants, and more. It’s known as sentiment analysis, or emotion AI, and it involves analyzing views – positive, negative, or neutral – from written text to understand and gauge reactions.

Sentiment analysis can be used for survey research, social media analyses, and tracking psychological trends. Picture software that scans articles, reviews, ratings, and social media posts to determine sentiment changes for hotel guests. Hoteliers will, for example, aggregate and assess ratings and reviews in an effort to improve guest satisfaction.

The tech behind sentiment analysis involves natural language processing or linguistic algorithms that assign values to positive, negative, or neutral text (converting opinions into datasets), while machine learning processes the datasets to reveal relevant trends over time. There’s significant planning required: How do you ensure the algorithms capture useful information? Are you identifying the right phrases to analyze? How can you convert findings into better products, services, and experiences?
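As a minimal sketch of that pipeline, assume each review is a (date, text) pair: a made-up word list converts each opinion into a number, and averaging those numbers per month turns the reviews into a small time series in which changes stand out. A production system would replace the word list with a trained model and the monthly average with a proper trend analysis.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical word scores for illustration only.
WORD_SCORES = {"clean": 1, "friendly": 1, "comfortable": 1,
               "dirty": -1, "rude": -1, "noisy": -1}


def score_review(text: str) -> int:
    """Turn one free-text opinion into a number (positive > 0 > negative)."""
    return sum(WORD_SCORES.get(word.strip(".,!?"), 0) for word in text.lower().split())


def monthly_trend(reviews: list[tuple[str, str]]) -> dict[str, float]:
    """Average review scores per month ('YYYY-MM') so changes over time stand out."""
    by_month: dict[str, list[int]] = defaultdict(list)
    for date, text in reviews:
        by_month[date[:7]].append(score_review(text))
    return {month: sum(s) / len(s) for month, s in sorted(by_month.items())}
```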

At Concur, for instance, sentiment analysis has provided invaluable insights. Recently, Concur Labs and Concur UX Analytics developed a sentiment analysis tool for user product reviews. This tool automatically extracts themes to determine how customers feel about Concur’s service and helps identify which features people like most and which ones they find frustrating.

Emotion gauging is complicated

If we could categorize responses with just one emoji, that would be easy. But humans are far more complicated and fascinating, and this complexity carries over to sentiment analysis. For example, comments like “The film was very good” are easy to analyze. It gets a little harder when you add negation: “The film wasn’t bad.” And it gets much harder when you add terms that would normally come across as positive but are actually negative based on context. For instance, “I wish this film was good. There were a great many things it could have done right but didn’t.”
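A common first step for handling negation is a rule that flips the score of a sentiment word directly following a negator such as “not” or “wasn’t”. The sketch below assumes a tiny, made-up lexicon. It correctly scores “The film wasn’t bad.” as positive, but it still scores “I wish this film was good.” as positive, which is exactly the kind of context such a rule cannot see.

```python
from __future__ import annotations

import re

LEXICON = {"good": 1, "great": 1, "bad": -1, "boring": -1}  # hypothetical scores
NEGATORS = {"not", "no", "never", "wasn't", "isn't", "didn't"}


def score_with_negation(text: str) -> int:
    """Sum word scores, flipping the sign of a sentiment word right after a negator."""
    # Normalize curly apostrophes so "wasn’t" matches "wasn't".
    tokens = re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))
    total = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        total += value
    return total
```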

Because the field is relatively new, approaches are varied and still maturing. Analysis has traditionally been conducted using what’s called a “bag of words” approach: basically, creating a list of all the words used along with how many times each was used. With this method, word order is thrown out the window, so “not bad” would come out as negative. Modern methods use a type of recurrent neural network called an LSTM (long short-term memory) to compress the entire sentence into a vector (a list of numbers) that encapsulates the meaning of the sentence, taking word order into account. This tends to yield higher accuracy.
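To show how an LSTM folds word order into a single vector, here is a pure-Python sketch of one LSTM layer with randomly initialized, untrained weights and a made-up five-word vocabulary. Because the cell state is updated word by word, “not bad” and “bad not” produce different sentence vectors, unlike a bag of words. Real sentiment models learn these weights from labeled data and are normally built with a library such as TensorFlow or PyTorch.

```python
from __future__ import annotations

import math
import random

H, D = 2, 2  # hidden size and word-embedding size (tiny, for illustration)
rng = random.Random(0)


def rand_matrix(rows: int, cols: int) -> list[list[float]]:
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Untrained, randomly initialized parameters; a real model learns these from data.
EMBED = {w: [rng.uniform(-1, 1) for _ in range(D)] for w in ["the", "film", "was", "not", "bad"]}
W = {gate: rand_matrix(H, D + H) for gate in "ifoc"}  # input, forget, output, candidate
B = {gate: [0.0] * H for gate in "ifoc"}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(gate: str, xh: list[float]) -> list[float]:
    """Compute W[gate] @ xh + B[gate] for one gate."""
    return [sum(w * v for w, v in zip(row, xh)) + b for row, b in zip(W[gate], B[gate])]


def lstm_encode(words: list[str]) -> list[float]:
    """Run the LSTM over the words; the final hidden state is the sentence vector."""
    h, c = [0.0] * H, [0.0] * H
    for word in words:
        xh = EMBED[word] + h  # concatenate word embedding and previous hidden state
        i = [sigmoid(z) for z in affine("i", xh)]   # input gate
        f = [sigmoid(z) for z in affine("f", xh)]   # forget gate
        o = [sigmoid(z) for z in affine("o", xh)]   # output gate
        g = [math.tanh(z) for z in affine("c", xh)]  # candidate cell values
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h
```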

For businesses invested in customers, analyzing each piece of feedback by hand can be overwhelming. Sentiment analysis, developed within context, can help catch issues early and provide guidance on how to improve services. The related machine learning algorithms can take in vast amounts of data, learn and perform specific tasks quickly, and sift through data based on your priorities. As the technology advances, businesses can benefit from these in-depth insights, and customer satisfaction is likely to follow suit.

Learn more about marketing in an increasingly data-driven era. Read about Influencing Customers Through Infinite Personalization.

About Lance Hughes

Lance Hughes is a principal creative technologist for Concur Labs. With a background in machine learning and mobile development, he helped design and develop multiple top-selling apps while working at Smashing Ideas and Sweet Action Games, a company he founded. When he's out of the office, Lance enjoys composing music, hosting deep learning meetups, spending time with his family, and exploring augmented and virtual reality.

The Future Will Be Co-Created

Dan Wellers and Timo Elliott

Just 3% of companies have completed enterprise digital transformation projects.
92% of those companies have significantly improved or transformed customer engagement.
81% of business executives say platforms will reshape industries into interconnected ecosystems.
More than half of large enterprises (80% of the Global 500) will join industry platforms by 2018.

Redefining Customer Experience

Many business leaders think of the customer journey or experience as the interaction an individual or business has with their firm.

But the business value of the future will exist in the much broader, end-to-end experiences of a customer—the experience of travel, for example, or healthcare management or mobility. Individual companies alone, even with their existing supplier networks, lack the capacity to transform these comprehensive experiences.


A Network Effect

Rather than go it alone, companies will develop deep collaborative relationships across industries—even with their customers—to create powerful ecosystems that multiply the breadth and depth of the products, services, and experiences they can deliver. Digital native companies like Baidu and Uber have embraced ecosystem thinking from their early days. But forward-looking legacy companies are beginning to take the approach.

Solutions could include:

  • Packaging provider Weig has integrated partners into production with customers co-inventing custom materials.
  • China’s Ping An insurance company is aggressively expanding beyond its sector with a digital platform to help customers manage their healthcare experience.
  • British roadside assistance provider RAC is delivering a predictive breakdown service for drivers by acquiring and partnering with high-tech companies.

What Color Is Your Ecosystem?

Abandoning long-held notions of business value creation in favor of an ecosystem approach requires new tactics and strategies. Companies can:

1.  Dispassionately map the end-to-end customer experience, including those pieces outside company control.

2.  Employ future planning tactics, such as scenario planning, to examine how that experience might evolve.

3.  Identify organizations in that experience ecosystem with whom you might co-innovate.

4.  Embrace technologies that foster secure collaboration and joint innovation around delivery of experiences, such as cloud computing, APIs, and micro-services.

5.  Hire, train for, and reward creativity, innovation, and customer-centricity.


Evolve or Be Commoditized

Some companies will remain in their traditional industry boxes, churning out products and services in isolation. But they will be commodity players reaping commensurate returns. Companies that want to remain competitive will seek out their new ecosystem or get left out in the cold.


Download the executive brief The Future Will be Co-Created.


Read the full article The Future Belongs to Industry-Busting Ecosystems.

Turn insight into action, make better decisions, and transform your business.  Learn how.

About Dan Wellers

Dan Wellers is founder and leader of Digital Futures at SAP, a strategic insights and thought leadership discipline that explores how digital technologies drive exponential change in business and society.

About Timo Elliott

Timo Elliott is an Innovation Evangelist for SAP and a passionate advocate of innovation, digital business, analytics, and artificial intelligence. He was the eighth employee of BusinessObjects and for the last 25 years he has worked closely with SAP customers around the world on new technology directions and their impact on real-world organizations. His articles have appeared in publications such as Harvard Business Review, Forbes, ZDNet, The Guardian, and Digitalist Magazine. He has worked in the UK, Hong Kong, New Zealand, and Silicon Valley, and currently lives in Paris, France. He has a degree in Econometrics and a patent in mobile analytics.

Blockchain: Much Ado About Nothing? How Very Wrong!

Juergen Roehricht

Let me start with a quote from McKinsey that, in my view, hits the nail right on the head:

“No matter what the context, there’s a strong possibility that blockchain will affect your business. The very big question is when.”

Now, in the industries that I cover in my role as general manager and innovation lead for travel and transportation/cargo, engineering, construction and operations, professional services, and media, I engage with many different digital leaders on a regular basis. We are having visionary conversations about the impact of digital technologies and digital transformation on business models and business processes and the way companies address them. Many topics are at different stages of the hype cycle, but the one that definitely stands out is blockchain as a new enabling technology in the enterprise space.

Just a few weeks ago, a customer said to me: “My board is all about blockchain, but I don’t get what the excitement is about – isn’t this just about Bitcoin and a cryptocurrency?”

I can totally understand his confusion. I’ve been talking to many blockchain experts who know that it will have a big impact on many industries and the related business communities. But even they are uncertain about the where, how, and when, and about the strategy on how to deal with it. The reason is that we often look at it from a technology point of view. This is a common mistake, as the starting point should be the business problem and the business issue or process that you want to solve or create.

In my many interactions with Torsten Zube, vice president and blockchain lead at the SAP Innovation Center Network (ICN) in Potsdam, Germany, he has made it very clear that it’s mandatory to “start by identifying the real business problem and then … figure out how blockchain can add value.” This is the right approach.

What we really need to do is provide guidance for our customers to enable them to bring this into the context of their business in order to understand and define valuable use cases for blockchain. We need to use design thinking or other creative strategies to identify the relevant fields for a particular company. We must work with our customers and review their processes and business models to determine which key blockchain aspects, such as provenance and trust, are crucial elements in their industry. This way, we can identify use cases in which blockchain will benefit their business and make their company more successful.

My highly regarded colleague Ulrich Scholl, who is responsible for externalizing the latest industry innovations, especially blockchain, in our SAP Industries organization, recently said: “These kinds of use cases are often not evident, as blockchain capabilities sometimes provide minor but crucial elements when used in combination with other enabling technologies such as IoT and machine learning.” In one recent and very interesting customer case from the autonomous province of South Tyrol, Italy, blockchain was one of various cloud platform services required to make this scenario happen.

How to identify “blockchainable” processes and business topics (value drivers)

To understand the true value and impact of blockchain, we need to keep in mind that a verified transaction can involve any kind of digital asset such as cryptocurrency, contracts, and records (for instance, assets can be tangible equipment or digital media). While blockchain can be used for many different scenarios, some don’t need blockchain technology because they could be handled by a simple ledger, managed and owned by the company, or have such a large volume of data that a distributed ledger cannot support it. Blockchain would not be the right solution for these scenarios.
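To see what sets a blockchain apart from a simple ledger owned by one company, consider its core data structure: a chain of blocks in which each block stores the hash of the one before it, so an edit to any earlier entry becomes detectable. The minimal sketch below keeps such a chain for a single asset, starting with a genesis block when the asset is created. It deliberately leaves out what makes a blockchain distributed (replication across parties and a consensus protocol), and the class and event fields are hypothetical.

```python
from __future__ import annotations

import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents, including the previous block's hash."""
    payload = json.dumps({k: block[k] for k in ("index", "event", "prev_hash")}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def make_block(index: int, event: dict, prev_hash: str) -> dict:
    block = {"index": index, "event": event, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    return block


class AssetLedger:
    """A single-node, hash-chained log of one asset's life (no network or consensus)."""

    def __init__(self, asset_id: str) -> None:
        # Genesis block: created together with the asset itself.
        self.chain = [make_block(0, {"type": "created", "asset": asset_id}, "0" * 64)]

    def record(self, event: dict) -> None:
        prev = self.chain[-1]
        self.chain.append(make_block(prev["index"] + 1, event, prev["hash"]))

    def is_valid(self) -> bool:
        """Recheck links and hashes; editing any earlier block breaks the chain."""
        for prev, block in zip(self.chain, self.chain[1:]):
            if block["prev_hash"] != prev["hash"]:
                return False
        return all(block["hash"] == block_hash(block) for block in self.chain)
```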

Here are some common factors that can help identify potential blockchain use cases:

  • Multiparty collaboration: Are many different parties, not just one, involved in the process or scenario, while one party currently dominates everything? For example, a company with many parties in its ecosystem that are all connected to it, but not to one another in a network or a more decentralized structure.
  • Process optimization: Will blockchain massively improve a process that today is performed manually, involves multiple parties, needs to be digitized, and is very cumbersome to manage or be part of?
  • Transparency and auditability: Is it important to offer each party transparency (e.g., on the origin, delivery, geolocation, and hand-overs) and auditable steps? (e.g., How can I be sure that the wine in my bottle really is from Bordeaux?)
  • Risk and fraud minimization: Does it help (or is there a need) to minimize risk and fraud for each party, or at least for most of them in the chain? (e.g., A company might want to know if its goods have suffered any shocks in transit or whether the predefined route was not followed.)

Connecting blockchain with the Internet of Things

This is where blockchain’s value can be increased and automated. Just think about a blockchain that is not just maintained by humans entering data by hand, but that automatically acquires different signals from sensors, such as geolocation, temperature, shock, usage hours, alerts, etc. One that knows when a payment or any kind of money transfer has been made, a delivery has been received or has arrived at its destination, or a digital asset has been downloaded from the Internet. The relevant automated actions or signals are then recorded in the distributed ledger/blockchain.

Of course, given the massive amount of data that is created by those sensors, automated signals, and data streams, it is imperative that only the very few pieces of data coming from a signal that are relevant for a specific business process or transaction be stored in a blockchain. By recording non-relevant data in a blockchain, we would soon hit data size and performance issues.
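A minimal sketch of that filtering step, assuming a hypothetical temperature-sensitive shipment with made-up limits: the raw sensor stream stays off-chain, and only readings that break a business rule (a shock above the limit, or a temperature outside the allowed range) become compact events worth writing to the ledger.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical business rules for a temperature-sensitive shipment.
MAX_SHOCK_G = 5.0
TEMP_RANGE_C = (2.0, 8.0)


@dataclass
class Reading:
    timestamp: int
    temperature_c: float
    shock_g: float


def ledger_worthy(readings: list[Reading]) -> list[dict]:
    """Keep only readings that break a rule, as compact events for the ledger."""
    events = []
    low, high = TEMP_RANGE_C
    for r in readings:
        if r.shock_g > MAX_SHOCK_G:
            events.append({"t": r.timestamp, "event": "shock", "value": r.shock_g})
        if not low <= r.temperature_c <= high:
            events.append({"t": r.timestamp, "event": "temperature", "value": r.temperature_c})
    return events
```

In a real deployment the rules would come from the shipping contract, and thousands of in-range readings per trip would be reduced to a handful of ledger entries.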

Ideas to ignite thinking in specific industries

  • The digital, “blockchained” physical asset (asset lifecycle management): No matter whether you build, use, or maintain an asset, such as a machine, a piece of equipment, a turbine, or a whole aircraft, a blockchain transaction (genesis block) can be created when the asset is created. The blockchain will contain all the contracts and information for the asset as a whole and its parts. In this scenario, an entry is made in the blockchain every time an asset: is sold; is maintained by the producer or owner’s maintenance team; is audited by a third-party auditor; has malfunctioning parts; sends or receives information from sensors; meets specific thresholds; has spare parts built in; requires a change to its purpose or capability due to age or usage duration; receives (or doesn’t receive) payments; etc.
  • The delivery chain, bill of lading: In today’s world, shipping freight from A to B involves lots of manual steps. For example, a carrier receives a booking from a shipper or forwarder, confirms it, and, before the document cut-off time, receives the shipping instructions describing the content and how the master bill of lading should be created. The carrier creates the original bill of lading and hands it over to the ordering party (the current owner of the cargo). Today, that original paper-based bill of lading is required for the freight (the container) to be picked up at the destination (the port of discharge). Imagine if we could do this as a blockchain transaction instead of forwarding a PDF by email. There would be one transaction at the beginning, when the shipping carrier creates the bill of lading. Then there would be look-ups, e.g., by the import and release processing clerk of the shipper at the port of discharge and the new owner of the cargo at the destination. Then another transaction could document that the container had been handed over.
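The bill-of-lading flow described above can be sketched as a small state machine: the carrier issues the record, parties can look it up without changing it, only the current holder can transfer it, and the container is released only to the holder. This is a hypothetical model of the bookkeeping alone; on a real blockchain each step would be a transaction validated by the network’s participants, and the class does not correspond to any existing electronic bill of lading platform.

```python
from __future__ import annotations


class BillOfLading:
    """Toy electronic bill of lading: one record, many readers, controlled transfers."""

    def __init__(self, bl_number: str, carrier: str, holder: str, container: str) -> None:
        # First transaction: the carrier issues the bill to the ordering party.
        self.bl_number = bl_number
        self.container = container
        self.holder = holder
        self.released = False
        self.log = [("issued", carrier, holder)]

    def look_up(self, party: str) -> dict:
        """Read-only access; it is logged but does not change ownership."""
        self.log.append(("viewed", party, self.holder))
        return {"bl": self.bl_number, "container": self.container, "holder": self.holder}

    def transfer(self, from_party: str, to_party: str) -> None:
        # Only the current holder may pass the title on.
        if from_party != self.holder:
            raise PermissionError(f"{from_party} does not hold {self.bl_number}")
        self.holder = to_party
        self.log.append(("transferred", from_party, to_party))

    def release_container(self, party: str) -> None:
        # At the port of discharge, the container is handed over only to the holder, once.
        if party != self.holder or self.released:
            raise PermissionError("release refused")
        self.released = True
        self.log.append(("handed_over", party, party))
```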

The future

I personally believe in the massive transformative power of blockchain, even though we are just at the very beginning. This transformation will be achieved by looking at larger networks with many participants that all have a nearly equal part in a process. Today, many blockchain ideas still take a more centralized approach, in which one company has a more prominent role than the (many) others and often “manages” this blockchain- or distributed-ledger-supported process.

But think about the delivery scenario today, where goods are shipped from one door or company to another door or company, across many parties in the delivery chain: from the shipper/producer via the third-party logistics service provider and/or freight forwarder; to the companies doing the actual transport, like vessels, trucks, aircraft, trains, cars, ferries, and so on; to the final destination/receiver. And all of this happens across many countries, many borders, many handovers, customs, etc., and involves a lot of paperwork, across all constituents.

“Blockchaining” this will be truly transformational. But it will need all constituents in the process or network to participate, even if they have different interests, and to agree on basic principles and an approach.

As Torsten Zube put it, I am neither a “blockchain extremist” nor a denier who believes this is just hype, but a realist open to embracing a new technology in order to change our processes for our collective benefit.

About Juergen Roehricht

Juergen Roehricht is General Manager of Services Industries and Innovation Lead of the Middle and Eastern Europe region for SAP. The industries he covers include travel and transportation; professional services; media; and engineering, construction and operations. Besides managing the business in those segments, Juergen is focused on supporting innovation and digital transformation strategies of SAP customers. With more than 20 years of experience in IT, he stays up to date on the leading edge of innovation, pioneering and bringing new technologies to market and providing thought leadership. He has published several articles and books, including Collaborative Business and The Multi-Channel Company.