Understanding the world of Big Data sounds like a big challenge – and it can be if you aren’t “in the know” about some key terms used in the industry. Below are some top terms that will help you understand the big data conversation and come to a better understanding of how the terms contribute to the big [data] picture.
Top Big Data Terms to Understand the Concept
1. MapReduce—A programming paradigm that supports distributed computing on large datasets.
MapReduce is a programming model that Google created, and put to use internally, in the early 2000s to process massive amounts of data. Its name comes from the map and reduce functions common in programming, although here they serve somewhat different purposes than in their traditional definitions. It is important to understand the concept because MapReduce technologies are responsible for decentralizing data storage and processing to increase the speed and reliability of dealing with large data sets. A popular free implementation is Apache Hadoop.
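To make the map and reduce steps concrete, here is a minimal single-machine sketch of a word count, the usual introductory MapReduce example. It is an illustration only: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework such as Hadoop would distribute each phase across many machines rather than run them in one process.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key (a framework does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single total."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data about data"])
```

Because each document is mapped independently and each key is reduced independently, both phases can in principle run in parallel, which is the property the model relies on.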
2. NoSQL Databases—Databases, often document-oriented, that use a key/value interface rather than SQL and a relational database management system (RDBMS) to classify and organize data; created to manage volumes of data that do not have a fixed schema. NoSQL gained popularity as major companies adopted it to cope with an overload of data that traditional RDBMS solutions could not handle. NoSQL databases provide quick, efficient performance because each captured record is stored under a single identifying key, which lets them record a large number of transactions quickly.
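The key/value idea can be sketched in a few lines. The class below is a hypothetical in-memory stand-in, not the API of any real NoSQL product: each document is stored under a single key, and two documents need not share the same fields.

```python
import json


class KeyValueStore:
    """Toy in-memory key/value store holding schema-free JSON documents."""

    def __init__(self):
        self._data = {}

    def put(self, key, document):
        # Store the document as a JSON string under one identifying key.
        self._data[key] = json.dumps(document)

    def get(self, key):
        # Return the decoded document, or None if the key is unknown.
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)


store = KeyValueStore()
# The two documents have different fields; no fixed schema is enforced.
store.put("user:1", {"name": "Ada", "tags": ["admin"]})
store.put("user:2", {"name": "Lin", "last_login": "2017-06-01"})
```

Lookups go straight to the key, so there is no query planning or table join involved, which is one reason such stores can absorb writes quickly.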
3. Storage—The technologies that hold the distributed data. Data can be stored using data centers, which could include any number of various cloud technologies.
4. Servers—Technologies available for renting computing power on remote machines; a program to “serve” the requests of programs. In big data, servers offer support for data storage and management.
5. Processing—The action of extracting valuable information from large datasets. Processing allows the user to sort through the massive amounts of data and produce information for analysis. While processing, data can be sorted and grouped based on algorithms, but it’s important to understand the limitations and constraints of results that have not been evaluated by a human.
6. Natural Language Processing—Extracting information from human-created text. This type of processing requires sorting through data that is created by humans and not necessarily from their “actions.” For instance, if you are analyzing Twitter data from the previous six months, you might be looking for keywords and sentiment, which would require Natural Language Processing.
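As a rough illustration of the keyword and sentiment example, the sketch below counts words across a few sample messages and scores each one against two tiny word lists. The word lists, the sample messages, and the function names are all made up for this example; production sentiment analysis uses far richer models than word matching.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny sentiment lexicons.
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "bad", "awful"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_counts(texts):
    """Count how often each token appears across all texts."""
    return Counter(word for text in texts for word in tokenize(text))


def sentiment_score(text):
    """Positive-word count minus negative-word count for one text."""
    words = tokenize(text)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


tweets = ["I love big data", "Bad data is awful", "Great data, great insight"]
counts = keyword_counts(tweets)
scores = [sentiment_score(t) for t in tweets]
```

A score above zero suggests a positive message and a score below zero a negative one; sarcasm, negation, and context are all beyond this kind of counting.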
7. Visualization—Viewing graphically represented meaningful data.
As data is collected, stored and then analyzed, it needs to be presented in a way that it can be understood and digested. Programs able to analyze big data can sometimes interpret the data and represent it in a visual display for easier consumption and/or to show results.
8. Acquisition—Techniques for cleaning up messy public data sources. As data is collected, it is not always in its purest form and/or usable. There are various sources that help take this data and turn it into something that can be processed.
9. Serialization—Converting data structure or object state into a format able to be stored.
Serialization occurs after the data is collected and when it is being processed. As the data gets sorted and pushed around between systems, it may need to be stored. During these steps, the data will require serialization and it will be based on the different languages and APIs.
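A minimal sketch of serialization, assuming JSON as the storage format (one common choice among many): an in-memory record is converted to bytes that can be written to disk or sent to another system, then converted back. The record contents are invented for the example.

```python
import json

# An in-memory data structure (hypothetical sensor record).
record = {"id": 42, "tags": ["sensor", "raw"], "reading": 3.5}

# Serialize: object -> JSON text -> bytes suitable for storage or transfer.
encoded = json.dumps(record, sort_keys=True).encode("utf-8")

# Deserialize: bytes -> JSON text -> equivalent in-memory object.
decoded = json.loads(encoded.decode("utf-8"))
```

Because JSON is language-neutral, a program written in another language can read the same bytes, which is why the choice of format depends on the languages and APIs involved.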
10. CPU – Central processing unit; the hardware within a computer that performs the basic operations of the system (comparable to the “brain”).
CPUs are mentioned in reference to crunching the data.
11. Hadoop – Open source implementation of MapReduce (an Apache software product).
Hadoop was developed to enable applications to work with thousands of computationally independent computers and petabytes of data.
12. Petabytes, exabytes, zettabytes – Units of information used to measure data amounts.
Petabyte = one quadrillion (short scale) bytes, or 1,000 terabytes (the binary-based pebibyte is 1,024 tebibytes)
Exabyte = one quintillion (short scale) bytes, or 1,000 petabytes
Zettabyte = one sextillion (short scale; one long-scale trilliard) bytes, or 1,000 exabytes
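The arithmetic behind these units is easy to check. The sketch below uses the decimal (SI) definitions, in which each unit is 1,000 times the previous one; the binary pebibyte (1,024 tebibytes) is shown separately because the two are often confused. The names `DECIMAL_UNITS` and `convert` are hypothetical helpers for this example.

```python
# Decimal (SI) byte units: each step up is a factor of 1,000.
DECIMAL_UNITS = {
    "terabyte": 10**12,
    "petabyte": 10**15,   # one quadrillion bytes
    "exabyte": 10**18,    # one quintillion bytes
    "zettabyte": 10**21,  # one sextillion bytes
}

# Binary counterparts use powers of two instead.
TEBIBYTE = 2**40
PEBIBYTE = 2**50  # 1,024 tebibytes, about 12.6% larger than a petabyte


def convert(amount, from_unit, to_unit):
    """Convert an amount between two decimal byte units."""
    return amount * DECIMAL_UNITS[from_unit] / DECIMAL_UNITS[to_unit]


petabyte_in_terabytes = convert(1, "petabyte", "terabyte")
```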
The Digitalist Magazine is your online destination for everything you need to know to lead your enterprise’s digital transformation.
Read the Digitalist Magazine and get the latest insights about the digital economy that you can capitalize on today.
About Jen Cohen Crompton
Jen Cohen Crompton is a SAP Blogging Correspondent reporting on big data, cloud computing, enterprise mobility, analytics, sports and tech, and anything else innovation-related. When she's not blogging, she can be caught marketing, using social media and/or presenting at conferences around the world. Disclosure: Jen is being compensated by SAP to produce a series of articles on the innovation topics covered on this site. The opinions reflected here are her own.
Does your business use customer personas? Before long, you’ll be able to get rid of them for good. It’s not because they haven’t worked, but years ago there was nothing else available. Soon, artificial intelligence is going to do everything for us.
Thanks to machine learning, computers will soon know your customers better than your customers know themselves (it’s actually a bit scary, when you think about it). Let’s take a closer look at some of the changes we can expect to see.
More efficient targeting of new customers
Customer personas have historically been based on demographics like age, sex, and marital status. Eventually, companies moved on to behavioral personas, which definitely helped them define their ideal customers more effectively.
But they don’t follow customers’ online behaviors in detail, something you’ll soon be able to do using software. Browsing habits and social media use are far more likely to tell you what people want, which can help you acquire new customers.
People are evolving constantly
If you offer a product that doesn’t fundamentally change much—like NBN Internet in Australia, for example—would you continue going after the same potential customers year after year? If so, you can bet that you’ll eventually lose out to the competition. Customer avatars should be changed on a regular basis, as people change, but that doesn’t always happen. Computers will do it without you even asking them to.
Machine learning is more cost-effective
One reason why humans don’t update personas is that doing the research is expensive. When machine learning takes over, the cost will drop to almost nothing, even for large-scale products or services.
That will be a great help to companies involved in markets around the world, as machines will be able to keep up with everything on a localized level in every city, regardless of geography.
Customer personas are basically guesswork
The most fantastic avatar in the world is basically just guesswork, so even if your target audience conversion rate improves, it’s likely because you happened to guess correctly.
Computers are much better at guesswork than humans are. In fact, in the medical industry, machine learning has reached the stage where computers can often diagnose cancer more accurately than doctors. Similarly, machine learning technology will be able to effectively guess the ideal customer for your products.
Employees will have more time
How will machine learning affect employees? It will free up more time for them to focus on boosting sales—a big plus for employers.
The job of the marketing team will also become much easier in the future. With machine learning giving team members more time to focus on specific tasks, they will be able to hone and sharpen their skills—another win-win.
Only a matter of time
Old-school customer personas worked when TV adverts were a primary marketing tool. But these days most of us spend our time facing a screen of some sort, and money is made online.
About Andre Smith
An Internet, Marketing and E-Commerce specialist with several years of experience in the industry. He has watched as the world of online business has grown and adapted to new technologies, and he has made it his mission to help keep businesses informed and up to date.
For Google, the relentless search for answers is more than search algorithms. It’s the core of every innovation, ranging from YouTube and Android to self-driving cars. And for its 60,000 employees scattered across 50 countries, this mission is bringing it closer and closer to artificial intelligence, including machine learning.
During his SAPPHIRE NOW session, “Empower Intelligent Enterprises with Machine Learning from Google and SAP,” Francisco Uribe, head of product at Google, shared a simple vision for machine learning and AI: an opportunity to shape new business processes and customer experiences made possible through automated access to relevant information – structured and unstructured.
From Francisco’s perspective, the cloud is the best vehicle for AI capabilities. “At Google, we fundamentally believe the ideal place for AI is the cloud. For this reason, we’re making huge investments in the cloud machine-learning platform that will emerge in the next few years as a powerful tool that any developer and customer can use as an onramp onto this field.”
Through the cloud, Google envisions the democratization of AI across three areas:
1. Computing power
The computing requirements of machine learning are immense. A model today could quickly post thousands of millions of parameters generated from billions of connections. In this case, training employees on these models and servicing them is no small endeavor. Through a cloud engine tool, companies can harness the power of an open-source, deep-learning library. Meanwhile, the cloud allows training and servicing at scale and with a greater focus on performance.
2. Predefined models
Even if a company has the right computing capability, AI remains one of the most complex endeavors in computer science. A set of predefined AI models can help businesses solve common machine learning tasks without the need to acquire additional expertise. Use cases include understanding images and speech, translating text, and parsing natural language.
3. Talents and expertise
Like any technology, some training is needed to extend and reap the full potential of artificial intelligence. Google created an advanced solutions lab that can be used to train the workforce on AI – and the same curriculum and courses used to train its engineers are now available to its customers. This opportunity enables businesses to understand the benefits of AI better and solve their toughest problems with machine learning.
By making artificial intelligence and machine learning technology available, every business can leverage an enterprise brain to connect information and derive insights from millions of transactions generated by various systems. The benefits of this approach are more than automating processes, easing decision making, and creating new efficiencies; they’re about uncovering global information trends that have never been identified before.
Last August, a woman arrived at a Reno, Nevada, hospital and told the attending doctors that she had recently returned from an extended trip to India, where she had broken her right thighbone two years earlier. The woman, who was in her 70s, had subsequently developed an infection in her thigh and hip for which she was hospitalized in India several times. The Reno doctors recognized that the infection was serious—and the visit to India, where antibiotic-resistant bacteria run rampant, raised red flags.
When none of the 14 antibiotics the physicians used to treat the woman worked, they sent a sample of the bacterium to the U.S. Centers for Disease Control and Prevention (CDC) for testing. The CDC confirmed the doctors’ worst fears: the woman had a class of microbe called carbapenem-resistant Enterobacteriaceae (CRE). Carbapenems are a powerful class of antibiotics used as last-resort treatment for multidrug-resistant infections. The CDC further found that, in this patient’s case, the pathogen was impervious to all 26 antibiotics approved by the U.S. Food and Drug Administration (FDA).
In other words, there was no cure.
This is just the latest alarming development signaling the end of the road for antibiotics as we know them. In September, the woman died from septic shock, in which an infection takes over and shuts down the body’s systems, according to the CDC’s Morbidity and Mortality Weekly Report.
Other antibiotic options, had they been available, might have saved the Nevada woman. But the solution to the larger problem won’t be a new drug. It will have to be an entirely new approach to the diagnosis of infectious disease, to the use of antibiotics, and to the monitoring of antimicrobial resistance (AMR)—all enabled by new technology.
Keeping an Eye Out for Outbreaks
Like others who are leading the fight against AMR, Dr. Steven Solomon has no illusions about the difficulty of the challenge. “It is the single most complex problem in all of medicine and public health—far outpacing the complexity and the difficulty of any other problem that we face,” says Solomon, who is a global health consultant and former director of the CDC’s Office of Antimicrobial Resistance.
Solomon wants to take the battle against AMR beyond the laboratory. In his view, surveillance—tracking and analyzing various data on AMR—is critical, particularly given how quickly and widely it spreads. But surveillance efforts are currently fraught with shortcomings. The available data is fragmented and often not comparable. Hospitals fail to collect the representative samples necessary for surveillance analytics, collecting data only on those patients who experience resistance and not on those who get better. Laboratories use a wide variety of testing methods, and reporting is not always consistent or complete.
Surveillance can serve as an early warning system. But weaknesses in these systems have caused public health officials to consistently underestimate the impact of AMR in loss of lives and financial costs. That’s why improving surveillance must be a top priority, says Solomon, who previously served as chair of the U.S. Federal Interagency Task Force on AMR and has been tracking the advance of AMR since he joined the U.S. Public Health Service in 1981.
A Collaborative Diagnosis
Ineffective surveillance has also contributed to huge growth in the use of antibiotics when they aren’t warranted. Strong patient demand and financial incentives for prescribing physicians are blamed for antibiotics abuse in China. India has become the largest consumer of antibiotics on the planet, in part because they are prescribed or sold for diarrheal diseases and upper respiratory infections for which they have limited value. And many countries allow individuals to purchase antibiotics over the counter, exacerbating misuse and overuse.
In the United States, antibiotics are improperly prescribed 50% of the time, according to CDC estimates. One study of adult patients visiting U.S. doctors to treat respiratory problems found that more than two-thirds of antibiotics were prescribed for conditions that were not infections at all or for infections caused by viruses—for which an antibiotic would do nothing. That’s 27 million courses of antibiotics wasted a year—just for respiratory problems—in the United States alone.
And even in countries where there are national guidelines for prescribing antibiotics, those guidelines aren’t always followed. A study published in medical journal Family Practice showed that Swedish doctors, both those trained in Sweden and those trained abroad, inconsistently followed rules for prescribing antibiotics.
Solomon strongly believes that, worldwide, doctors need to expand their use of technology in their offices or at the bedside to guide them through a more rational approach to antibiotic use. Doctors have traditionally been reluctant to adopt digital technologies, but Solomon thinks that the AMR crisis could change that. New digital tools could help doctors and hospitals integrate guidelines for optimal antibiotic prescribing into their everyday treatment routines.
“Human-computer interactions are critical, as the amount of information available on antibiotic resistance far exceeds the ability of humans to process it,” says Solomon. “It offers the possibility of greatly enhancing the utility of computer-assisted physician order entry (CPOE), combined with clinical decision support.” Healthcare facilities could embed relevant information and protocols at the point of care, guiding the physician through diagnosis and prescription and, as a byproduct, facilitating the collection and reporting of antibiotic use.
Cincinnati Children’s Hospital’s antibiotic stewardship division has deployed a software program that gathers information from electronic medical records, order entries, computerized laboratory and pathology reports, and more. The system measures baseline antimicrobial use, dosing, duration, costs, and use patterns. It also analyzes bacteria and trends in their susceptibilities and helps with clinical decision making and prescription choices. The goal, says Dr. David Haslam, who heads the program, is to decrease the use of “big gun” super antibiotics in favor of more targeted treatment.
While this approach is not yet widespread, there is consensus that incorporating such clinical-decision support into electronic health records will help improve quality of care, contain costs, and reduce overtreatment in healthcare overall—not just in AMR. A 2013 randomized clinical trial found that doctors who used decision-support tools were significantly less likely to order antibiotics than those in the control group and prescribed 50% fewer broad-spectrum antibiotics.
Putting mobile devices into doctors’ hands could also help them accept decision support, believes Solomon. Last summer, Scotland’s National Health Service developed an antimicrobial companion app to give practitioners nationwide mobile access to clinical guidance, as well as an audit tool to support boards in gathering data for local and national use.
“The immediacy and the consistency of the input to physicians at the time of ordering antibiotics may significantly help address the problem of overprescribing in ways that less-immediate interventions have failed to do,” Solomon says. In addition, handheld devices with so-called lab-on-a-chip technology could be used to test clinical specimens at the bedside and transmit the data across cellular or satellite networks in areas where infrastructure is more limited.
Artificial intelligence (AI) and machine learning can also become invaluable technology collaborators to help doctors more precisely diagnose and treat infection. In such a system, “the physician and the AI program are really ‘co-prescribing,’” says Solomon. “The AI can handle so much more information than the physician and make recommendations that can incorporate more input on the type of infection, the patient’s physiologic status and history, and resistance patterns of recent isolates in that ward, in that hospital, and in the community.”
Speed Is Everything
Growing bacteria in a dish has never appealed to Dr. James Davis, a computational biologist with joint appointments at Argonne National Laboratory and the University of Chicago Computation Institute. The first of a growing breed of computational biologists, Davis chose a PhD advisor in 2004 who was steeped in bioinformatics technology “because you could see that things were starting to change,” he says. He was one of the first in his microbiology department to submit a completely “dry” dissertation—that is, one that was all digital with nothing grown in a lab.
Upon graduation, Davis wanted to see if it was possible to predict whether an organism would be susceptible or resistant to a given antibiotic, leading him to explore the potential of machine learning to predict AMR.
As the availability of cheap computing power has gone up and the cost of genome sequencing has gone down, it has become possible to sequence a pathogen sample in order to detect its resistance mechanisms. This could allow doctors to identify the nature of an infection in minutes instead of hours or days, says Davis.
Davis is part of a team creating a giant database of bacterial genomes with AMR metadata for the Pathosystems Resource Integration Center (PATRIC), funded by the U.S. National Institute of Allergy and Infectious Diseases to collect data on priority pathogens, such as tuberculosis and gonorrhea.
Because the current inability to identify microbes quickly is one of the biggest roadblocks to making an accurate diagnosis, the team’s work is critically important. The standard method for identifying drug resistance is to take a sample from a wound, blood, or urine and expose the resident bacteria to various antibiotics. If the bacterial colony continues to divide and thrive despite the presence of a normally effective drug, it indicates resistance. The process typically takes between 16 and 20 hours, itself an inordinate amount of time in matters of life and death. For certain strains of antibiotic-resistant tuberculosis, though, such testing can take a week. While physicians are waiting for test results, they often prescribe broad-spectrum antibiotics or make a best guess about what drug will work based on their knowledge of what’s happening in their hospital, “and in the meantime, you either get better,” says Davis, “or you don’t.”
At PATRIC, researchers are using machine-learning classifiers to identify regions of the genome involved in antibiotic resistance that could form the foundation for a “laboratory free” process for predicting resistance. Being able to identify the genetic mechanisms of AMR and predict the behavior of bacterial pathogens without petri dishes could inform clinical decision making and improve reaction time. Thus far, the researchers have developed machine-learning classifiers for identifying antibiotic resistance in Acinetobacter baumannii (a big player in hospital-acquired infection), methicillin-resistant Staphylococcus aureus (a.k.a. MRSA, a worldwide problem), and Streptococcus pneumoniae (a leading cause of bacterial meningitis), with accuracies ranging from 88% to 99%.
Houston Methodist Hospital, which uses the PATRIC database, is researching multidrug-resistant bacteria, specifically MRSA. Not only does resistance increase the cost of care, but people with MRSA are 64% more likely to die than people with a nonresistant form of the infection, according to WHO. Houston Methodist is investigating the molecular genetic causes of drug resistance in MRSA in order to identify new treatment approaches and help develop novel antimicrobial agents.
The Hunt for a New Class of Antibiotics
There are antibiotic-resistant bacteria, and then there’s Clostridium difficile—a.k.a. C. difficile—a bacterium that attacks the intestines even in young and healthy patients in hospitals after the use of antibiotics.
It is because of C. difficile that Dr. L. Clifford McDonald jumped into the AMR fight. The epidemiologist was finishing his work analyzing the spread of SARS in Toronto hospitals in 2004 when he turned his attention to C. difficile, convinced that the bacteria would become more common and more deadly. He was right, and today he’s at the forefront of treating the infection and preventing the spread of AMR as senior advisor for science and integrity in the CDC’s Division of Healthcare Quality Promotion. “[AMR] is an area that we’re funding heavily…insofar as the CDC budget can fund anything heavily,” says McDonald, whose group has awarded $14 million in contracts for innovative anti-AMR approaches.
Developing new antibiotics is a major part of the AMR battle. The majority of new antibiotics developed in recent years have been variations of existing drug classes. It’s been three decades since the last new class of antibiotics was introduced. Less than 5% of venture capital in pharmaceutical R&D is focused on antimicrobial development. A 2008 study found that less than 10% of the 167 antibiotics in development at the time had a new “mechanism of action” to deal with multidrug resistance. “The low-hanging fruit [of antibiotic development] has been picked,” noted a WHO report.
Researchers will have to dig much deeper to develop novel medicines. Machine learning could help drug developers sort through much larger data sets and go about the capital-intensive drug development process in a more prescriptive fashion, synthesizing those molecules most likely to have an impact.
McDonald believes that it will become easier to find new antibiotics if we gain a better understanding of the communities of bacteria living in each of us—as many as 1,000 different types of microbes live in our intestines, for example. Disruption to those microbial communities—our “microbiome”—can herald AMR. McDonald says that Big Data and machine learning will be needed to unlock our microbiomes, and that’s where much of the medical community’s investment is going.
He predicts that within five years, hospitals will take fecal samples or skin swabs and sequence the microorganisms in them as a kind of pulse check on antibiotic resistance. “Just doing the bioinformatics to sort out what’s there and the types of antibiotic resistance that might be in that microbiome is a Big Data challenge,” McDonald says. “The only way to make sense of it, going forward, will be advanced analytic techniques, which will no doubt include machine learning.”
Reducing Resistance on the Farm
Bringing information closer to where it’s needed could also help reduce agriculture’s contribution to the antibiotic resistance problem. Antibiotics are widely given to livestock to promote growth or prevent disease. In the United States, more kilograms of antibiotics are administered to animals than to people, according to data from the FDA.
One company has developed a rapid, on-farm diagnostics tool to provide livestock producers with more accurate disease detection to make more informed management and treatment decisions, which it says has demonstrated a 47% to 59% reduction in antibiotic usage. Such systems, combined with pressure or regulations to reduce antibiotic use in meat production, could also help turn the AMR tide.
Breaking Down Data Silos Is the First Step
Adding to the complexity of the fight against AMR is the structure and culture of the global healthcare system itself. Historically, healthcare has been a siloed industry, notorious for its scattered approach focused on transactions rather than healthy outcomes or the true value of treatment. There’s no definitive data on the impact of AMR worldwide; the best we can do is infer estimates from the information that does exist.
The biggest issue is the availability of good data to share through mobile solutions, to drive HCI clinical-decision support tools, and to feed supercomputers and machine-learning platforms. “We have a fragmented healthcare delivery system and therefore we have fragmented information. Getting these sources of data all into one place and then enabling them all to talk to each other has been problematic,” McDonald says.
Collecting, integrating, and sharing AMR-related data on a national and ultimately global scale will be necessary to better understand the issue. HCI and mobile tools can help doctors, hospitals, and public health authorities collect more information while advanced analytics, machine learning, and in-memory computing can enable them to analyze that data in close to real time. As a result, we’ll better understand patterns of resistance from the bedside to the community and up to national and international levels, says Solomon. The good news is that new technology capabilities like AI and new potential streams of data are coming online as an era of data sharing in healthcare is beginning to dawn, adds McDonald.
The ideal goal is a digitally enabled virtuous cycle of information and treatment that could save millions of dollars, lives, and perhaps even civilization if we can get there.
Despite the progress made in some countries, I am also aware of others that are still resistant to digitizing their economy and automating operations. What’s the difference between firms that are digital leaders and those that are slow to mature? From my perspective in working with a variety of businesses throughout Europe, it’s a combination of diversity and technology availability.
European companies are hardly homogeneous. Spread across the continent’s 47 countries, they serve communities that speak any of 225 languages. Each one is experiencing various stages of digital development, economic stability, and workforce needs.
Nevertheless, as a whole, European firms do prioritize customer acquisition as well as improving efficiency and reducing costs. Over one-third of small and midsize companies are investing in collaboration software, customer relationship management solutions, e-commerce platforms, analytics, and talent management applications. Steadily, business leaders are finding better ways to go beyond data collection by applying predictive analytics to gain real-time insight and machine learning to automate processes where possible.
Small and midsize businesses have a distinct advantage in this area over their larger rivals because they can, by nature, adopt new technology and practices quickly and act on decisions with greater agility. Nearly two-thirds (64%) of European firms are embracing the early stages of digitalization and planning to mature over time. Yet, the level of adoption depends solely on the leadership team’s commitment.
For many small and midsize companies across this region, the path to digital maturity resides in the cloud, more so than on-premise software deployment. For example, the flexibility associated with cloud deployment is viewed as a top attribute, especially among U.K. firms. This brings us back to the diversity of our region. Some countries prioritize personal data security while others may be more concerned with the ability to access the information they need in even the most remote of areas.
Technology alone does not deliver digital transformation
Digital transformation is certainly worth the effort for European firms. Between 60% and 90% of small and midsize European businesses say their technology investments have met or exceeded their expectations – indicative of the steady, powerhouse transitions enabled by cloud computing. Companies are now getting the same access to the latest technology, data storage, and IT resources.
However, it is also important to note that a cloud platform is only as effective as the long-term digital strategy that it enables. To invigorate transformative changes, leadership needs to go beyond technology and adopt a mindset that embraces new ideas, tests the fitness of business models and processes continuously, and allows the flexibility to evolve the company as quickly as market dynamics change. By taking a step back and integrating digital objectives throughout the business strategy, leadership can pull together the elements needed to turn technology investments into differentiating, sustainable change. For example, the best talent with the right skills is hired. Plus, partners and suppliers with a complementary or shared digital vision and capability are onboarded.
The IDC Infobrief confirms what I have known all along: Small and midsize businesses are beginning to digitally mature and maintain a strategy that is relevant to their end-to-end processes. And furthering their digital transformation goes hand in hand with the firms’ ability to ignite a transformational force that will likely progress Europe’s culture, social structure, and economy.