Streamlining The Factory Floor With Failure Mode Analytics

Anubhav Bhatia

Learning from failures is an important part of moving forward in any industry, job, or project. It’s no different in the manufacturing environment, where various equipment failures can provide invaluable operational insight to streamline operations and improve safety.

Manufacturing organizations that manage assets may have years of data collected from their machines or equipment. Buried within that data are failure-related notifications involving events associated with the asset (maintenance performed, error code displays, customer complaints, part replacements, work orders, and so on). Unfortunately, this valuable information is often hidden within millions of lines of text, for example in the free-form text of notifications and work orders. This makes it difficult, if not impossible, to analyze how often a particular failure occurred in the past, or whether some equipment in the fleet fails more often than the rest.

So, while the organization has an abundance of data on asset performance, without the proper tools in place it cannot derive value from that data or use it to improve business operations.

Getting started

Failure mode analytics helps organizations derive value from existing notification data, something that was previously not possible. It analyzes historical notification texts and assigns the most likely failure mode in a fraction of the time it would take a human to do so manually. The technology also allows experts to validate the failure mode analysis and feed the results back into the machine learning engine. This way, past knowledge can inform future failure mode models. The resulting model enables a proactive maintenance approach that avoids unplanned downtime through timely replacement or maintenance of equipment that is near failure.

How it works

Failure mode analytics creates actionable knowledge from historical data by letting a user build a machine learning model for failure mode analysis. The user can then train the model, validate it, and score it to determine its effectiveness.

During an unsupervised learning stage, the system trains a topic model (either a newly created model or an update of an existing one) for failure mode analysis based on the historical asset data. It uses machine learning to extract topics from the textual information in the existing notifications and matches those topics to predefined failure modes for the asset (asset-specific failure modes, for example). The unsupervised model for failure mode analysis can then be stored in a database for ongoing access.
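The matching step described above can be illustrated with a minimal sketch. The article does not say how topics are matched to failure modes, so this example assumes a simple keyword-overlap (Jaccard) score. The failure mode names, their keyword sets, and the topic keywords are all hypothetical, and the topics are taken as given rather than produced by a real topic model.

```python
# Hypothetical predefined failure modes with descriptive keywords
# (assumed for illustration, not taken from the article).
FAILURE_MODES = {
    "bearing wear": {"bearing", "vibration", "noise", "grinding"},
    "seal leakage": {"seal", "leak", "oil", "drip"},
    "overheating": {"temperature", "overheat", "hot", "coolant"},
}


def jaccard(a, b):
    """Jaccard similarity between two keyword sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_topic(top_keywords, failure_modes=FAILURE_MODES):
    """Return (best failure mode, score) for one topic's top keywords."""
    keywords = set(top_keywords)
    scores = {mode: jaccard(keywords, kw) for mode, kw in failure_modes.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Topics as a topic model might report them: topic id -> top keywords.
topics = {
    0: ["oil", "leak", "seal", "pump"],
    1: ["vibration", "bearing", "noise", "shaft"],
    2: ["coolant", "temperature", "hot", "alarm"],
}

assignments = {tid: match_topic(kw) for tid, kw in topics.items()}
for tid, (mode, score) in assignments.items():
    print(f"topic {tid} -> {mode} (overlap {score:.2f})")
```

Keeping the score alongside each assignment gives a reviewer a rough way to spot weak matches that deserve a second look.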

Further, a streamlined user interface allows users to validate the unsupervised model for failure mode analysis and make modifications, if necessary. For example, a subject matter expert or reliability engineer can double-check the machine learning matches between topics and predefined failure modes, and reassign a topic to a different failure mode. The interface also provides information about each topic, such as its top keywords.

During a second stage, the system performs supervised learning on the validated model (this stage is also described as ensemble learning). Here the system uses the model to predict the failure modes associated with notifications, creating mappings on raw data and providing insights into the model’s quality and effectiveness through various metrics (top failure modes, KPIs, and so on). Once the user has achieved the desired result during supervised learning, the text classification model can be stored, provided to a system such as a condition-based monitoring platform for monitoring assets, or both.
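To make the supervised stage more concrete, here is a small sketch of a text classifier trained on notifications that already carry validated failure mode labels. The article does not name the algorithm, so this uses a multinomial naive Bayes classifier as a stand-in, not an ensemble and not the product's actual method. The notification texts, labels, and class name are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Ignore tokens never seen in training.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical notifications with expert-validated failure modes.
train_texts = [
    "oil leak at pump seal",
    "seal dripping oil near flange",
    "bearing noise and vibration on motor",
    "grinding noise from bearing housing",
    "motor temperature too high",
    "coolant low and unit overheating",
]
train_labels = [
    "seal leakage", "seal leakage",
    "bearing wear", "bearing wear",
    "overheating", "overheating",
]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)

# A tiny held-out set gives a first quality metric for the model.
test_texts = ["loud vibration from bearing", "oil leak on seal"]
test_labels = ["bearing wear", "seal leakage"]
predictions = [model.predict(t) for t in test_texts]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)
```

In practice the held-out set would be far larger, and accuracy would be only one of several metrics a user reviews before storing the model.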

The finished model for failure mode analytics receives a notification, identifies which failure mode it belongs to, and automatically assigns the most suitable failure mode. With the help of these assignments, the system can calculate indicators such as MTTF (mean time to failure), MTTR (mean time to repair), and MTBF (mean time between failures). Further, this technology provides the end user with additional details about the failures, such as how often each failure mode appeared in notifications for a given equipment model. It also shows whether a failure mode is detected on a piece of equipment more often than the average across all equipment of that model.
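As a rough illustration of how these indicators could be derived once notifications carry failure modes, the sketch below computes MTTR, MTTF, and MTBF for one piece of equipment and flags equipment with an above-average count for a failure mode. Definitions of these indicators vary. This example assumes MTTR is the mean repair duration, MTTF is the mean uptime between the end of one repair and the start of the next failure, and MTBF is the mean time between the starts of consecutive failures. The event data, equipment names, and function names are hypothetical.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# Hypothetical notifications for one pump:
# (failure start, repair end, assigned failure mode)
events = [
    (datetime(2023, 1, 10, 8), datetime(2023, 1, 10, 12), "seal leakage"),
    (datetime(2023, 1, 20, 8), datetime(2023, 1, 20, 14), "bearing wear"),
    (datetime(2023, 2, 4, 8), datetime(2023, 2, 4, 10), "seal leakage"),
]


def hours(delta):
    """Convert a timedelta to hours."""
    return delta.total_seconds() / 3600


def reliability_indicators(events):
    """Compute MTTR, MTTF, and MTBF in hours (needs at least two events)."""
    ordered = sorted(events, key=lambda e: e[0])
    repair = [hours(end - start) for start, end, _ in ordered]
    pairs = list(zip(ordered, ordered[1:]))
    uptime = [hours(nxt[0] - cur[1]) for cur, nxt in pairs]   # repair end -> next failure
    between = [hours(nxt[0] - cur[0]) for cur, nxt in pairs]  # failure start -> next failure start
    return {"MTTR": mean(repair), "MTTF": mean(uptime), "MTBF": mean(between)}


def above_average(counts_by_equipment, mode):
    """List equipment whose count for `mode` exceeds the fleet average."""
    per_equipment = {eq: c.get(mode, 0) for eq, c in counts_by_equipment.items()}
    average = mean(per_equipment.values())
    return sorted(eq for eq, n in per_equipment.items() if n > average)


indicators = reliability_indicators(events)

# Failure mode counts per equipment of the same (hypothetical) model.
fleet = {
    "pump-01": Counter(mode for _, _, mode in events),
    "pump-02": Counter({"bearing wear": 1}),
    "pump-03": Counter({"seal leakage": 1}),
}
flagged = above_average(fleet, "seal leakage")
print(indicators, flagged)
```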

Business benefits

Failure mode analytics allows companies to get to the root cause of various malfunctions and aids in swift issue remediation. Further, having full visibility into product performance can help inform future design efforts, ultimately creating more seamless factory operations. Being able to pinpoint equipment failures helps organizations get ahead of potential setbacks, reducing the chance of downtime and the steep costs associated with it.

This article originally appeared on Engineering.com and is republished by permission.


About Anubhav Bhatia

Anubhav Bhatia is currently vice president, Engineering, at SAP Labs, Palo Alto, California. He leads efforts in the area of intelligent asset management, especially predictive maintenance and services using AI. Anubhav has more than a decade of experience in engineering, architecture, platform development, and senior management roles. He has proven expertise in innovation, application development, and business process orchestration in enterprise systems. Anubhav is also an official member of Forbes Technical Council, an IEEE Senior Member, an ASUG member, and a speaker at many SAP and ASUG events.