Is Your Machine-Learning Implementation Debt-Free?

Paul Pallath

Debt of any kind, if not addressed, is a time bomb waiting to explode. We can easily relate to this in the context of finance.

The comparison between technical complexity and debt was first drawn in 1992. In an experience report, Ward Cunningham alerted the industry to the problem and, in doing so, coined the term “technical debt.”

“Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite… The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise.”— Ward Cunningham, 1992

Do machine-learning models experience technical debt too?

A recent paper, “Hidden Technical Debt in Machine Learning Systems” by Sculley et al. (2015), suggests so. It explains that machine-learning systems incur hidden technical debt in addition to the technical debt introduced during ordinary software development.

There is a crucial difference between hidden debt and conventional technical debt. Conventional technical debt can be addressed by refactoring code, removing dead code, reducing dependencies, introducing abstractions for easier maintenance, and so on. Hidden debt is more dangerous because it exists at the system level rather than the code level, so it is hard to detect and compounds silently.

The following are broad categories under which hidden debt has been identified in machine-learning implementations:

  • Boundary erosion
  • Data dependencies
  • Anti-patterns
  • Impact of dealing with changes in the real world

Boundary erosion

In software engineering, encapsulation and modular design create strong abstraction boundaries that make code easier to maintain: new functionality can be added without modifying existing code.

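Unfortunately, it’s difficult to enforce such boundaries in machine-learning systems, because their intended behaviour cannot be fully specified in code; it is learned from external data. The erosion shows up in three ways: entanglement, where changing any input feature, hyperparameter, or data sample can change the behaviour of the whole model; correction cascades, where a model built to correct another model’s output for a slightly different problem creates a new dependency that makes the original model harder to improve; and undeclared consumers, where other systems silently use a model’s outputs as their inputs.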
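To make entanglement concrete, here is a minimal sketch assuming a toy linear model (no intercept) with two strongly correlated input features; the data and the helper names `fit_one` and `fit_two` are hypothetical. Removing one input changes the weight the model learns for the other, the effect Sculley et al. summarize as CACE: Changing Anything Changes Everything.

```python
def fit_one(x, y):
    """Least-squares slope through the origin for a single feature."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def fit_two(x1, x2, y):
    """Least-squares weights (no intercept) for two features via the 2x2 normal equations."""
    a = sum(u * u for u in x1)
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2)
    p = sum(u * t for u, t in zip(x1, y))
    q = sum(v * t for v, t in zip(x2, y))
    det = a * d - b * b  # nonzero as long as x1 and x2 are not perfectly collinear
    # Cramer's rule for [[a, b], [b, d]] @ [w1, w2] = [p, q]
    return (p * d - b * q) / det, (a * q - b * p) / det


x1 = [1, 2, 3, 4, 5]
x2 = [1, 2, 3, 4, 6]  # strongly correlated with x1
y = [u + v for u, v in zip(x1, x2)]  # the label truly depends on both features

print(fit_two(x1, x2, y))  # (1.0, 1.0)
print(fit_one(x1, y))      # about 2.09: x1's weight shifts once x2 is dropped
```

Nothing about `x1` itself changed, yet its learned weight roughly doubled; any downstream logic calibrated against the old model would silently be working with different behaviour.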

Data dependencies

According to Morgenthaler et al. (2012), in their paper on managing technical debt at Google, dependency debt is an important contributor to code complexity and technical debt in software development. In conventional software, however, compilers and linkers can identify code dependencies through static analysis, which makes them comparatively easy to find and fix.

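Data dependencies have a similar impact in machine-learning systems but are much harder to detect. Hidden debt builds up through unstable data dependencies (input signals whose behaviour changes over time, for example because they are produced by another model or owned by a different team), underutilized data dependencies (input signals that add little or no modeling benefit but still expose the system to change), and the fact that tools for static analysis of data dependencies are far less common than those for code.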
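Sculley et al. recommend running leave-one-feature-out evaluations regularly to find underutilized dependencies. The sketch below is one minimal way to do that, assuming a plain least-squares model and a hypothetical toy dataset in which feature `f2` plays no role in the label. For brevity it measures error on the training data, whereas a real check would use held-out data; all function names and the `1e-9` tolerance are illustrative assumptions.

```python
def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mse(rows, y, keep):
    """Fit least squares (with intercept) on the feature indices in `keep`; return training MSE."""
    design = [[1.0] + [row[j] for j in keep] for row in rows]
    k = len(design[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    w = solve(xtx, xty)
    residuals = [t - sum(wi * di for wi, di in zip(w, d)) for d, t in zip(design, y)]
    return sum(r * r for r in residuals) / len(y)


def leave_one_feature_out(rows, y, names, tolerance=1e-9):
    """Return names of features whose removal barely changes the error (removal candidates)."""
    all_idx = list(range(len(names)))
    baseline = fit_mse(rows, y, all_idx)
    underutilized = []
    for j in all_idx:
        without_j = [i for i in all_idx if i != j]
        if fit_mse(rows, y, without_j) - baseline <= tolerance:
            underutilized.append(names[j])
    return underutilized


# Toy data: the label depends only on f0 and f1; f2 stands in for a legacy feature.
rows = [(1, 0, 5), (0, 1, 3), (1, 1, 4), (2, 1, 7), (1, 2, 2), (3, 0, 6)]
y = [2 * f0 + 3 * f1 for f0, f1, _ in rows]
print(leave_one_feature_out(rows, y, ["f0", "f1", "f2"]))  # ['f2']
```

In practice the tolerance would be set relative to the normal run-to-run noise of the evaluation metric rather than fixed at a tiny constant.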

Anti-patterns

In a real-world machine-learning system, the code dedicated to training a model and making predictions is only a small fraction of the total; most of the system is supporting code, and that is where anti-patterns accumulate. Examples include glue code (supporting code written to move data into and out of general-purpose packages, or to stitch several otherwise incompatible components into a single implementation) and dead experimental codepaths (branches added for rapid prototyping and quick turnaround that are left in production code and pile up over time).
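One mitigation Sculley et al. suggest for glue code is wrapping black-box packages behind a common internal API. The following is a minimal sketch of that idea; `Predictor`, `VendorLinearModel`, `VendorAdapter`, and `serve` are all hypothetical names, and the vendor class merely stands in for a third-party package with its own input format.

```python
from typing import Dict, List, Protocol, Sequence

Record = Dict[str, float]


class Predictor(Protocol):
    """Hypothetical internal interface that every model must expose."""

    def predict(self, records: Sequence[Record]) -> List[float]: ...


class VendorLinearModel:
    """Stand-in for a third-party package that only accepts a dense matrix."""

    def __init__(self, coefficients: List[float]) -> None:
        self.coefficients = coefficients

    def score_matrix(self, matrix: List[List[float]]) -> List[float]:
        return [sum(c * x for c, x in zip(self.coefficients, row)) for row in matrix]


class VendorAdapter:
    """The only place that knows the vendor's format; callers see just Predictor."""

    def __init__(self, model: VendorLinearModel, feature_order: List[str]) -> None:
        self.model = model
        self.feature_order = feature_order

    def predict(self, records: Sequence[Record]) -> List[float]:
        # Assumption for this sketch: missing features default to 0.0.
        matrix = [[float(r.get(name, 0.0)) for name in self.feature_order] for r in records]
        return self.model.score_matrix(matrix)


def serve(predictor: Predictor, records: Sequence[Record]) -> List[float]:
    """Application code depends only on the common interface."""
    return predictor.predict(records)


model = VendorAdapter(VendorLinearModel([2.0, 3.0]), ["f0", "f1"])
print(serve(model, [{"f0": 1.0, "f1": 1.0}, {"f0": 0.5}]))  # [5.0, 1.0]
```

With this structure, replacing the package means writing one new adapter instead of changing every caller that prepares data for it.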

Dealing with changes in the real world

Models created using machine-learning algorithms are consumed in business applications that interact directly with the real world. Because the real world keeps changing, a model that was accurate when it was deployed can drift away from the data it now sees without any change to its code, which makes this instability another source of hidden debt in machine-learning systems.
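A common way to catch the effects of a changing world is to monitor models after deployment. Among other checks, Sculley et al. suggest monitoring prediction bias: in a system working as intended, the distribution of predicted labels should match the distribution of observed labels. The sketch below assumes a binary classifier that outputs probabilities; the function names and the 0.05 tolerance are hypothetical choices.

```python
def prediction_bias(predicted_probs, observed_labels):
    """Mean predicted probability minus the observed positive rate over the same window."""
    if not predicted_probs or len(predicted_probs) != len(observed_labels):
        raise ValueError("need equally sized, non-empty windows")
    mean_pred = sum(predicted_probs) / len(predicted_probs)
    observed_rate = sum(observed_labels) / len(observed_labels)
    return mean_pred - observed_rate


def check_prediction_bias(predicted_probs, observed_labels, tolerance=0.05):
    """Return (ok, bias); ok is False when |bias| exceeds the (hypothetical) tolerance."""
    bias = prediction_bias(predicted_probs, observed_labels)
    return abs(bias) <= tolerance, bias


# The same predictions, checked against labels from two hypothetical time windows.
preds = [0.2, 0.3, 0.25, 0.25]
print(check_prediction_bias(preds, [0, 1, 0, 0]))  # (True, 0.0)
print(check_prediction_bias(preds, [1, 1, 0, 1]))  # (False, -0.5)
```

A check like this does not explain why the world changed, but it turns silent drift into a visible alert that can trigger investigation or retraining.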

Such situations in machine-learning implementations warrant a trusted partner like SAP. At SAP, we provide the capabilities to help prevent the non-obvious, hidden debt that is created unintentionally on a predictive journey. In this way, a data-driven organization can embark unencumbered on the path of digital transformation.

For more on using data effectively in digital transformation, see Finding Value In Data.



About Paul Pallath

Dr Paul Pallath is the Chief Data Scientist & Senior Director with the Advanced Analytics Organisation at SAP. With over 20 years of experience in machine learning, Paul has published research on machine learning and data mining in international journals and conferences, and has invented several patentable ideas. He holds a Master’s degree in Computer Applications, awarded with a gold medal, and a PhD in Machine Learning, both from the Indian Institute of Technology, Delhi.