Machine Learning And Data Privacy: Contradiction Or Partnership?

Ivona Crnoja

The battle for future markets and bigger market shares is in full swing. The world’s most influential companies are in a steady race to develop better automated systems and, in turn, boost artificial intelligence technology – taking them ahead of their competitors. By 2020, AI is expected to turn over more than 21 billion euros worldwide. However, further development of machine learning and artificial intelligence technologies seems to be blocked by a major obstacle: data privacy.

AI and machine learning systems automate repetitive tasks by learning from massive amounts of data. The more data the algorithms consume, the better they can recognize structure and regularities in it, and the more accurately they can predict the next step. What remains unclear, however, is how these algorithms evolve and whether and how they interconnect, possibly processing customer data further than intended.

Data protection, on the other hand, is based on the minimization, transparency, and deletion of customer data. Big Data use strongly conflicts with the idea of data privacy. How can the advantages of Big Data technologies be used by companies without revealing sensitive customer data and violating Europe’s General Data Protection Regulation (GDPR), for example?

Federated learning, decentralization of data

In the way machines conventionally learn from data, one crucial technical requirement is the centralization of data. The strategy is to build a model from a given set of data in a closed space, such as the cloud or a data center, where data can be gathered, used, and controlled but does not leave this defined space or platform. Once data is centralized, however, the question becomes: which space or platform will ensure that data is stored and used for the right purpose without falling into the wrong hands?

This is where federated learning comes into play. To solve the security issues caused by centralizing data (and therefore tying it to a fixed space or platform), federated learning protects data by keeping it decentralized. Developers would receive only anonymized results, with no specific data that could be traced to a particular user. Rather than being stored and analyzed in a centralized and possibly insecure space, data is used locally on the user’s device or server, and only the learning outcome, not the data itself, is transferred and centralized. This type of machine learning enables phones or computers to train predictive models while keeping the entire training data on the device, thereby reducing the need to transfer and store the data in the cloud, a data center, or a central server.
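
To make the idea of local training with a centralized learning outcome more concrete, here is a minimal sketch in Python. It uses federated averaging, one common aggregation scheme that this article does not name, on a toy linear model. All function names, the synthetic data, and the parameters (learning rate, number of rounds, device sizes) are assumptions chosen for illustration; a production system would add safeguards such as secure aggregation that are left out here.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient steps on one device's private data.

    Only the updated weights are returned; the raw (x, y) pairs
    never leave this function.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y  # prediction error of the model y = w*x + b
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)


def federated_average(updates, sizes):
    """Combine device updates, weighting each by its number of samples."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def make_device_data(rng, n):
    """Hypothetical private data on one device: y = 2x + 1 plus small noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
    return data


def train(rounds=200, seed=0):
    rng = random.Random(seed)
    # Three simulated devices, each holding its own data set.
    devices = [make_device_data(rng, n) for n in (20, 35, 50)]
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Each device trains locally, starting from the current global model.
        updates = [local_update(global_weights, d) for d in devices]
        # The server sees only the weights, never the underlying data.
        global_weights = federated_average(updates, [len(d) for d in devices])
    return global_weights


if __name__ == "__main__":
    w, b = train()
    print(f"learned w={w:.3f}, b={b:.3f}")
```

In this sketch the central server only ever handles pairs of model weights. Note that model updates themselves can still leak information about the underlying data in some settings, which is why real deployments typically combine this pattern with additional privacy techniques.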

Federated learning still enables machines to learn from vast amounts of data without centralizing that data or risking the exposure and tracing of sensitive, private customer information. It thus promises greater user anonymity, since raw user data does not have to be collected and processed on a central machine first.

Why it matters

Digital business pioneers have developed a very specific and research-driven corporate culture based on fully data-supported decision-making and management. With digital transformation, the amount and variety of data (and thus its importance in running more profitable and efficient businesses) are increasing immensely. Companies aiming to increase market share are dependent on customer data and its efficient and safe use.

Nevertheless, the risks inherent in storing and using centralized data have already led to data scandals. These have cast a bad light on numerous influential companies and global corporations, substantially damaging their brand reputation and eroding market share. In addition to data privacy breach scandals, sophisticated cyber attacks have caused widespread fear about data loss and exposure. Data security can make or break businesses, but federated learning with decentralized data can be one approach to effectively increase a company’s profitability using machine learning technologies while still ensuring secure usage of customer data.


About Ivona Crnoja

Ivona drives Marketing and Communications for the machine learning research team at SAP in Berlin. Together they work on bridging the gap between academia and industry by solving tough machine learning challenges and developing new approaches with potential applications across various industries. They aim to advance knowledge in the field of AI while identifying applications for machine learning and developing algorithms and systems that make SAP solutions more efficient, scalable, and transparent. Ivona dedicates her time to exploring the latest machine learning challenges and trends, while writing articles to share the team's research achievements with the world.