Summary
Federated learning (FL), also known as collaborative learning, is a machine learning technique that trains an algorithm across multiple distributed devices or servers holding local data samples, without exchanging those samples. This contrasts both with centralized machine learning, where local datasets are uploaded to a single server, and with classical decentralized approaches, which often assume local data samples are identically distributed. FL allows multiple actors to build a shared machine learning model without sharing data, addressing issues such as data privacy, security, access rights, and access to heterogeneous data.
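To make the mechanics concrete, the canonical algorithm behind this setup is federated averaging (FedAvg): each client trains on its own private shard, and only model weights, never data, travel to the server, which averages them. Below is a minimal sketch in NumPy on a toy linear-regression task; the hyperparameters and client counts are illustrative, not a production recipe.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: a few gradient-descent steps on a
    linear model. The data (X, y) never leaves this function."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
        w -= lr * grad
    return w

def federated_average(global_w, clients):
    """Server step: collect client models and average them,
    weighted by each client's local sample count."""
    updates = [(local_update(global_w, X, y), len(y)) for X, y in clients]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

# Toy run: three clients, each holding a private shard of y = 2x.
rng = np.random.default_rng(0)
clients = []
for _ in range(3):
    X = rng.normal(size=(20, 1))
    clients.append((X, X @ np.array([2.0])))

w = np.zeros(1)
for _ in range(30):          # 30 federated rounds
    w = federated_average(w, clients)
print(w)                     # approaches [2.]
```

Note that only the weight vector `w` crosses the (simulated) network boundary; each client's `(X, y)` stays inside `local_update`.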
Viability
First introduced by Google in 2017, FL currently sits between R&D and commercialization. Developer tooling is emerging to make it easier, faster, and more secure to deploy. Newer techniques such as decentralized FL and heterogeneous FL (HeteroFL) are designed to avoid a single point of failure and to account for heterogeneously distributed data. Federated learning with differential privacy (FL-DP) adds noise to model updates to reduce the risk of the model memorizing user data. FL is already used in production at Google, Apple, and other large tech companies, whose data systems are built with local devices in mind, making deployment easier than for an average Fortune 500 company. FL remains an active area of R&D, and new research is expected to yield products in the next few years with stronger privacy guarantees and easier deployment. Significant enterprise adoption is unlikely in the short term, given how long distributed-systems thinking takes to permeate the economy, so experimentation is expected to occur in Web3 first.
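A minimal sketch of the FL-DP idea, assuming a DP-SGD-style recipe: each client update is clipped to an L2 norm bound and perturbed with Gaussian noise before it is sent for aggregation, so no single user's contribution dominates or is exactly recoverable. The `clip_norm` and `noise_multiplier` values here are illustrative and not calibrated to any formal privacy budget.

```python
import numpy as np

def dp_sanitize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Sanitize one client update: scale it down so its L2 norm is at
    most clip_norm, then add Gaussian noise proportional to that bound."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

update = np.array([3.0, 4.0])        # raw client update with L2 norm 5
private = dp_sanitize_update(update)
print(private)                       # clipped to norm 1, plus Gaussian noise
```

The noise is what makes individual contributions deniable; the clipping is what makes the noise scale meaningful, since it bounds how much any one client can move the model.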
Drivers
At a macro level, FL is driven by several technological challenges: the need for more processing power, especially for ML on edge devices; the need to combine on-device learning with server-side learning; and data minimization requirements. Instead of collecting all data in one place, there are security, performance, and regulatory reasons to perform learning where the data resides. Google's ill-fated use of the technique in FLoC, its cookie alternative, brought FL into the mainstream in 2021 and increased awareness.
Novelty
Compared to traditional ML techniques, FL has novel features. Depending on the setup, FL can incur less communication and computational overhead, since only the model, not the dataset, is sent over the network. This is valuable because it enables continuous local learning, which is not feasible with traditional ML. On the privacy side, however, FL is weaker than hardware- and software-based cryptography, especially for integrity and confidentiality, and hardware-based solutions are likely to be more cost-effective.
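Whether FL actually saves communication really does depend on the setup, as a back-of-envelope comparison shows: shipping raw data once versus shipping model weights every round. All numbers below (device count, sample sizes, model size, rounds) are illustrative assumptions; with a large model and many rounds, the balance can flip the other way.

```python
# Illustrative scenario: devices hold media-sized samples (e.g. photos),
# and training runs for a fixed number of federated rounds.
n_devices = 10_000
samples_per_device = 5_000
bytes_per_sample = 200_000        # ~200 KB per sample (assumed)
model_params = 1_000_000
bytes_per_param = 4               # float32 weights
rounds = 100

# Centralized ML: every sample is uploaded once.
centralized = n_devices * samples_per_device * bytes_per_sample

# FL: every device uploads its model weights each round
# (download traffic and compression ignored for simplicity).
federated = n_devices * rounds * model_params * bytes_per_param

print(centralized / 1e12, "TB of raw data uploaded once")
print(federated / 1e12, "TB of model updates over all rounds")
```

With these assumptions FL moves 4 TB against 10 TB for centralized upload, but shrinking `bytes_per_sample` or growing `rounds` reverses the outcome, which is why the overhead claim has to be qualified.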
Diffusion
Like other distributed or decentralized solutions, FL requires a change in thinking and processes. FL needs different workflows and toolkits, although many popular ML frameworks, such as TensorFlow, already offer solutions. The core challenge is the lack of standardization around data: despite work on HeteroFL, local datasets still need to be compatible. Cleaning a single dataset for training is hard enough; FL requires engineers to reason about many nodes, potentially thousands. However, this shift in thinking and processes is also what incorporating edge devices into the data stack demands, so FL should be easier to deploy in companies already moving towards a data mesh.
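One small, concrete piece of that workflow change is checking that every node's local dataset is schema-compatible before training starts. A sketch, assuming a hypothetical `{feature: dtype}` schema representation for each node; real frameworks have their own schema tooling.

```python
def check_schema_compatibility(node_schemas):
    """Compare each node's schema against node 0's and report, per node,
    which features are missing, extra, or typed differently."""
    reference = node_schemas[0]
    problems = {}
    for i, schema in enumerate(node_schemas[1:], start=1):
        missing = set(reference) - set(schema)
        extra = set(schema) - set(reference)
        mismatched = {f for f in set(reference) & set(schema)
                      if schema[f] != reference[f]}
        if missing or extra or mismatched:
            problems[i] = {"missing": missing, "extra": extra,
                           "dtype_mismatch": mismatched}
    return problems

nodes = [
    {"age": "int64", "income": "float32"},
    {"age": "int64", "income": "float32"},
    {"age": "int32", "salary": "float32"},   # a drifted node
]
print(check_schema_compatibility(nodes))
```

In a centralized pipeline this check runs once against one dataset; in FL it has to run against every participating node, which is exactly the multiplication of effort the paragraph above describes.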
Impact (3)
Over time, the cost of compute at the edge will fall faster than communication costs, making FL all but inevitable from an energy and cost perspective compared to centralized ML. It requires changes in machine learning toolkits and processes, but the benefits will outweigh the costs. A high-impact scenario sees ever more computing pushed to the edge, requiring distributed architectures to meet that need. With more applications running at the edge, FL enables continuous learning and more accurate global models. In this view of the world, huge public models also emerge, operated by Decentralized Autonomous Organizations (DAOs), as nodes are incentivized to share their learning.
Sources