Barbara Caputo¹,³

¹ DAUIN, PoliTO, Italy
² IIT, Genoa, Italy
³ CINI, Rome, Italy
```bibtex
@article{fani2024accelerating,
  title   = {Accelerating Heterogeneous Federated Learning with Closed-form Classifiers},
  author  = {Fanì, Eros and Camoriano, Raffaello and Caputo, Barbara and Ciccone, Marco},
  journal = {Proceedings of the International Conference on Machine Learning},
  year    = {2024}
}
```
In FL, clients’ data reflect individual user habits, preferences, and locations, violating the traditional machine learning assumption that all data points are independent and identically distributed (i.i.d.). This is known in FL as Statistical Heterogeneity. During training, statistical heterogeneity causes local updates to diverge from the global optimum, which can significantly slow down convergence.
Recent works show that, in neural networks, client drift primarily affects the classifier. In real-world cross-device scenarios, clients have access to different classes. Suppose a single client is the only one with access to the class “dog”. Because of partial participation, the same client is typically not sampled in two consecutive rounds. If that client is sampled in one round but not in the following ones, the global model can develop data recency bias, forgetting the knowledge about the class “dog”. This phenomenon is well-studied in areas such as Continual Learning. In classification, it occurs because the softmax classifier is prone to forgetting when updated with data in a non-i.i.d. or class-imbalanced manner.
Therefore, in this work, we aim to answer the following question: is it possible to design an efficient FL method that is robust to client drift in heterogeneous settings and unaffected by classifier bias? Fortunately, the answer is yes, by exploiting the properties of closed-form linear classifiers.
Indeed, we propose a new robust and efficient federated learning algorithm based on Ridge Regression (RR), which we name Federated Recursive Ridge Regression (Fed3R). Thanks to the linearity of its formulation, Fed3R is immune to statistical heterogeneity, converges faster than the baselines, and drastically reduces computation and communication costs.
In addition, we propose two variants of Fed3R. Fed3R with Random Features (Fed3R-RF) is a non-linear version of the algorithm that uses a random Fourier features mapping to approximate the Kernel Ridge Regression solution while retaining the properties of Fed3R. Fed3R with Fine-Tuning (Fed3R+FT) fine-tunes the whole model, the feature extractor only, or the classifier only, after initializing the model with the Fed3R classifier.
Each client asynchronously computes local statistics using its local dataset and a pre-trained feature extractor.
The server collects these statistics and aggregates them into two matrices, the Gram matrix of the features and the feature–label correlation matrix, which are used to compute the optimal regularized least squares classifier in closed form. The aggregation guarantees an exact solution, equivalent to the centralized one.
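To make this step concrete, here is a minimal NumPy sketch of the statistics and the closed-form solve. The function names, the statistic symbols `A_k`/`b_k`, and the regularization parameter `lam` are our notation for the example; for brevity, the statistics are computed in batch form rather than with the paper's recursive per-sample update.

```python
import numpy as np

def local_statistics(features, labels_onehot):
    """Client-side: sufficient statistics for regularized least squares.

    features:      (n_k, d) outputs of the pre-trained feature extractor
    labels_onehot: (n_k, C) one-hot label matrix
    """
    A_k = features.T @ features        # (d, d) local Gram matrix
    b_k = features.T @ labels_onehot   # (d, C) local feature-label correlations
    return A_k, b_k

def fed3r_classifier(stats, d, lam=1e-2):
    """Server-side: aggregate client statistics and solve RLS in closed form."""
    A = sum(A_k for A_k, _ in stats)   # exact, order-independent aggregation
    b = sum(b_k for _, b_k in stats)
    # W = (A + lam * I)^{-1} b, identical to the centralized RLS solution
    return np.linalg.solve(A + lam * np.eye(d), b)   # (d, C) classifier
```

Because the statistics are simple sums, the result does not depend on which clients are sampled when, which is why the solution matches the centralized one regardless of heterogeneity.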
As pre-trained feature extractors may not be expressive enough to linearly separate the classes of complex learning problems, we also introduce Fed3R-RF, which first applies a nonlinear random features mapping from the latent feature space to a new higher-dimensional feature space, approximating the corresponding kernel feature map.
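For illustration, below is a minimal sketch of one standard construction, random Fourier features approximating an RBF kernel (Rahimi & Recht, 2007). The dimension `D`, bandwidth `sigma`, and shared seed are assumptions for the example; all clients must use the same random projection so that the aggregated statistics remain consistent.

```python
import numpy as np

def random_fourier_features(Z, D=2048, sigma=1.0, seed=0):
    """Map latent features Z (n, d) to D random Fourier features whose inner
    products approximate an RBF kernel with bandwidth sigma."""
    rng = np.random.default_rng(seed)          # same seed on every client
    d = Z.shape[1]
    Omega = rng.normal(scale=1.0 / sigma, size=(d, D))
    beta = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(Z @ Omega + beta)
```

The mapped features can then be fed to the same statistics and closed-form solve as plain Fed3R, now in the D-dimensional space.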
Fed3R's performance relies on the quality of the pre-trained feature extractor, which is frozen. Therefore, we propose Fed3R+FT, where a fine-tuning stage follows the classifier initialization. First, Fed3R+FT learns a Fed3R classifier using the pre-trained feature extractor. Then, it initializes a softmax classifier with the parameters of the Fed3R classifier. Finally, the model is fine-tuned using a traditional FL algorithm. Since the Fed3R classifier is the optimal regularized least squares classifier for the pre-trained features, it provides a stable starting point that mitigates client drift and destructive interference during aggregation.
We propose three different fine-tuning strategies for Fed3R+FT (see the sketch after this list):

- Fed3R+FT: fine-tune the whole model;
- Fed3R+FT LP: fine-tune the classifier only (linear probing), keeping the feature extractor frozen;
- Fed3R+FTfeat: fine-tune the feature extractor only, keeping the classifier fixed.
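A minimal PyTorch sketch of the initialization and the three strategies follows. The helper name, the strategy labels, and the bias-free linear layer are our assumptions for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

def init_for_finetuning(feature_extractor, W_fed3r, strategy="feat"):
    """Attach a softmax classifier initialized from the Fed3R solution and
    select which parameters stay trainable.

    strategy: "full" (Fed3R+FT), "lp" (Fed3R+FT LP), "feat" (Fed3R+FTfeat)
    """
    d, num_classes = W_fed3r.shape
    classifier = nn.Linear(d, num_classes, bias=False)  # bias handling omitted
    with torch.no_grad():
        classifier.weight.copy_(torch.as_tensor(W_fed3r).T)

    for p in feature_extractor.parameters():
        p.requires_grad = strategy in ("full", "feat")
    for p in classifier.parameters():
        p.requires_grad = strategy in ("full", "lp")
    return feature_extractor, classifier
```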
In our experiments, we mainly consider cross-device FL scenarios with thousands of clients and high statistical heterogeneity.
Dataset | Split | Samples per client (avg) | Clients (K) | Classes (C)
---|---|---|---|---
Landmarks | Users-160K | 119.9 | 1262 | 2028
iNaturalist | Users-120K | 13.0 | 9275 | 1203
iNaturalist | Geo-100 | 33.4 | 3606 | 1203
iNaturalist | Geo-300 | 99.6 | 1208 | 1203
iNaturalist | Geo-1K | 326.9 | 368 | 1203
CIFAR-100 | α=0 | 500.0 | 100 | 100
Fine-tuning the entire model is beneficial on Landmarks, which is closer to a cross-silo setting than iNaturalist. In federated settings with more clients, such as iNaturalist, the aggregation phase significantly hurts the Fed3R+FT and Fed3R+FT LP variants, since the fine-tuned classifier becomes susceptible to the classifier bias phenomenon. Conversely, keeping the classifier fixed and fine-tuning only the feature extractor, as in the Fed3R+FTfeat experiments, prevents classifier data recency bias and destructive interference during aggregation, ensuring a performance improvement and clearly indicating that the pre-trained features alone were not sufficient for the target task.
In this work, we introduce Fed3R, a family of FL algorithms based on Recursive Ridge Regression.
Future work may extend Fed3R to streaming data or personalized learning scenarios within the FL framework.
This study was carried out within the FAIR - Future Artificial Intelligence Research and received funding from the European Union Next-GenerationEU (PIANO NAZIONALE DI RIPRESA E RESILIENZA (PNRR) – MISSIONE 4 COMPONENTE 2, INVESTIMENTO 1.3 – D.D. 1555 11/10/2022, PE00000013). This manuscript reflects only the authors’ views and opinions; neither the European Union nor the European Commission can be held responsible for them. We acknowledge the CINECA award under the ISCRA initiative for the availability of high-performance computing resources and support. We also thank the reviewers and area chair for their valuable comments.