Federated Learning (FL) trains a shared model across many devices or organisations, each keeping its data local. Google uses it to improve Gboard next-word prediction without collecting users' messages on its servers. Hospitals use it to train diagnostic models across institutions without sharing patient records.

The FedAvg Algorithm

The core algorithm is simple. In each round:

  1. Server sends current global model weights to a subset of clients.
  2. Each client trains on its local data for E local epochs.
  3. Clients send their updated weights (or weight deltas) back to the server.
  4. Server aggregates updates — typically a weighted average by dataset size.
  5. Repeat until convergence.

Python (FedAvg Server)

import numpy as np


def federated_average(client_updates: list, client_sizes: list) -> np.ndarray:
    """Weighted average of client model weights, weighted by dataset size."""
    total = sum(client_sizes)
    # Accumulate in float so integer-typed inputs do not break the in-place add
    aggregated = np.zeros_like(client_updates[0], dtype=float)
    for update, size in zip(client_updates, client_sizes):
        aggregated += (size / total) * update
    return aggregated


# Server round. initialise_model(), select_clients() and client.train() are
# placeholders for your own code; client.train() is assumed to return the
# client's updated weights as an array with the same shape as global_weights.
global_weights = initialise_model()
for round_num in range(100):
    selected = select_clients(fraction=0.1)
    updates = [client.train(global_weights, epochs=5) for client in selected]
    sizes = [client.dataset_size for client in selected]
    global_weights = federated_average(updates, sizes)
    print(f"Round {round_num}: global model updated")
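
Step 2 of the round, local training on each client, is not shown in the server code. The following is a minimal sketch of it, assuming a linear regression model trained with full-batch gradient descent. The names `local_train` and `fedavg_round` and the synthetic data are illustrative assumptions, not part of any library.

```python
import numpy as np


def local_train(weights, X, y, epochs=5, lr=0.1):
    """Run `epochs` full-batch gradient steps of least-squares regression locally."""
    w = weights.copy()  # never mutate the server's copy
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w


def fedavg_round(global_w, clients):
    """One FedAvg round: local training on every client, then a size-weighted average."""
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    local_ws = np.stack([local_train(global_w, X, y) for X, y in clients])
    return (sizes / sizes.sum()) @ local_ws


# Synthetic setup (illustrative): three clients holding different amounts of data
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (20, 50, 200):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.01 * rng.normal(size=n)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(50):
    w = fedavg_round(w, clients)
```

Here every client participates in every round and all clients share the same underlying relationship, so the averaged model settles close to `true_w`. Real deployments sample a fraction of clients and rarely have data this well behaved.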

Real-World Challenges

📊
Non-IID Data
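Client data distributions often differ sharply (non-IID), which can slow or destabilise vanilla FedAvg. FedProx adds a proximal term that keeps local models close to the global one, and SCAFFOLD uses control variates to correct client drift; both are designed to handle heterogeneous data better.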
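
To make the FedProx idea concrete, here is a minimal sketch of a local update that adds a proximal penalty, (mu / 2) * ||w - w_global||^2, to the local loss. It assumes a least-squares model and full-batch gradient steps; the function name and default values are illustrative assumptions.

```python
import numpy as np


def fedprox_local_train(global_w, X, y, mu=0.1, epochs=5, lr=0.1):
    """Local least-squares training with a FedProx-style proximal term.

    Local objective: mse(w) + (mu / 2) * ||w - global_w||^2
    """
    w = global_w.copy()
    for _ in range(epochs):
        grad_loss = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of the local loss
        grad_prox = mu * (w - global_w)  # pulls w back toward the global model
        w -= lr * (grad_loss + grad_prox)
    return w
```

With `mu=0` this reduces to ordinary local training. Larger values of `mu` limit how far a client can drift from the global model in one round, at the cost of slower local progress.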
📡
Communication Cost
Sending full model weights every round is expensive, especially over mobile connections. Gradient compression (such as sparsification) and quantisation can reduce bandwidth substantially, by up to 100x in some settings.
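
One common form of gradient compression is top-k sparsification: the client sends only the k largest-magnitude entries of its update, along with their positions. The sketch below assumes NumPy arrays; the function names are hypothetical.

```python
import numpy as np


def sparsify_top_k(update, k):
    """Keep the k largest-magnitude entries; return (indices, values) to transmit."""
    flat = update.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the k largest |values|
    return idx, flat[idx]


def densify(idx, values, shape):
    """Server side: rebuild a dense update, with zeros for every entry not sent."""
    flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    flat[idx] = values
    return flat.reshape(shape)


update = np.array([[0.01, -2.0, 0.03], [1.5, -0.02, 0.0]])
idx, vals = sparsify_top_k(update, k=2)
restored = densify(idx, vals, update.shape)  # only -2.0 and 1.5 survive
```

The achievable saving depends on `k` and on the cost of sending the indices. Practical systems often also accumulate the dropped entries locally and add them to the next round's update, which this sketch omits.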
🔐
Gradient Leakage
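Gradients can leak training data: gradient inversion attacks can reconstruct individual examples from shared updates. Combine FL with Differential Privacy or Secure Aggregation for stronger privacy guarantees; FL alone does not provide them.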
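
Differential Privacy in FL usually rests on two steps: clipping each update to a bounded L2 norm, then adding calibrated Gaussian noise. The sketch below shows only that mechanism. The function name and defaults are assumptions, and it does not compute a privacy budget, which requires a privacy accountant that tracks the noise level, client sampling rate, and number of rounds.

```python
import numpy as np


def privatise_update(update, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip an update to L2 norm `clip_norm`, then add Gaussian noise."""
    rng = rng if rng is not None else np.random.default_rng()
    norm = np.linalg.norm(update)
    # Scale down only if the update exceeds the clipping bound
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    # Noise standard deviation is proportional to the clipping bound
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise
```

Whether the noise is added on the client or by the server after aggregation depends on the trust model; this sketch is agnostic about that choice.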
📴
Stragglers
Slow or offline clients stall synchronous rounds, because the server waits for every selected client. Asynchronous FL, round deadlines, or client selection strategies mitigate this.
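
A simple way to handle stragglers in a synchronous setup is a round deadline: the server aggregates whatever arrived in time and ignores the rest. The sketch below simulates this with recorded completion times; the function name and the tuple layout are assumptions made for illustration.

```python
import numpy as np


def aggregate_before_deadline(results, deadline_s):
    """Average only the updates that arrived within `deadline_s` seconds.

    `results` is a list of (update, dataset_size, seconds_taken) tuples.
    Returns None if no client finished in time, so the caller can keep the old model.
    """
    on_time = [(u, n) for u, n, t in results if t <= deadline_s]
    if not on_time:
        return None
    updates = np.stack([u for u, _ in on_time])
    sizes = np.array([n for _, n in on_time], dtype=float)
    return (sizes / sizes.sum()) @ updates  # size-weighted average of on-time updates


results = [
    (np.array([1.0, 1.0]), 100, 12.0),
    (np.array([3.0, 3.0]), 300, 25.0),
    (np.array([9.0, 9.0]), 500, 240.0),  # straggler: dropped from this round
]
aggregated = aggregate_before_deadline(results, deadline_s=60.0)
```

Dropping slow clients keeps rounds short, but it can bias the model toward clients with fast hardware and good connectivity, so the deadline is a trade-off and not a free fix.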
🏥

Healthcare Use Case

NVIDIA FLARE enables hospitals to collaboratively train tumour segmentation models. No patient data leaves the institution; only model updates are shared. The joint model reportedly outperforms any single-institution model by 15–20%.