The Many Faces of Compression: Theory and Practice in Federated Optimization
Elnur Gasanov, Ph.D. Student, Computer Science
May 28, 10:00 - 12:00, B3, L5, R5209
Keywords: compression methods, communication efficiency, proximal algorithms
This thesis presents novel compression methods and optimization algorithms aimed at improving communication efficiency in large-scale machine learning systems.
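To make the notion of a compression operator concrete, here is a minimal sketch of Rand-K sparsification, one standard unbiased compressor used in communication-efficient training. It is a generic illustration, not a reproduction of the operators developed in the thesis; the function name rand_k and its parameters are placeholders.

    import numpy as np

    def rand_k(x, k, rng=np.random.default_rng(0)):
        """Unbiased Rand-K sparsification: transmit only k random coordinates of x."""
        d = x.size
        idx = rng.choice(d, size=k, replace=False)
        out = np.zeros_like(x)
        out[idx] = x[idx] * (d / k)  # rescaling by d/k makes the compressor unbiased: E[out] = x
        return out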
Error Feedback for Communication-Efficient First and Second-Order Distributed Optimization: Theory and Practical Implementation
Konstantin Burlachenko, Ph.D. Student, Computer Science
May 12, 12:00 - 13:00, B9, L2, R2325
Keywords: Federated learning, software development
This seminar will discuss advancements in Federated Learning, including theoretical improvements to the Error Feedback method (EF21) for communication-efficient distributed training and the development of significantly more practical and efficient implementations of the Federated Newton Learn (FedNL) algorithm.
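As background for the first-order part, the sketch below simulates the EF21 update rule with a Top-K compressor on a single machine, assuming simple per-client gradient oracles; the function names top_k and ef21, the initialization, and the step size are illustrative choices and do not reflect the practical implementations discussed in the seminar.

    import numpy as np

    def top_k(v, k):
        """Contractive compressor: keep the k largest-magnitude entries of v, zero out the rest."""
        out = np.zeros_like(v)
        idx = np.argsort(np.abs(v))[-k:]
        out[idx] = v[idx]
        return out

    def ef21(client_grads, x0, gamma=0.1, k=2, iters=200):
        """EF21 sketch: each client compresses only the change in its gradient estimate."""
        n, x = len(client_grads), x0.copy()
        g = [client_grads[i](x0) for i in range(n)]      # per-client gradient estimators
        g_avg = np.mean(g, axis=0)                       # server-side aggregate of the estimators
        for _ in range(iters):
            x = x - gamma * g_avg                        # server step with the aggregate estimator
            for i in range(n):
                c = top_k(client_grads[i](x) - g[i], k)  # compress the *difference*, not the gradient
                g[i] = g[i] + c
                g_avg = g_avg + c / n                    # server keeps the running average in sync
        return x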
Optimization Methods and Software for Federated Learning
Konstantin Burlachenko, Ph.D. Student, Computer Science
May 8, 19:00 - 21:00, B5, L5, R5209
This dissertation identifies five key challenges in Federated Learning (FL): data heterogeneity, device heterogeneity, communication issues, privacy concerns, and software implementation. More broadly, our work serves as a guide for researchers navigating the complexities of translating theoretical methods into efficient real-world implementations, while also offering insights into the reverse process of adapting practical implementation aspects back into theoretical algorithm design.
Stein Variational Gradient Descent and Consensus-Based Optimization: Towards a Convergence Analysis and Generalization
Lukang Sun, Ph.D. Student, Computer Science
May 30, 11:00 - 14:00, B3, L5, R5220
The first part of the dissertation presents a study on the convergence properties of Stein Variational Gradient Descent (SVGD), a sampling algorithm with applications in machine learning. The research delves into the theoretical analysis of SVGD in the population limit, focusing on its behavior under various conditions, including Talagrand's inequality T1 and the (L0, L1)-smoothness condition. The study also introduces an improved version of SVGD with importance weights, demonstrating its potential to accelerate convergence and enhance stability.
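For readers unfamiliar with the algorithm, the sketch below implements one plain SVGD update with an RBF kernel. It is a finite-particle illustration only; the importance-weighted variant and the population-limit analysis studied in the dissertation are not captured here, and the bandwidth h and step size eps are arbitrary choices.

    import numpy as np

    def svgd_step(X, grad_log_p, eps=0.1, h=1.0):
        """One SVGD update for an (n, d) array of particles X with an RBF kernel of bandwidth h."""
        diffs = X[:, None, :] - X[None, :, :]                 # diffs[i, j] = x_i - x_j
        K = np.exp(-np.sum(diffs**2, axis=-1) / (2 * h))      # kernel matrix k(x_i, x_j)
        scores = np.array([grad_log_p(x) for x in X])         # grad log p at each particle
        drift = K @ scores                                    # kernel-weighted gradient (attraction) term
        repulsion = (diffs * K[:, :, None]).sum(axis=1) / h   # kernel-gradient term, keeps particles spread out
        return X + eps * (drift + repulsion) / X.shape[0]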
Better Methods and Theory for Federated Learning: Compression, Client Selection, and Heterogeneity
Samuel Horváth, Ph.D. Student, Statistics
Jun 27, 18:00 - 20:00, B5, L5, R5209
Keywords: Federated learning, Optimization for Machine Learning
Federated learning (FL) is an emerging machine learning paradigm involving multiple clients, e.g., mobile devices, with an incentive to collaborate in solving a machine learning problem coordinated by a central server. FL was proposed in 2016 by Konecny et al. and McMahan et al. as a viable privacy-preserving alternative to traditional centralized machine learning since, by construction, the training data points are decentralized and never transferred by the clients to a central server. Therefore, to a certain degree, FL mitigates the privacy risks associated with centralized data collection. Unfortunately, optimization for FL faces several specific issues that centralized optimization usually does not need to handle. In this thesis, we identify several of these challenges and propose new methods and algorithms to address them, with the ultimate goal of enabling practical FL solutions supported by mathematically rigorous guarantees.
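To illustrate the paradigm described above, here is a minimal sketch of a FedAvg-style round with uniform client sampling: each sampled client trains locally on its own data, and only model updates, never raw data, are sent to the server. This is a generic illustration, not the client-selection or compression schemes proposed in the thesis; local_sgd and fedavg_round are placeholder names, and the local loss is an assumed least-squares example.

    import numpy as np

    def local_sgd(x, data, gamma=0.01, steps=5):
        """Placeholder for a client's local training: gradient steps on a local least-squares loss."""
        A, b = data
        for _ in range(steps):
            x = x - gamma * A.T @ (A @ x - b) / len(b)
        return x

    def fedavg_round(x, client_data, sample_size, rng=np.random.default_rng(0)):
        """One FedAvg-style round: sample clients, run local training, average the returned models."""
        chosen = rng.choice(len(client_data), size=sample_size, replace=False)
        local_models = [local_sgd(x.copy(), client_data[i]) for i in chosen]
        return np.mean(local_models, axis=0)   # only models are communicated, never the data itself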
Optimization for Supervised Machine Learning: Randomized Algorithms for Data and Parameters
Filip Hanzely, Ph.D., Applied Mathematics and Computational Sciences
Jul 30, 16:00 - 18:00, KAUST
Many key problems in machine learning and data science are routinely modeled as optimization problems and solved via optimization algorithms. With the increase in the volume of data and in the size and complexity of the statistical models used to formulate these often ill-conditioned optimization tasks, there is a need for new efficient algorithms able to cope with these challenges. In this thesis, we deal with each of these sources of difficulty in a different way. To efficiently address the big data issue, we develop new methods which in each iteration examine only a small random subset of the training data. To handle the big model issue, we develop methods which in each iteration update only a random subset of the model parameters. Finally, to deal with ill-conditioned problems, we devise methods that incorporate either higher-order information or Nesterov's acceleration/momentum. In all cases, randomness is viewed as a powerful algorithmic tool that we tune, both in theory and in experiments, to achieve the best results.
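The two randomization strategies mentioned above can be made concrete on a least-squares loss f(x) = ||Ax - b||^2 / (2m): one step that subsamples the data (a minibatch stochastic-gradient step) and one step that subsamples the parameters (a randomized coordinate-descent step). This is a generic illustration of the two ideas, not of the particular methods developed in the thesis; the function names and step sizes are placeholders.

    import numpy as np

    def minibatch_sgd_step(x, A, b, gamma, batch_size, rng):
        """Examine only a random subset of the data: gradient of the loss on a minibatch."""
        idx = rng.choice(A.shape[0], size=batch_size, replace=False)
        g = A[idx].T @ (A[idx] @ x - b[idx]) / batch_size
        return x - gamma * g

    def coordinate_descent_step(x, A, b, gamma, rng):
        """Update only a random subset of the parameters: here, a single random coordinate."""
        j = rng.integers(x.size)
        g_j = A[:, j] @ (A @ x - b) / A.shape[0]   # j-th partial derivative of the full loss
        x = x.copy()
        x[j] -= gamma * g_j
        return x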
Proximal Splitting Methods for Convex Optimization: An Introduction
Laurent Condat, Senior Research Scientist, Computer, Electrical and Mathematical Sciences and Engineering
Dec 2, 12:00 - 13:00, B9, L2, H1, R2322
This talk will be a gentle introduction to proximal splitting algorithms to minimize a sum of possibly nonsmooth convex functions. Several such algorithms date back to the 1960s, but the last 10 years have seen the development of new primal-dual splitting algorithms, motivated by the need to solve large-scale problems in signal and image processing, machine learning, and, more generally, data science. No background will be necessary to attend the talk, whose goal is to present the intuitions behind this class of methods.
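As a concrete taste of this class of methods, the sketch below implements the simplest proximal splitting scheme, forward-backward (proximal gradient) splitting, applied to an assumed LASSO example; the primal-dual algorithms covered in the talk generalize this idea to sums of more functions and to compositions with linear operators.

    import numpy as np

    def soft_threshold(v, t):
        """Proximity operator of t * ||.||_1 (soft-thresholding)."""
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def proximal_gradient(grad_f, prox_g, x0, gamma, iters=500):
        """Forward-backward splitting for min_x f(x) + g(x): a gradient step on the smooth part f,
        followed by a proximal step on the possibly nonsmooth part g."""
        x = x0.copy()
        for _ in range(iters):
            x = prox_g(x - gamma * grad_f(x), gamma)
        return x

    # Example: LASSO, with f(x) = 0.5 * ||Ax - b||^2 smooth and g(x) = lam * ||x||_1 nonsmooth.
    rng = np.random.default_rng(0)
    A, b, lam = rng.standard_normal((50, 20)), rng.standard_normal(50), 0.1
    x_hat = proximal_gradient(grad_f=lambda x: A.T @ (A @ x - b),
                              prox_g=lambda v, g: soft_threshold(v, g * lam),
                              x0=np.zeros(20),
                              gamma=1.0 / np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant of grad f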
Langevin Monte Carlo as an optimization algorithm
Adil Salim, Postdoctoral Research Fellow, Computer Science
Nov 11, 12:00 - 13:00, B9, L2, H1, R2322
Keywords: machine learning, Langevin Monte Carlo, optimization
Adil Salim is mainly interested in stochastic approximation, optimization, and machine learning. He is currently a Postdoctoral Research Fellow working with Professor Peter Richtarik at the Visual Computing Center (VCC) at King Abdullah University of Science and Technology (KAUST).
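For context on the talk's title, the sketch below shows the unadjusted Langevin algorithm in its textbook form: a gradient-descent step on a potential f plus injected Gaussian noise, yielding approximate samples from the density proportional to exp(-f). This is a generic illustration of the sampling-as-optimization viewpoint, not of the specific results presented in the talk; the step size and iteration count are arbitrary.

    import numpy as np

    def unadjusted_langevin(grad_f, x0, gamma=0.01, iters=1000, rng=np.random.default_rng(0)):
        """Langevin Monte Carlo: gradient descent on f plus Gaussian noise, producing
        approximate samples from the density proportional to exp(-f)."""
        x = x0.copy()
        samples = []
        for _ in range(iters):
            x = x - gamma * grad_f(x) + np.sqrt(2 * gamma) * rng.standard_normal(x.shape)
            samples.append(x.copy())
        return np.array(samples)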