Workshop on Functional Inference and Machine Intelligence
Okinawa, Japan, March 1st-2nd, 2025.
The Workshop on Functional Inference and Machine Intelligence (FIMI) is an international workshop on machine learning and statistics, with a particular focus on theory, methods, and practice. It consists of invited talks and poster sessions. The topics include, but are not limited to, those covered by the talks below.
March 1st (Sat)
Masaaki Imaizumi (The University of Tokyo)
Title: Learning with Dynamics: Neural Network and High-Dimensional Inference
We introduce several topics related to the connection between statistics, machine learning, and dynamical systems. The first topic concerns the learning of the XOR function by a neural network whose layers are trained simultaneously. Feature learning, where the first layer of a multilayer neural network learns important structures from the data, has been recognized as a key advantage of deep networks. However, demonstrating this theoretically has required specific techniques, such as sequential learning algorithms. This study shows that the XOR function can be learned even when both layers of a two-layer neural network are updated simultaneously. To establish this result, we develop a fine-grained tracking of neuron variability, which differs from conventional dynamical analyses based on optimization. The second topic discusses statistical inference for high-dimensional parameters, specifically the evaluation of the uncertainty of estimators. Inference for high-dimensional parameters often employs a framework that derives distributions using limit theorems for dynamical algorithms. In this study, we extend this approach to generalized linear models (GLMs) and single-index models, representative examples of nonlinear models, and demonstrate that statistical inference for high-dimensional parameters can be performed in this setting.
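As a concrete (and much simplified) illustration of the first topic, here is a minimal numpy sketch of the setting: a two-layer ReLU network trained on XOR with both layers updated simultaneously by plain gradient descent. The width, step size, and iteration count are illustrative assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[-1., -1.], [-1., 1.], [1., -1.], [1., 1.]])
y = np.array([-1., 1., 1., -1.])            # XOR labels in {-1, +1}

m, lr = 32, 0.05                            # hidden width and step size (illustrative)
W = rng.normal(size=(m, 2)) / np.sqrt(2)    # first layer
a = rng.normal(size=m) / np.sqrt(m)         # second layer

for step in range(2000):
    H = np.maximum(X @ W.T, 0.0)            # ReLU features, shape (4, m)
    err = H @ a - y                         # residuals on the four XOR points
    grad_a = H.T @ err / len(X)             # gradients for BOTH layers...
    grad_W = ((err[:, None] * (H > 0)) * a).T @ X / len(X)
    a -= lr * grad_a                        # ...applied simultaneously,
    W -= lr * grad_W                        # not layer by layer

print(np.sign(np.maximum(X @ W.T, 0.0) @ a))  # typically recovers y
```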
10:55--11:45
Han Bao (Kyoto University)
Title: Self-supervised learning: what neuroscience and learning dynamics teach us
Self-supervised learning (SSL) has become a cornerstone in advancing learning efficiency and enhancing multi-modal alignment in modern machine learning. However, the reasons behind the necessity of certain SSL architectures and components, as well as the benefits they provide, remain unclear. In this talk, I present our recent work, which establishes a strong connection between non-contrastive SSL architectures and a hippocampal model, enabling further improvements in SSL performance. Additionally, we analyze the learning stability of different SSL architectures based on their learning dynamics.
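For readers unfamiliar with the non-contrastive architectures referred to above, here is a minimal PyTorch sketch of one such update (SimSiam-style, with a predictor and a stop-gradient); the network sizes and the noise-based augmentation are illustrative stand-ins, not details from the talk.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
predictor = nn.Linear(32, 32)
opt = torch.optim.SGD(list(encoder.parameters()) + list(predictor.parameters()), lr=0.05)

def augment(x):                          # stand-in for a real augmentation pipeline
    return x + 0.1 * torch.randn_like(x)

x = torch.randn(256, 128)                # a batch of raw inputs
z1, z2 = encoder(augment(x)), encoder(augment(x))
p1, p2 = predictor(z1), predictor(z2)

# Negative cosine similarity with a stop-gradient on the target branch:
# the architectural asymmetry whose stability properties the talk analyzes.
loss = -(F.cosine_similarity(p1, z2.detach()).mean()
         + F.cosine_similarity(p2, z1.detach()).mean()) / 2
opt.zero_grad(); loss.backward(); opt.step()
```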
11:45--13:15
Lunch
13:15--14:05
Dino Sejdinovic (University of Adelaide)
Title: An Overview of Causal Inference using Kernel Embeddings
Kernel embeddings have emerged as a powerful tool for representing probability measures in a variety of statistical inference problems. By mapping probability measures into a reproducing kernel Hilbert space (RKHS), kernel embeddings enable flexible representations of complex relationships between variables. They serve as a mechanism for efficiently transferring the representation of a distribution downstream to other tasks, such as hypothesis testing or causal effect estimation. In the context of causal inference, the main challenges include identifying causal associations and estimating the average treatment effect from observational data, where confounding variables may obscure direct cause-and-effect relationships. Kernel embeddings provide a robust nonparametric framework for addressing these challenges. They allow for representing distributions of observational data and seamlessly transforming those representations into representations of interventional distributions, in order to estimate relevant causal quantities. We give an overview of recent research that leverages the expressiveness of kernel embeddings in tandem with causal inference.
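As a small illustration of the flavor of estimator the talk covers, the following sketch estimates an average treatment effect by kernel ridge regression of the outcome on (treatment, confounder), followed by the back-door average over the confounder. The data-generating process, kernel bandwidth, and regularization constant are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=n)                        # confounder
T = 0.8 * X + rng.normal(size=n)              # treatment depends on X
Y = 2.0 * T + X + rng.normal(size=n)          # true causal effect of T is 2

Z = np.column_stack([T, X])                   # regression inputs (t, x)

def k(A, B, sigma=1.0):                       # Gaussian kernel matrix
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

alpha = np.linalg.solve(k(Z, Z) + 1e-2 * np.eye(n), Y)   # kernel ridge weights

def do_t(t):
    """Back-door estimate of E[Y | do(T=t)]: average the fitted regression over X."""
    Zt = np.column_stack([np.full(n, t), X])
    return (k(Zt, Z) @ alpha).mean()

print(do_t(1.0) - do_t(0.0))                  # should be roughly the true effect, 2
```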
14:05--14:55
Motonobu Kanagawa (EURECOM)
Title: Comparing Scale Parameter Estimators for Gaussian Process Regression: Cross Validation and Maximum Likelihood
Gaussian process (GP) regression is a Bayesian nonparametric method for regression and interpolation, offering a principled way of quantifying the uncertainties of predicted function values. For the quantified uncertainties to be well-calibrated, however, the covariance kernel of the GP prior has to be carefully selected. In this talk, we theoretically compare two methods for choosing the kernel in GP regression: cross-validation and maximum likelihood estimation. Focusing on the scale-parameter estimation of a Brownian motion kernel in the noiseless setting, we prove that cross-validation can yield asymptotically well-calibrated credible intervals for a broader class of ground-truth functions than maximum likelihood estimation, suggesting an advantage of the former over the latter.
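For a fixed base kernel, the maximum-likelihood scale estimate in this noiseless setting has a simple closed form, sigma^2_ML = y^T K0^{-1} y / n. Here is a minimal sketch for the Brownian motion kernel k0(s, t) = min(s, t); the design points and ground-truth function are illustrative choices, not those of the talk.

```python
import numpy as np

n = 50
s = np.linspace(1 / n, 1.0, n)          # design points in (0, 1]
y = np.sin(2 * np.pi * s)               # an illustrative ground-truth function

K0 = np.minimum.outer(s, s)             # Brownian motion base kernel, K0[i, j] = min(s_i, s_j)
sigma2_ml = y @ np.linalg.solve(K0, y) / n   # closed-form ML scale estimate
print(sigma2_ml)
```

The talk compares this estimator with its cross-validation counterpart in terms of the calibration of the resulting credible intervals.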
20 min break
15:15--16:05
Naoya Takeishi (The University of Tokyo)
Title: Learning hybrid models combining scientific models and machine learning
Scientific mathematical models, such as differential equations, and machine learning models, such as deep neural nets, are typically considered complementary to each other in terms of adaptability to real data and robustness of prediction. We aim to take the best of both worlds by combining the two types of models, an approach known as hybrid or grey-box modeling. Learning hybrid models is not always straightforward; for example, the unknown parameters of a scientific model may be severely unidentifiable when the machine learning counterpart is highly flexible, which is problematic when we want to interpret them to gain scientific insights. This talk will introduce the technical challenges around hybrid modeling and the status of current studies.
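A minimal sketch of what such a hybrid model can look like: a known linear decay term with an unknown physical parameter, plus a neural-network residual, fitted jointly on one-step transitions. The ODE, architecture, and synthetic data are illustrative assumptions; note that without extra care the flexible residual can absorb the physical term, which is exactly the identifiability issue the talk discusses.

```python
import torch
import torch.nn as nn

theta = nn.Parameter(torch.tensor(0.5))             # physical parameter (e.g. a decay rate)
residual = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
opt = torch.optim.Adam([theta, *residual.parameters()], lr=1e-2)

def hybrid_rhs(x):
    """dx/dt = -theta * x (scientific part) + f_nn(x) (machine-learning correction)."""
    return -theta * x + residual(x)

dt = 0.1
x_t = torch.randn(64, 1)                            # observed states
x_next = x_t * torch.exp(torch.tensor(-0.7 * dt))   # synthetic transitions, true rate 0.7

# one Euler-discretized training step
loss = ((x_t + dt * hybrid_rhs(x_t) - x_next) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()
```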
16:05--16:55
Takeru Miyato (University of Tübingen)
Title: Artificial Kuramoto Oscillatory Neurons
It has long been known in both neuroscience and AI that "binding" between neurons leads to a form of competitive learning where representations are compressed in order to represent more abstract concepts in deeper layers of the network. More recently, it was also hypothesized that dynamic (spatiotemporal) representations play an important role in both neuroscience and AI. Building on these ideas, we introduce Artificial Kuramoto Oscillatory Neurons (AKOrN) as a dynamical alternative to threshold units, which can be combined with arbitrary connectivity designs such as fully connected, convolutional, or attentive mechanisms. Our generalized Kuramoto updates bind neurons together through their synchronization dynamics. We show that this idea provides performance improvements across a wide spectrum of tasks such as unsupervised object discovery, adversarial robustness, calibrated uncertainty quantification, and reasoning. We believe that these empirical results show the importance of rethinking our assumptions at the most basic neuronal level of neural representation, and in particular show the importance of dynamical representations.
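For background, the classical Kuramoto dynamics that the proposed neurons generalize (AKOrN uses vector-valued oscillators with learned connectivity; this is only the textbook scalar version with illustrative constants):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, dt = 64, 2.0, 0.05
theta = rng.uniform(0, 2 * np.pi, N)       # oscillator phases
omega = rng.normal(0, 0.5, N)              # natural frequencies

for _ in range(500):
    # dtheta_i/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)
    coupling = np.sin(theta[None, :] - theta[:, None]).mean(axis=1)
    theta += dt * (omega + K * coupling)

# order parameter r in [0, 1]: values near 1 indicate synchronization
r = np.abs(np.exp(1j * theta).mean())
print(r)
```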
March 2nd (Sun)
9:15--10:05
Atsushi Nitanda (A*STAR)
Title: Propagation of Chaos for Mean-Field Langevin Dynamics and its Application to Model Ensemble
Mean-field Langevin dynamics (MFLD) is an optimization method derived by taking the mean-field limit of noisy gradient descent for two-layer neural networks in the mean-field regime. Recently, the propagation of chaos (PoC) for MFLD has gained attention as it provides a quantitative characterization of the optimization complexity in terms of the number of particles and iterations. Remarkable progress by Chen et al. (2022) showed that the approximation error due to finite particles remains uniform in time and diminishes as the number of particles increases. In this work, by refining the defective log-Sobolev inequality (a key result from that earlier work) under the neural network training setting, we establish an improved PoC result for MFLD, which removes the exponential dependence on the regularization coefficient from the particle approximation term of the optimization complexity. As an application, we propose a PoC-based model ensemble strategy with theoretical guarantees.
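As a minimal illustration of the dynamics in question, the following sketch runs noisy gradient descent on a system of particles, whose empirical distribution approaches a Gibbs measure; in the talk's setting each particle is one neuron of a two-layer network, and PoC quantifies the finite-particle error. The potential, step size, and temperature below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_particles, eta, lam = 1000, 0.01, 0.1
X = rng.normal(size=(n_particles, 2))      # the particle system

def grad_V(x):
    """Gradient of the double-well potential V(x) = (|x|^2 - 1)^2."""
    return 4 * x * (np.sum(x**2, axis=1, keepdims=True) - 1.0)

for _ in range(2000):
    noise = rng.normal(size=X.shape)
    X += -eta * grad_V(X) + np.sqrt(2 * eta * lam) * noise   # Langevin step

# the particles now approximate the Gibbs measure mu(x) proportional to exp(-V(x)/lam)
print(np.mean(np.sum(X**2, axis=1)))       # mass concentrates near the ring |x| = 1
```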
10:05--10:55
Taiji Suzuki (The University of Tokyo)
Title: Theories and methodologies of post training and test time inference
In this talk, I will discuss (i) a post-training method for diffusion models and (ii) recent theoretical developments in test-time inference. In the first part, I introduce a new post-training method for diffusion models that directly minimizes a generic regularized loss over probability distributions using the Dual Averaging (DA) method. The DA procedure provides the density ratio between the pre-trained and optimized models, which enables sampling from the learned distribution by approximating its score function via Doob's h-transform. The method is supported by theoretical guarantees on convergence and discretization error.
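For orientation, here is the generic (finite-dimensional) dual-averaging iteration that the post-training method lifts to the space of probability distributions; the objective and step-size schedule are illustrative, and this is not the distribution-space procedure of the talk itself.

```python
import numpy as np

b = np.array([1.0, -2.0])            # minimizer of the toy objective
def grad_f(x):                       # gradient of f(x) = ||x - b||^2 / 2
    return x - b

x = np.zeros(2)
g_sum = np.zeros(2)                  # running sum of past gradients
gamma = 1.0

for t in range(1, 501):
    g_sum += grad_f(x)
    # x_{t+1} = argmin_x <g_sum, x> + gamma * sqrt(t) * ||x||^2 / 2, in closed form
    x = -g_sum / (gamma * np.sqrt(t))

print(x)                             # approaches the minimizer b
```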
In the second half, I present recent theoretical developments that elucidate the learning capabilities of Transformers, focusing on in-context learning and chain-of-thought (CoT) reasoning. I show that nonlinear feature learning for in-context learning can be achieved with an optimization guarantee, where the rate of convergence is characterized by the information exponent of the target function. Finally, if time allows, a theoretical guarantee for chain-of-thought will be given: it is demonstrated that CoT drastically improves learning efficiency by making use of intermediate outputs as hints for solving a given problem.
15 min break
11:10--12:00
Arnaud Doucet (University of Oxford)
Title: Accelerated Diffusion Models via Speculative Sampling
Speculative sampling is a popular technique for accelerating inference in Large Language Models by generating candidate tokens using a fast draft model and accepting or rejecting them based on the target model's distribution. While speculative sampling was previously limited to discrete sequences, we extend it to diffusion models, which generate samples via continuous, vector-valued Markov chains. In this context, the target model is a high-quality but computationally expensive diffusion model. We propose various drafting strategies, including a simple and effective approach that does not require training a draft model and is applicable out of the box to any diffusion model. Our experiments demonstrate significant generation speedup on various diffusion models, halving the number of function evaluations, while generating exact samples from the target model.
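For background, a minimal sketch of the standard speculative-sampling acceptance rule for discrete distributions, which the talk extends to the continuous, vector-valued chains of diffusion models; the two categorical distributions below are toy stand-ins for the target and draft models.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])        # target model's distribution (expensive)
q = np.array([0.3, 0.4, 0.3])        # draft model's distribution (cheap)

def speculative_step():
    x = rng.choice(3, p=q)                        # the draft proposes a sample
    if rng.uniform() < min(1.0, p[x] / q[x]):     # accept with probability p(x)/q(x)
        return x
    resid = np.maximum(p - q, 0.0)                # otherwise resample from the
    return rng.choice(3, p=resid / resid.sum())   # normalized residual distribution

samples = [speculative_step() for _ in range(100_000)]
print(np.bincount(samples) / len(samples))        # matches p exactly in distribution
```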
12:00--13:30
Lunch
13:30--14:20
Arthur Gretton (University College London)
Title: Adaptive two-sample testing
"I will address the problem of two-sample testing using the Maximum Mean Discrepancy (MMD). The MMD is an integral probability metric defined using a reproducing kernel Hilbert space (RKHS), with properties determined by the choice of kernel. For good test power, the kernel must be chosen in accordance with the properties of the distributions being compared.
I will assume that the distributions being tested have densities, and the difference in densities lies in a Sobolev ball. The MMD test is then minimax optimal with a specific kernel depending on the smoothness parameter of the Sobolev ball. In practice, this parameter is unknown: to overcome this issue, I describe an aggregated test, called MMDAgg, which is adaptive to the smoothness parameter. The test power is maximised over the collection of kernels used, without requiring held-out data for kernel selection (which results in a loss of test power). MMDAgg controls the test level non-asymptotically, and achieves the minimax rate over Sobolev balls, up to an iterated logarithmic term. Guarantees hold for any product of one-dimensional translation invariant characteristic kernels."
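A minimal sketch of the basic (non-adaptive) building block: an MMD two-sample test with one fixed Gaussian kernel and a permutation threshold. MMDAgg goes beyond this by aggregating over a collection of bandwidths; the bandwidth and sample sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(100, 1))            # sample from P
Y = rng.normal(0.7, 1.0, size=(100, 1))            # sample from Q

def mmd2(A, B, sigma=1.0):
    """Biased estimate of MMD^2 with a Gaussian kernel."""
    def k(U, V):
        d2 = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma**2))
    return k(A, A).mean() + k(B, B).mean() - 2 * k(A, B).mean()

stat = mmd2(X, Y)
Z, n = np.vstack([X, Y]), len(X)
null = []
for _ in range(200):                               # permutation null distribution
    idx = rng.permutation(len(Z))
    null.append(mmd2(Z[idx[:n]], Z[idx[n:]]))
p_value = (1 + sum(s >= stat for s in null)) / (1 + len(null))
print(stat, p_value)
```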
14:20--15:10
Song Liu (University of Bristol)
Title: High-Dimensional Differential Parameter Inference in Exponential Family using Time Score Matching
This paper addresses differential inference in time-varying parametric probabilistic models, like graphical models with changing structures. Instead of estimating a high-dimensional model at each time and inferring changes later, we directly learn the differential parameter, i.e., the time derivative of the parameter. The main idea is treating the time score function of an exponential family model as a linear model of the differential parameter for direct estimation. We use time score matching to estimate parameter derivatives. We prove the consistency of a regularized score matching objective and demonstrate the finite-sample normality of a debiased estimator in high-dimensional settings. Our methodology effectively infers differential structures in high-dimensional graphical models, verified on simulated and real-world datasets.
The talk may include some ongoing work, and the abstract may be supplemented later.
This is joint work with Daniel J. Williams, Leyang Wang, Qizhen Ying, and Mladen Kolar.
https://arxiv.org/abs/2410.10637
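The linearity exploited above can be stated concisely. For an exponential family p(x; theta(t)) = exp(theta(t)^T T(x) - A(theta(t))), the time score is (this is the standard exponential-family computation, written here for orientation):

```latex
\partial_t \log p\bigl(x; \theta(t)\bigr)
  = \dot\theta(t)^{\top} T(x) - \dot\theta(t)^{\top} \nabla A\bigl(\theta(t)\bigr)
```

which is linear in the differential parameter, so time score matching can estimate the time derivative of the parameter directly from this linear structure.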
20 min break
15:30--16:20
Kenji Fukumizu (The Institute of Statistical Mathematics)
Title: Pairwise Optimal Transport and All-to-All Flow-based Condition Transfer
In this work, we propose a flow-based method for learning all-to-all transfer maps among conditional distributions, approximating pairwise optimal transport. The proposed method addresses the challenge of handling continuous conditions, which often involve a large set of conditions with sparse empirical observations per condition. We introduce a novel cost function that enables simultaneous learning of optimal transports for all pairs of conditional distributions. Our method is supported by a theoretical guarantee that, in the limit, it converges to pairwise optimal transports among infinite pairs of conditional distributions. The learned transport maps are subsequently used to couple data points in conditional flow matching. We demonstrate the effectiveness of this method on synthetic and benchmark datasets, as well as on chemical datasets where continuous physical properties are defined as conditions.
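For orientation, here is the conditional-flow-matching regression step that the learned transport maps plug into: couple endpoint pairs, interpolate linearly, and regress a velocity field on the straight-line target. The coupling below is the independent one; the method described above, in effect, replaces it with (approximately) pairwise-optimal-transport couplings across conditions. The network and data are illustrative.

```python
import torch
import torch.nn as nn

vnet = nn.Sequential(nn.Linear(2 + 1, 64), nn.SiLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(vnet.parameters(), lr=1e-3)

x0 = torch.randn(256, 2)                 # source sample
x1 = torch.randn(256, 2) + 3.0           # target sample (independent coupling)
t = torch.rand(256, 1)                   # interpolation times
xt = (1 - t) * x0 + t * x1               # linear interpolant
target_v = x1 - x0                       # conditional velocity target

# one flow-matching training step
loss = ((vnet(torch.cat([xt, t], dim=1)) - target_v) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()
```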
Organizers
General Organising Committee
Taiji Suzuki, The University of Tokyo
Masaaki Imaizumi, The University of Tokyo
Tatsuya Harada, The University of Tokyo
Makoto Yamada, Okinawa Institute of Science and Technology
Kenji Fukumizu, The Institute of Statistical Mathematics
Sponsors
This workshop is supported by the following institutions and grants:
"Innovation of Deep Structured Models with Representation of Mathematical Intelligence"
in
"Creating information utilization platform by integrating mathematical and information sciences, and development to society"