Symbolic Regression Talk at EUROMECH Colloquium 662

Apr 28, 2026
Giorgio Morales
EUROMECH Colloquium 662. Photo: Alice Cicirello
Abstract
Discovering Non-Linear Equations Under Epistemic Uncertainty Using Transformer-Based Multi-Set Skeleton Prediction.
Date
Apr 28, 2026 5:00 PM — May 30, 2026 7:00 PM
Event
EUROMECH Colloquium 662

Discovering interpretable mathematical descriptions of nonlinear systems from data is a central goal in scientific machine learning. However, existing data-driven and symbolic regression (SR) approaches often struggle in data-scarce settings, where epistemic uncertainty leads to unstable models and overfitting to local artifacts. We propose an uncertainty-aware framework that integrates adaptive sampling (AS) with a Multi-Set Symbolic Skeleton Prediction (MSSP) approach, enabling the progressive extraction of stable and accurate symbolic expressions from learned models as data coverage improves. The pipeline combines a function approximator (e.g., a neural network trained on the currently available observations) with an MSSP-based stage. Rather than performing SR on a single global set of input–response pairs, MSSP constructs multiple input–response subsets sampled from the model’s response surface. These distinct yet related subsets are used to recover a common symbolic skeleton that captures the shared structure of the underlying mapping while remaining robust to localized distortions caused by sparse sampling or noise. Once a pre-trained Multi-Set Transformer has proposed candidate skeletons, their coefficients are fitted against the observed data to produce the final expressions.
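
The two data-handling steps around the transformer can be illustrated with a minimal sketch. The Multi-Set Transformer itself is not reproduced here. Several things are assumptions, not details from the talk: the subsets are drawn as independent uniform random samples of a 1-D domain, the candidate skeleton is hard-coded as a stand-in for a transformer proposal, and the skeleton is assumed to be linear in its coefficients so that fitting reduces to ordinary least squares. The helper names build_multi_sets and fit_linear_coefficients are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[float, float]


def build_multi_sets(
    surrogate: Callable[[float], float],
    lo: float,
    hi: float,
    n_sets: int = 4,
    points_per_set: int = 20,
    seed: int = 0,
) -> List[List[Pair]]:
    """Sample several input-response subsets from a surrogate's response surface.

    Hypothetical helper: each subset draws its own random inputs in [lo, hi],
    so the subsets are distinct but all reflect the same learned mapping.
    """
    rng = random.Random(seed)
    sets = []
    for _ in range(n_sets):
        xs = sorted(rng.uniform(lo, hi) for _ in range(points_per_set))
        sets.append([(x, surrogate(x)) for x in xs])
    return sets


def fit_linear_coefficients(
    basis: Sequence[Callable[[float], float]],
    data: Sequence[Pair],
) -> List[float]:
    """Least-squares fit of c_j in y ~ sum_j c_j * basis_j(x).

    Only valid for skeletons that are linear in their coefficients. Solves the
    normal equations (A^T A) c = A^T y by Gaussian elimination with pivoting.
    """
    m = len(basis)
    ata = [[0.0] * m for _ in range(m)]
    aty = [0.0] * m
    for x, y in data:
        row = [b(x) for b in basis]
        for i in range(m):
            aty[i] += row[i] * y
            for j in range(m):
                ata[i][j] += row[i] * row[j]
    aug = [ata[i] + [aty[i]] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        s = aug[i][m] - sum(aug[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = s / aug[i][i]
    return coeffs


if __name__ == "__main__":
    def surrogate(x: float) -> float:
        # Stand-in for a trained network's prediction at x.
        return 2.0 * math.sin(x) + 0.5 * x * x

    multi_sets = build_multi_sets(surrogate, -3.0, 3.0)
    # Suppose the skeleton "c1*sin(x) + c2*x**2 + c3" were proposed from multi_sets.
    basis = [math.sin, lambda x: x * x, lambda x: 1.0]
    # Hypothetical noise-free observations of the true system.
    observed = [(x, surrogate(x)) for x in (-2.5, -1.0, 0.0, 0.7, 1.9, 2.8)]
    print(fit_linear_coefficients(basis, observed))
```

Because the toy observations are exact and the three basis functions are linearly independent on those inputs, the fit recovers c1 ≈ 2, c2 ≈ 0.5 and c3 ≈ 0. Skeletons with coefficients inside nonlinear terms (for example sin(c·x)) would instead need an iterative optimizer.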

We use an AS loop to iteratively reduce epistemic uncertainty across the input domain. At each iteration, the learned predictor is re-evaluated on a fixed test grid to identify where uncertainty remains large. AS then prioritizes new observations in these regions of high epistemic uncertainty, using prediction-interval-based metrics and a batch sampling strategy based on Gaussian processes. MSSP is re-applied throughout this process to monitor how the recovered expressions evolve as coverage improves. As a proof of concept, we demonstrate the pipeline on 1-D synthetic problems, where the estimated expressions begin to match the true functional form after sufficient AS iterations. Although correct or near-correct expressions can occasionally be identified at early stages, they are typically unstable. Coupling MSSP with AS progressively mitigates this instability as uncertainty is reduced and coverage improves, allowing convergence toward simpler, correct functional forms. While results are presented for 1-D problems, the framework naturally extends to higher-dimensional systems.
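
One simple way to combine a prediction-interval metric with Gaussian-process batch selection is sketched below. It is not necessarily the scheme used in the talk. It assumes that the predictor supplies lower and upper prediction-interval bounds on the fixed test grid, scores each grid point by interval width times the posterior standard deviation of a noise-free GP with an RBF kernel, and picks points greedily. After each pick, the GP is conditioned on that location. No response value is needed for this, because a GP's posterior covariance depends only on the inputs. This approach is similar in spirit to fantasized-observation heuristics such as the kriging believer. The function select_batch and its length_scale parameter are hypothetical.

```python
import math
from typing import List, Sequence


def rbf(a: float, b: float, length_scale: float) -> float:
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def select_batch(
    grid: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    batch_size: int,
    length_scale: float,
) -> List[float]:
    """Greedily pick grid points with wide prediction intervals, spread apart.

    Score = PI width * GP posterior std. Conditioning the GP on each pick
    shrinks the std of nearby points, so the batch does not cluster inside
    a single uncertain region.
    """
    n = len(grid)
    widths = [hi - lo for lo, hi in zip(lower, upper)]
    # Prior covariance of a noise-free GP over the test grid.
    cov = [[rbf(a, b, length_scale) for b in grid] for a in grid]
    chosen: List[int] = []
    for _ in range(batch_size):
        candidates = [i for i in range(n) if i not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda i: widths[i] * math.sqrt(max(cov[i][i], 0.0)))
        pivot_var = cov[best][best]
        if pivot_var <= 1e-12:
            break  # remaining candidates are already explained by the batch
        chosen.append(best)
        col = [cov[i][best] for i in range(n)]
        # Rank-one update: cov(a, b) <- cov(a, b) - cov(a, s) * cov(s, b) / var(s).
        for i in range(n):
            scale = col[i] / pivot_var
            row = cov[i]
            for j in range(n):
                row[j] -= scale * col[j]
    return [grid[i] for i in chosen]
```

The length scale controls how far each pick suppresses its neighbours. With two separated regions of wide intervals, a batch of two lands on one point in each region rather than on two adjacent points of the widest one.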

Figure: Lake Como.
Figure: View from the conference room.
Figure: View from the conference room.
Figure: Presentation day.