Name: Symbolic Regression Talk at EUROMECH Colloquium 662
Start: 2026-04-28T17:00:00Z
End: 2026-05-30T19:00:00Z

Symbolic Regression Talk at EUROMECH Colloquium 662

Apr 28, 2026 · Giorgio Morales

EUROMECH Colloquium 662. Photo: Alice Cicirello

Abstract

Discovering Non-Linear Equations Under Epistemic Uncertainty Using Transformer-Based Multi-Set Skeleton Prediction.

Date

Apr 28, 2026 5:00 PM — May 30, 2026 7:00 PM

Event

EUROMECH Colloquium “Physics-enhanced machine learning and data-driven nonlinear dynamics”

Discovering interpretable mathematical descriptions of nonlinear systems from data is a central goal in scientific machine learning. However, existing data-driven and symbolic regression (SR) approaches often struggle in data-scarce settings, where epistemic uncertainty leads to unstable models and overfitting to local artifacts. We propose an uncertainty-aware framework that integrates adaptive sampling (AS) with a Multi-Set Symbolic Skeleton Prediction (MSSP) approach, enabling the progressive extraction of stable and accurate symbolic expressions from learned models as data coverage improves. We present a pipeline that combines a function approximator (e.g., a neural network trained on the currently available observations) with an MSSP-based stage. Rather than performing SR on a single global input–response pairing, MSSP constructs multiple input–response subsets sampled from the model’s response surface. These distinct yet related subsets are used to recover a common symbolic skeleton that captures the shared structure of the underlying mapping while being robust to localized distortions caused by sparse sampling or noise. After skeletons are proposed using a pre-trained Multi-Set Transformer, coefficients are fitted against the observed data to produce the final expressions.
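The two data-handling steps of the pipeline described above (building several input–response subsets from the learned model, then fitting coefficients of a proposed skeleton against the observations) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the talk's implementation: `surrogate` is a hypothetical stand-in for the trained function approximator, the skeleton `c0 + c1*sin(x)` is hard-coded where the pre-trained Multi-Set Transformer would make its proposal, and the function names and default sizes are made up for illustration. The skeleton is deliberately linear in its coefficients so that a closed-form least-squares fit suffices; skeletons with constants inside nonlinear operators would need a numerical optimizer instead.

```python
import math
import random


def build_multi_sets(model, lo, hi, n_sets=5, n_points=20, seed=0):
    """Draw n_sets input-response subsets from the model's response surface."""
    rng = random.Random(seed)
    sets = []
    for _ in range(n_sets):
        # Each subset uses its own random inputs, so the subsets are
        # distinct but all describe the same underlying mapping.
        xs = sorted(rng.uniform(lo, hi) for _ in range(n_points))
        sets.append([(x, model(x)) for x in xs])
    return sets


def fit_affine_coefficients(basis, xs, ys):
    """Least-squares fit of c0, c1 in the skeleton c0 + c1 * basis(x)."""
    g = [basis(x) for x in xs]
    n = len(xs)
    g_mean = sum(g) / n
    y_mean = sum(ys) / n
    sgg = sum((gi - g_mean) ** 2 for gi in g)
    sgy = sum((gi - g_mean) * (yi - y_mean) for gi, yi in zip(g, ys))
    c1 = sgy / sgg
    c0 = y_mean - c1 * g_mean
    return c0, c1


def surrogate(x):
    """Hypothetical stand-in for the trained function approximator."""
    return 2.0 * math.sin(x) + 0.5


# Step 1: multiple subsets sampled from the surrogate's response surface.
multi_sets = build_multi_sets(surrogate, -3.0, 3.0)

# Step 2 (not shown): a skeleton predictor would consume multi_sets.
# Here the skeleton c0 + c1*sin(x) is assumed for illustration.

# Step 3: coefficients are fitted against the observed data, not the surrogate.
observed_x = [-2.5, -1.0, 0.0, 0.7, 1.9, 2.8]
observed_y = [2.0 * math.sin(x) + 0.5 for x in observed_x]
c0, c1 = fit_affine_coefficients(math.sin, observed_x, observed_y)
```

Keeping the subset construction separate from the coefficient fit mirrors the split in the text: the subsets come from the learned model, while the final coefficients are tied to the actual observations.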

We use an AS loop to iteratively reduce epistemic uncertainty across the input domain. At each iteration, the learned predictor is re-evaluated on a fixed test grid to characterize where uncertainty remains large. AS then prioritizes new observations in these epistemically uncertain regions using prediction interval-based metrics and a batch sampling strategy based on Gaussian processes. MSSP is re-applied throughout this process to monitor how the recovered expressions evolve as coverage improves. As a proof of concept, we demonstrate our pipeline on 1-D synthetic problems, where the estimated expressions begin to match the true functional form after sufficient AS iterations. Although correct or near-correct expressions can occasionally be identified at early stages, they are typically unstable. Coupling MSSP with AS progressively mitigates this instability as uncertainty is reduced and coverage improves, allowing convergence toward simpler and correct functional forms. While results are presented for 1-D problems, the framework naturally extends to higher-dimensional systems.
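The shape of such an AS loop can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the distance to the nearest observation stands in for a prediction-interval width, and a greedy minimum-spacing rule stands in for the Gaussian-process-based batch strategy mentioned above. Neither is the method used in the talk, and the names `interval_width`, `select_batch`, and `adaptive_sampling` are hypothetical.

```python
import math


def interval_width(x, observed_x):
    """Crude uncertainty proxy: distance from x to the nearest observation."""
    return min(abs(x - xo) for xo in observed_x)


def select_batch(grid, widths, batch_size, min_gap):
    """Greedily pick the widest-interval grid points, kept min_gap apart."""
    order = sorted(range(len(grid)), key=lambda i: widths[i], reverse=True)
    batch = []
    for i in order:
        if all(abs(grid[i] - b) >= min_gap for b in batch):
            batch.append(grid[i])
        if len(batch) == batch_size:
            break
    return batch


def adaptive_sampling(oracle, observed_x, grid,
                      n_iters=5, batch_size=3, min_gap=0.5):
    """Repeatedly query the oracle where the uncertainty proxy is largest."""
    xs = list(observed_x)
    ys = [oracle(x) for x in xs]
    history = []  # largest remaining width on the grid, per iteration
    for _ in range(n_iters):
        # Re-evaluate uncertainty on the same fixed test grid.
        widths = [interval_width(g, xs) for g in grid]
        history.append(max(widths))
        # Prioritize new observations in the most uncertain regions.
        new_x = select_batch(grid, widths, batch_size, min_gap)
        xs.extend(new_x)
        ys.extend(oracle(x) for x in new_x)
        # A full pipeline would retrain the predictor and re-run MSSP here.
    return xs, ys, history


# Fixed 1-D test grid on [-5, 5] and a synthetic target function.
grid = [-5.0 + 0.1 * i for i in range(101)]
xs, ys, history = adaptive_sampling(
    lambda x: x * math.sin(x), [-0.5, 0.0, 0.5], grid
)
```

With this proxy, the largest remaining width recorded in `history` cannot increase from one iteration to the next, because adding observations can only shrink the distance to the nearest one. A learned predictor's intervals carry no such guarantee, which is one reason re-checking the recovered expressions at every iteration is useful.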

Last updated on Apr 28, 2026