Machine Learning – Publications
1.
Vo, Huyen; Martínez-García, María; Valera, Isabel
Holder++: Improving the Quality-Coherence Trade-off in Multimodal VAEs Proceedings Article
In: 2026.
@inproceedings{nokey,
title = {Holder++: Improving the Quality-Coherence Trade-off in Multimodal VAEs},
author = {Huyen Vo and María Martínez-García and Isabel Valera},
url = {https://vothuckhanhhuyen.github.io/assets/pdf/Holder_ICML2026.pdf},
year = {2026},
date = {2026-03-11},
urldate = {2026-03-11},
abstract = {Existing approaches for multimodal variational autoencoders (VAEs) face a trade-off between generative quality and coherence—i.e., they struggle to generate realistic and diverse samples that, at the same time, are semantically consistent across modalities. A recent work shows that using a simple approximation to Hölder pooling as an aggregation method improves coherence over the SOTA MMVAE+, despite assuming a single shared representation across all modalities. Yet, it slightly compromises sample diversity. Inspired by this insight, we propose Hölder++, a novel multimodal VAE that improves the generative quality-coherence trade-off through: (i) the first implementation of Hölder pooling without any approximation for multimodal VAEs; (ii) an extended architecture that models distinct shared and private (i.e., modality-specific) representations (Hölder+); and (iii) hierarchical inference that further enhances the disentanglement between the shared and private representations (Hölder++). Our experiments corroborate that Hölder++ consistently improves the generative quality-coherence trade-off, yields more structured latent spaces, and learns shared representations that are informative for downstream tasks.},
keywords = {huyen, isabel, maria},
pubstate = {published},
tppubtype = {inproceedings}
}
2.
Martínez-García, María; Villacrés, Grace; Mitchell, David; Olmos, Pablo M
Improved Variational Inference in Discrete VAEs using Error Correcting Codes Proceedings Article
In: The 41st Conference on Uncertainty in Artificial Intelligence, 2025.
@inproceedings{martinezimproved,
title = {Improved Variational Inference in Discrete VAEs using Error Correcting Codes},
author = {María Martínez-García and Grace Villacrés and David Mitchell and Pablo M Olmos},
url = {https://proceedings.mlr.press/v286/martinez-garcia25a.html},
year = {2025},
date = {2025-01-01},
urldate = {2025-01-01},
booktitle = {The 41st Conference on Uncertainty in Artificial Intelligence},
abstract = {Despite advances in deep probabilistic models, learning discrete latent representations remains challenging. This work introduces a novel method to improve inference in discrete Variational Autoencoders by reframing the inference problem through a generative perspective. We conceptualize the model as a communication system, and propose to leverage Error-Correcting Codes (ECCs) to introduce redundancy in latent representations, allowing the variational posterior to produce more accurate estimates and reduce the variational gap. We present a proof-of-concept using a Discrete Variational Autoencoder with binary latent variables and low-complexity repetition codes, extending it to a hierarchical structure for disentangling global and local data features. Our approach significantly improves generation quality, data reconstruction, and uncertainty calibration, outperforming the uncoded models even when trained with tighter bounds such as the Importance Weighted Autoencoder objective. We also outline the properties that ECCs should possess to be effectively utilized for improved discrete variational inference.},
keywords = {maria},
pubstate = {published},
tppubtype = {inproceedings}
}
