Publications

You can also find my articles on my Google Scholar profile.

Conference Papers


Benchmarking Predictive Coding Networks – Made Simple

International Conference on Learning Representations (ICLR), 2025
Award: ICLR Spotlight

We benchmark predictive coding networks extensively on large-scale tasks, providing critical insights into state-of-the-art performance limitations and the theoretical challenges that must be addressed. We introduce PCX, a fast and flexible open-source library that emphasizes performance and simplicity with a user-friendly interface, enabling the community to overcome fragmentation and tackle the critical challenge of scalability.

Shh, don’t say that! Domain Certification in LLMs

International Conference on Learning Representations (ICLR), 2025

We introduce domain certification, a new safety paradigm focusing on risk control for LLMs. We provide a formal guarantee that accurately characterizes when language models stay within their intended operational boundaries. We demonstrate an effective test-time algorithm, VALID, that provides scalable defenses for foundation models.

Towards Certification of Uncertainty Calibration under Adversarial Attacks

International Conference on Learning Representations (ICLR), 2025

We tackle the vulnerability of uncertainty quantification in neural classifiers to adversarial attacks by proposing certified calibration, which provides worst-case bounds on confidence under perturbations. We develop novel calibration attacks that enable adversarial calibration training, demonstrating improved model uncertainty quantification in safety-critical applications.

Fool Me Once? Contrasting Textual and Visual Explanations in a Clinical Decision-Support Setting

Empirical Methods in Natural Language Processing (EMNLP), 2024
Award: EMNLP Outstanding Paper Award

We investigate human-AI interaction by studying how healthcare professionals use different explanation types during chest X-ray analysis, finding that text explanations induce over-reliance while multimodal approaches improve safety. This work marks a major step towards studying patient utility.

A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks

International Conference on Learning Representations (ICLR), 2024

We investigate predictive coding networks by developing a more efficient and stable training algorithm through a simple change to the temporal scheduling of synaptic weight updates. This incremental predictive coding approach not only provides theoretical convergence guarantees and improved biological plausibility, but also consistently outperforms original formulations across image classification and language modeling tasks.

Explaining Chest X-ray Pathologies in Natural Language

Medical Image Computing and Computer Assisted Intervention (MICCAI), 2022

We introduce MIMIC-NLE, the first dataset with natural language explanations for chest X-ray predictions, enabling intrinsically explainable medical AI. We demonstrate how these human-friendly explanations address critical limitations in current systems, potentially accelerating clinical adoption.

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks

International Conference on Computer Vision (ICCV), 2021

We introduce e-ViL, a comprehensive benchmark for evaluating natural language explanations in vision-language tasks, and e-SNLI-VE, the largest dataset of its kind. We also propose a novel model that significantly outperforms previous approaches, advancing explainable AI for vision-language understanding.

Workshops


Shh, don’t say that! Domain Certification in LLMs

Socially Responsible Language Modelling Research (SoLaR) workshop @ NeurIPS, 2024

We introduce domain certification, a formal guarantee that accurately characterizes when language models stay within their intended operational boundaries. We demonstrate VALID, our effective approach that provides a provable defense against adversarial inputs through meaningful certificates, ensuring the model remains within its intended domain even under attack.

Certified Calibration: Bounding Worst-Case Calibration under Adversarial Attacks

New Frontiers in Adversarial Machine Learning workshop @ ICML, 2023

We introduce certified calibration, a novel approach providing worst-case bounds on neural classifier confidence under adversarial attacks. We demonstrate that existing defenses do not sufficiently protect calibration, and we provide analytic bounds for the Brier score and approximate bounds for the Expected Calibration Error using mixed integer nonlinear programming.