Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
Hi, I am Cornelius.
About me
Posts
Future Blog Post
Published:
This post will show up by default. To disable scheduling of future posts, edit config.yml and set future: false.
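For reference, a minimal sketch of that setting, assuming a standard Jekyll setup where the site configuration lives in _config.yml (the post above refers to it as config.yml):

# _config.yml
# When false, Jekyll does not publish posts whose date lies in the future.
future: false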
Blog Post number 4
Published:
This is a sample blog post. Lorem ipsum; I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing, testing, testing this blog post. Blog posts are cool.
Blog Post number 3
Published:
This is a sample blog post. Lorem ipsum; I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing, testing, testing this blog post. Blog posts are cool.
Blog Post number 2
Published:
This is a sample blog post. Lorem ipsum; I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing, testing, testing this blog post. Blog posts are cool.
Blog Post number 1
Published:
This is a sample blog post. Lorem ipsum; I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing, testing, testing this blog post. Blog posts are cool.
Portfolio
Portfolio item number 1
Short description of portfolio item number 1
Portfolio item number 2
Short description of portfolio item number 2
Publications
Difference or delay? A comparison of Bayley-III Cognition item scores of young children with and without developmental disabilities
Research in Developmental Disabilities, 2017
We demonstrate that children with developmental disabilities develop cognitive skills in a different order than typically developing children, which violates the assumptions of item response theory. This challenges the validity of developmental tests like the Bayley-III that presume a fixed sequence of skill acquisition.
E-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
International Conference on Computer Vision (ICCV), 2021
We introduce e-ViL, a comprehensive benchmark for evaluating natural language explanations in vision-language tasks, and e-SNLI-VE, the largest dataset of its kind. We also propose a novel model that significantly outperforms previous approaches, advancing explainable AI for vision-language understanding.
Explaining Chest X-ray Pathologies in Natural Language
Medical Image Computing and Computer Assisted Interventions, 2022
We introduce MIMIC-NLE, the first dataset with natural language explanations for chest X-ray predictions, enabling intrinsically explainable medical AI. We demonstrate how these human-friendly explanations address critical limitations in current systems, potentially accelerating clinical adoption.
Certified Calibration: Bounding Worst-Case Calibration under Adversarial Attacks
New Frontiers in Adversarial Machine Learning workshop @ ICML, 2023
We introduce certified calibration, a novel approach providing worst-case bounds on neural classifier confidence under adversarial attacks. We demonstrate that existing defences do not protect calibration sufficiently, and provide analytic bounds for the Brier score and approximate bounds for Expected Calibration Error using mixed integer nonlinear programming.
A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks
International Conference on Learning Representations (ICLR), 2024
We investigate predictive coding networks by developing a more efficient and stable training algorithm through a simple change to the temporal scheduling of synaptic weight updates. This incremental predictive coding approach not only provides theoretical convergence guarantees and improved biological plausibility, but also consistently outperforms the original formulation across image classification and language modeling tasks.
Fool Me Once? Contrasting Textual and Visual Explanations in a Clinical Decision-Support Setting
Empirical Methods in Natural Language Processing (EMNLP), 2024
Award: EMNLP Outstanding Paper Award
We investigate human-AI interaction by studying how healthcare professionals use different explanation types during chest X-ray analysis, finding that text explanations induce over-reliance while multimodal approaches improve safety. This work marks a major step towards studying patient utility.
Shh, don’t say that! Domain Certification in LLMs
Socially Responsible Language Modelling Research (SoLaR) workshop @ NeurIPS, 2024
We introduce domain certification, a formal guarantee that accurately characterizes when language models stay within their intended operational boundaries. We demonstrate VALID, an effective approach that provides a provable defense against adversarial inputs through meaningful certificates, ensuring a model remains within its intended domain even under attack.
Towards Certification of Uncertainty Calibration under Adversarial Attacks
International Conference on Learning Representations (ICLR), 2025
We address the vulnerability of uncertainty quantification in neural classifiers to adversarial attacks by proposing certified calibration, which provides worst-case bounds on confidence under perturbations. We develop novel calibration attacks that enable adversarial calibration training, demonstrating improved model uncertainty quantification in safety-critical applications.
Shh, don’t say that! Domain Certification in LLMs
International Conference on Learning Representations (ICLR), 2025
We introduce domain certification, a new safety paradigm focusing on risk control for LLMs. We provide a formal guarantee that accurately characterizes when language models stay within their intended operational boundaries. We demonstrate an effective test-time algorithm, VALID, that provides scalable defenses for foundation models.
Benchmarking Predictive Coding Networks – Made Simple
International Conference on Learning Representations (ICLR), 2025
Award: ICLR Spotlight
We benchmark predictive coding networks extensively on large-scale tasks, providing critical insights into the limitations of state-of-the-art performance and the theoretical challenges that must be addressed. We introduce PCX, a fast and flexible open-source library that emphasizes performance and simplicity with a user-friendly interface, enabling the community to overcome fragmentation and tackle the critical challenge of scalability.
Talks
Talk 1 on Relevant Topic in Your Field
Published:
This is a description of your talk, written as a Markdown file that can be formatted like any other post. Yay Markdown!
Conference Proceeding talk 3 on Relevant Topic in Your Field
Published:
This is a description of your conference proceedings talk; note the different field in type. You can put anything in this field.
Teaching
Applied Machine Learning
Type: Doctoral Seminar
Department: Big Data Institute, University of Oxford
Year: 2022
Role: Class Tutor
Machine Learning
Type: Undergraduate & Postgraduate Course
Department: Department of Computer Science, University of Oxford
Year: 2022
Role: Class Tutor & Practical Instructor
Applied Machine Learning
Type: Doctoral Seminar
Department: Big Data Institute, University of Oxford
Year: 2023
Role: Class Tutor
Machine Learning
Type: Undergraduate & Postgraduate Course
Department: Department of Computer Science, University of Oxford
Year: 2023
Role: Class Tutor & Practical Instructor
Machine Learning
Type: Undergraduate & Postgraduate Course
Department: Department of Computer Science, University of Oxford
Year: 2024
Role: Class Tutor
Machine Learning
Type: Final Year Undergraduate Course
Department: Department of Engineering Science, University of Oxford
Year: 2025
Role: Practical Instructor