Speakers

Ishan Misra

Facebook AI Research/Meta, New York, USA

Jimeng Sun

University of Illinois Urbana-Champaign, USA

Marinka Zitnik

Harvard University, USA

Neil Zeghidour

Kyutai, France

Ahmad Beirami

Google Research, New York, USA

Talk details

Ishan Misra
Facebook AI Research/Meta, New York, USA

Title: Generative Models for Multimodal Learning

Abstract: In this talk, I will present my recent work on using generative models for multimodal learning. In particular, I'll discuss how diffusion models can be used to build powerful world models that generate videos, how they can be combined with LLMs, and how they can serve as classifiers for evaluating vision-language tasks.
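To make the last point concrete, here is a hedged sketch of the general "diffusion model as zero-shot classifier" idea: score each candidate caption by how well it helps the model denoise the image, then pick the best-scoring one. This illustrates the generic technique, not Meta's code; `denoiser` and `encode_text` are hypothetical stand-ins for a real noise-prediction network and text encoder, and the linear noise schedule is a toy choice.

```python
import torch

def caption_score(denoiser, image_latent, text_emb, n_trials=8):
    """Negative average noise-prediction error for one (image, caption) pair.

    A caption that better explains the image should make denoising easier,
    so a higher (less negative) score is a stronger vote for that caption.
    """
    errors = []
    for _ in range(n_trials):
        t = torch.randint(0, 1000, (1,))                  # random timestep
        noise = torch.randn_like(image_latent)            # target noise
        alpha = 1.0 - t.float() / 1000.0                  # toy linear schedule
        noisy = alpha.sqrt() * image_latent + (1.0 - alpha).sqrt() * noise
        pred = denoiser(noisy, t, text_emb)               # conditional denoiser
        errors.append(torch.mean((pred - noise) ** 2))
    return -torch.stack(errors).mean()

def classify(denoiser, encode_text, image_latent, captions):
    # Pick the caption whose conditioning yields the lowest denoising error.
    scores = torch.stack([caption_score(denoiser, image_latent, encode_text(c))
                          for c in captions])
    return captions[int(scores.argmax())]
```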

Bio: I am a Research Scientist at Meta AI. My research interest is in reducing the need for supervision in visual learning. For my work in self-supervised learning, I was featured in the MIT Tech Review’s 35 innovators under 35 list (compiled globally across technological disciplines). You can hear me on Lex Fridman’s podcast for a fun overview of my work. I finished my PhD at the Robotics Institute at Carnegie Mellon University where I worked with Martial Hebert and Abhinav Gupta. My PhD Thesis was titled “Visual Learning with Minimal Human Supervision” for which I received the SCS Distinguished Dissertation Award (Runner Up).

Jimeng Sun
University of Illinois Urbana-Champaign, USA

Title: Generative AI for Clinical Trial Development

Abstract: We present three recent papers on how Generative AI can help clinical trial development:
1. TrialGPT: Matching Patients to Clinical Trials. Recruiting the right patients quickly is a persistent challenge in clinical trials. TrialGPT uses large language models (LLMs) to predict a patient's suitability for various trials from their medical notes, making it easier to find the best match (a minimal sketch of this matching pattern follows the list).
2. AutoTrial: Improving Clinical Trial Eligibility Criteria Design. Designing eligibility criteria for clinical trials is complex. AutoTrial uses language models to streamline the process: it combines targeted generation, adapts to new information, and clearly explains its decisions.
3. MediTab: Handling Diverse Clinical Trial Data Tables. Medical data, such as clinical trial results, comes in many different tables, which makes it hard to compare and combine. MediTab uses LLMs to merge disparate data tables and align unfamiliar data, ensuring consistency and accuracy.
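To make item 1 above concrete, here is a minimal sketch of the generic LLM-based matching pattern: ask the model, criterion by criterion, whether a patient note satisfies a trial's eligibility criteria, then aggregate the answers into a suitability score. This is a hedged illustration, not TrialGPT's actual prompts or code; `call_llm` is a hypothetical stand-in for any chat-completion API.

```python
from typing import Callable, List

PROMPT = """You are screening a patient for a clinical trial.
Patient note:
{note}

Eligibility criterion:
{criterion}

Answer with exactly one word: MET, NOT_MET, or UNKNOWN."""

def trial_suitability(call_llm: Callable[[str], str],
                      note: str, criteria: List[str]) -> float:
    """Fraction of eligibility criteria the LLM judges as met."""
    met = 0
    for criterion in criteria:
        answer = call_llm(PROMPT.format(note=note, criterion=criterion))
        if answer.strip().upper().startswith("MET"):
            met += 1
    return met / max(len(criteria), 1)   # crude suitability score in [0, 1]
```

Ranking a patient's suitability scores across many trials then yields a shortlist for human review.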

Bio: Dr. Sun is a Health Innovation Professor in the Computer Science Department and the Carle Illinois College of Medicine at the University of Illinois Urbana-Champaign. Previously, he was an associate professor in Georgia Tech's College of Computing, where he co-directed the Center for Health Analytics and Informatics. Dr. Sun's research focuses on using artificial intelligence (AI) to improve healthcare, including deep learning for drug discovery, clinical trial optimization, computational phenotyping, clinical predictive modeling, treatment recommendation, and health monitoring. He has been recognized as one of the Top 100 AI Leaders in Drug Discovery and Advanced Healthcare. He collaborates with leading hospitals such as MGH, Beth Israel Deaconess, OSF HealthCare, Sutter Health, Vanderbilt, Northwestern, Geisinger, and Emory, as well as the biomedical industry, including IQVIA, Medidata, and multiple pharmaceutical companies. Dr. Sun earned his B.S. and M.Phil. in computer science at the Hong Kong University of Science and Technology, and his Ph.D. in computer science at Carnegie Mellon University.

Marinka Zitnik
Harvard University, USA

Title: Towards Universal Models for Time Series

Abstract: Foundation models have transformed deep learning by enabling a single model to adapt to various tasks without extensive additional training, reducing the need for training a separate model for each task. Although successful in vision and language, applying this idea to time series data presents unique challenges. Time series are characterized by diverse temporal dynamics, semantic variations, irregular sampling, and system-related factors such as different devices or subjects, as well as shifts in feature and label distributions. These characteristics are often incompatible with the next-token prediction approach of large language models. In this talk, I will discuss our research efforts aimed at developing capabilities for time-series foundation models. We begin with TF-C (NeurIPS 2022), a pre-training strategy for time series that uses a self-supervised objective to keep learned time-domain and frequency-domain representations consistent. We then introduce Raincoat (ICML 2023), a method for closed-set and universal domain adaptation. It is robust to shifts in both features and labels, enabling model transfer between source and unlabeled target domains, even when there is no label overlap. Our TimeX approach (NeurIPS 2023) provides an interpretable surrogate model for analyzing time series model behavior. This approach ensures model behavior consistency, offers discrete attribution maps, and improves interpretability. Finally, Raindrop (ICLR 2022) is an innovative method for handling irregularly sampled multivariate time series. It uses a graph neural network to capture time-varying dependencies among sensors, surpassing existing methods in classification and temporal dynamics interpretation. These approaches pave the way for foundation models for time series.
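As a flavor of the TF-C idea, the sketch below embeds a batch of series twice, once from the raw waveform and once from its FFT magnitudes, and penalizes disagreement between the two views. This is a hedged toy version with stand-in MLP encoders and a plain cosine-consistency loss; the actual paper uses a contrastive objective with augmentations and negatives.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TFCSketch(nn.Module):
    """Toy time-frequency consistency model (not the paper's architecture)."""
    def __init__(self, length: int, dim: int = 64):
        super().__init__()
        self.time_enc = nn.Sequential(nn.Linear(length, dim), nn.ReLU(),
                                      nn.Linear(dim, dim))
        freq_len = length // 2 + 1                 # rfft output length
        self.freq_enc = nn.Sequential(nn.Linear(freq_len, dim), nn.ReLU(),
                                      nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z_time = F.normalize(self.time_enc(x), dim=-1)       # time-domain view
        spectrum = torch.abs(torch.fft.rfft(x, dim=-1))      # frequency view
        z_freq = F.normalize(self.freq_enc(spectrum), dim=-1)
        # Consistency loss: the two views of the same series should align.
        return 1.0 - (z_time * z_freq).sum(dim=-1).mean()

model = TFCSketch(length=128)
loss = model(torch.randn(32, 128))   # consistency loss for one toy batch
```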

Bio: Marinka Zitnik is an Assistant Professor at Harvard University in the Department of Biomedical Informatics, with additional appointments at the Kempner Institute for the Study of Natural and Artificial Intelligence, the Broad Institute of MIT and Harvard, and the Harvard Data Science Initiative. Dr. Zitnik investigates the foundations of AI to enhance scientific discovery and to realize individualized diagnosis and treatment. Her research has won several best-paper and research awards, including a Kavli Fellowship of the National Academy of Sciences, awards from the International Society for Computational Biology and the International Conference on Machine Learning, the Bayer Early Excellence in Science Award, Amazon and Google Faculty Research Awards, and the Roche Alliance with Distinguished Scientists Award. She founded Therapeutics Data Commons, a global open-science initiative to advance AI in therapeutic science, and is the faculty lead of the AI4Science initiative.

Neil Zeghidour
Kyutai, France

Title: Audio Language Models

Abstract: Audio analysis and audio synthesis require modeling long-term, complex phenomena and have historically been tackled in an asymmetric fashion, with specific analysis models that differ from their synthesis counterpart. In this presentation, we will introduce the concept of audio language models, a recent innovation aimed at overcoming these limitations. By discretizing audio signals using a neural audio codec, we can frame both audio generation and audio comprehension as similar autoregressive sequence-to-sequence tasks, capitalizing on the well-established Transformer architecture commonly used in language modeling. This approach unlocks novel capabilities in areas such as textless speech modeling, zero-shot voice conversion, and even text-to-music generation. Furthermore, we will illustrate how the integration of analysis and synthesis within a single model enables the creation of versatile audio models capable of handling a wide range of tasks involving audio as inputs or outputs. We will conclude by highlighting the promising prospects offered by these models and discussing the key challenges that lie ahead in their development.
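A hedged sketch of this recipe: quantize a waveform into discrete tokens, then train an ordinary next-token predictor over them. The quantizer below is a toy nearest-codeword lookup, not a real neural codec like SoundStream, and the language-model step is reduced to an embedding plus a linear head to keep the example short.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyCodec:
    """Maps each fixed-size audio frame to its nearest codebook entry's index."""
    def __init__(self, frame: int = 160, codebook_size: int = 1024):
        self.frame = frame
        self.codebook = torch.randn(codebook_size, frame)

    def encode(self, wav: torch.Tensor) -> torch.Tensor:
        frames = wav.unfold(-1, self.frame, self.frame)          # (B, T, frame)
        dists = torch.cdist(frames, self.codebook.unsqueeze(0))  # (B, T, K)
        return dists.argmin(-1)                                  # token ids

    def decode(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.codebook[tokens].flatten(-2)                 # back to audio

codec = ToyCodec()
tokens = codec.encode(torch.randn(2, 16000))      # ~1 s of 16 kHz audio each

# Once audio is a sequence of discrete symbols, generation and comprehension
# become standard next-token prediction; a real system would use a causal
# Transformer in place of this embedding + linear head.
emb, head = nn.Embedding(1024, 256), nn.Linear(256, 1024)
logits = head(emb(tokens[:, :-1]))
loss = F.cross_entropy(logits.reshape(-1, 1024), tokens[:, 1:].reshape(-1))
```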

Bio: Neil is co-founder and Chief Modeling Officer of the Kyutai non-profit research lab. He was previously at Google DeepMind, where he started and led a team working on generative audio, with contributions including Google's first text-to-music API, a voice-preserving speech-to-speech translation system, and the first neural audio codec to outperform general-purpose audio codecs. Before that, Neil spent three years at Facebook AI Research, working on automatic speech recognition and audio understanding. He graduated with a PhD in machine learning from École Normale Supérieure (Paris), and holds an MSc in machine learning from École Normale Supérieure (Saclay) and an MSc in quantitative finance from Université Paris Dauphine. In parallel with his research activities, Neil teaches speech processing technologies at École Normale Supérieure (Saclay).

Ahmad Beirami
Google Research, New York, USA

Title: Language Model Alignment: Theory & Practice

Abstract: Generative language models have advanced to a level where they can effectively solve a variety of open-domain tasks with little task-specific supervision. However, the generated content from these models may still not satisfy the preferences of a human user. The goal of the alignment process is to remedy this by generating content from an aligned model that improves a reward (e.g., makes the generation safer) while not deviating much from the base model. A simple baseline for this task is best-of-N, where N responses are drawn from the base model, ranked by a reward, and the highest-ranking one is selected. More sophisticated techniques generally solve a KL-regularized reinforcement learning (RL) problem, maximizing expected reward subject to a KL-divergence constraint between the aligned model and the base model. An alignment technique is preferred if its reward-KL tradeoff curve dominates those of other techniques. In this talk, we give an overview of language model alignment and build an understanding of known results in this space through simplified examples. We also present a new modular alignment technique, called controlled decoding, which solves the KL-regularized RL problem while keeping the base model frozen by learning a prefix scorer, offering inference-time configurability. Finally, we shed light on the remarkable performance of best-of-N, which achieves reward-KL tradeoffs competitive with or better than those of state-of-the-art alignment baselines.
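The best-of-N baseline mentioned above fits in a few lines; the sketch below is a generic illustration where `generate` and `reward` are hypothetical stand-ins for a base-model sampler and a reward model.

```python
from typing import Callable, List

def best_of_n(generate: Callable[[str], str],
              reward: Callable[[str, str], float],
              prompt: str, n: int = 16) -> str:
    """Inference-time alignment: sample N responses, keep the best-rewarded one.

    The base model itself is never finetuned, which is what makes the
    method such a simple yet strong baseline.
    """
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda response: reward(prompt, response))
```

A commonly cited upper bound on the KL divergence between the best-of-N policy and the base model is log N − (N − 1)/N, which grows only logarithmically in N and helps explain the favorable reward-KL tradeoffs discussed in the talk.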

Bio: Ahmad Beirami is a research scientist at Google Research, leading research efforts on building safe, helpful, and scalable generative language models. At Meta AI, he led research to power the next generation of virtual digital assistants with AR/VR capabilities through robust generative language modeling. At Electronic Arts, he led the AI agent research program for automated playtesting of video games and cooperative reinforcement learning. Before moving to industry in 2018, he held a joint postdoctoral fellowship at Harvard and MIT, focused on problems at the intersection of core machine learning and information theory. He received the Sigma Xi Best PhD Thesis Award from Georgia Tech.