Upcoming Talks
(11/4/2024) Speaker: Yibo Wang
University of Illinois Chicago
Title
Analyzing and Exposing Vulnerabilities in Language Models
Abstract
Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of applications, yet they remain vulnerable to biases and adversarial attacks, which undermines their trustworthiness. This presentation introduces two papers that address two of these critical issues: robustness and fairness in LLMs. The first paper introduces a new adversarial attack method with lower detectability and better transferability to LLMs. Although recent attacks achieve high success rates, their adversarial examples often deviate from the original data distribution, which makes them easy to detect. The paper proposes a Distribution-Aware Adversarial Attack method that accounts for these distribution shifts to improve attack effectiveness. Experiments across multiple datasets and models validate the method's efficacy and its transferability to LLMs. The second paper examines gender affiliations in text generation: LLMs often infer a gender from inputs that contain no explicit gender information, which reinforces stereotypes. The paper systematically investigates, quantifies, and mitigates these gender affiliations in LLMs.
Bio
Yibo Wang is a Ph.D. student in the Computer Science Department at the University of Illinois Chicago, supervised by Professor Philip S. Yu. Her primary research areas are natural language processing and large language models, with a focus on trustworthy large language models and code generation with large language models.
Video
Coming soon
Questions for the Speaker
Please submit your questions for the speaker either through this Google Form or directly under the YouTube video.