Body

The rapid advancement of large-scale AI has led to widespread adoption, yet research into its reliability, safety, and social implications has not kept pace. My dissertation develops theoretical foundations and practical methods for building more reliable and responsible AI systems, addressing three dimensions of Trustworthy AI: diagnosis, control, and societal impact.

In my defense, I will briefly cover results from Part I (Diagnosis) and Part II (Control). Part I provides statistical and computational guarantees for influence diagnostics, offering tools to detect and characterize bias in models from generalized linear models to attention-based architectures. Part II develops methods for controllable generation across models of different scales and modalities, from small language models to vision-language systems. By enabling precise control over outputs, these methods help ensure AI behavior aligns with user intentions and ethical guidelines.

The focus of my talk will be on Part III (Societal Impact), which examines how AI bias affects users. I will present experiments showing that partisan bias in large language models can shape users' political opinions and decision-making. Building on these findings, I will explain why true political neutrality in AI is unattainable and propose practical methods for approximating neutrality and evaluating models against it. This work addresses societal risks and contributes to developing AI systems that are more transparent, responsible, and trustworthy.

Together, these contributions advance Trustworthy AI by combining statistical rigor with practical experimentation, strengthening our ability to diagnose and control AI while exposing societal risks and outlining pathways for mitigation.