Joint Colloquium with UC Berkeley & UW
UC Berkeley speaker:
Dr. Jacob Steinhardt
The Science of Measurement in Machine Learning
In machine learning, we are obsessed with datasets and metrics: progress in areas as diverse as natural language understanding, object recognition, and reinforcement learning is tracked by numerical scores on agreed-upon benchmarks. However, other ways of measuring ML models are underappreciated, and can unlock important insights.
In this talk, I'll discuss two important quantities beyond test accuracy: datapoint-level variation and similarity between representations. A challenge in both cases is that interesting trends are often dominated by statistical noise. We address this issue and make new discoveries:
* As neural network models get larger, while overall accuracy increases, many individual predictions get worse.
* Models of different depths still appear to perform similar computations in a similar order.
Beyond these specific observations, I will tie measurement to historical trends in science, and draw lessons from the success of biology and physics in the mid-20th century.
UW speaker:
Dr. Emilija Perkovic
Minimal Enumeration of All Possible Total Effects in a Markov Equivalence Class
In observational studies, when a total causal effect of interest is not identified, the set of all possible effects can be reported instead. This typically occurs when the underlying causal DAG is only known up to a Markov equivalence class, or a refinement thereof due to background knowledge. As such, the class of possible causal DAGs is represented by a maximally oriented partially directed acyclic graph (MPDAG), which contains both directed and undirected edges. We characterize the minimal additional edge orientations required to identify a given total effect. We also present a recursive algorithm to enumerate subclasses of DAGs, such that the total effect in each subclass is identified as a distinct functional of the observed distribution. This resolves an issue with existing methods, which often report possible total effects with duplicates, namely those that are numerically distinct due to sampling variability but are in fact causally identical.
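
As a rough illustration of the duplicate issue described in this abstract, the toy sketch below brute-forces every DAG represented by a small partially directed graph and groups them by the parent set of the treatment. It is not the recursive algorithm from the talk. The three-node graph A - X - Y, the treatment X, the outcome Y, and all function names are assumptions made for this example only.

```python
from itertools import product

def has_cycle(nodes, edges):
    """Return True if the directed graph has a cycle (Kahn's algorithm)."""
    indeg = {n: 0 for n in nodes}
    for _, child in edges:
        indeg[child] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    visited = 0
    while queue:
        u = queue.pop()
        visited += 1
        for parent, child in edges:
            if parent == u:
                indeg[child] -= 1
                if indeg[child] == 0:
                    queue.append(child)
    return visited != len(nodes)

def colliders(nodes, edges, skeleton):
    """Unshielded colliders a -> c <- b, where a and b are non-adjacent."""
    found = set()
    for c in nodes:
        pars = sorted(a for a, b in edges if b == c)
        for i, a in enumerate(pars):
            for b in pars[i + 1:]:
                if frozenset((a, b)) not in skeleton:
                    found.add((a, c, b))
    return found

def enumerate_dags(nodes, directed, undirected):
    """Brute force: orient each undirected edge both ways and keep the
    orientations that are acyclic and create no new unshielded collider."""
    skeleton = {frozenset(e) for e in list(directed) + list(undirected)}
    target = colliders(nodes, directed, skeleton)
    dags = []
    for flips in product([False, True], repeat=len(undirected)):
        oriented = [(b, a) if flip else (a, b)
                    for (a, b), flip in zip(undirected, flips)]
        edges = list(directed) + oriented
        if not has_cycle(nodes, edges) and colliders(nodes, edges, skeleton) == target:
            dags.append(edges)
    return dags

def parents(edges, node):
    return frozenset(a for a, b in edges if b == node)

# Hypothetical toy graph: the chain A - X - Y with both edges undirected.
nodes = ["A", "X", "Y"]
dags = enumerate_dags(nodes, directed=[], undirected=[("A", "X"), ("X", "Y")])

# Grouping by the parent set of X yields one candidate adjustment set per group.
parent_sets = {parents(d, "X") for d in dags}

# Grouping by the orientation of the X - Y edge alone is coarser.
x_causes_y = [d for d in dags if ("X", "Y") in d]

print(len(dags), sorted(sorted(p) for p in parent_sets), len(x_causes_y))
```

In this toy graph, the parent sets {A} and {} both belong to DAGs in which X -> Y. Adjusting for A or for nothing targets the same causal effect there, because A influences Y only through X, so the two estimates would differ only by sampling noise. Orienting the single edge X - Y is enough to separate the two genuinely different cases.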