Modern machine learning models excel in prediction accuracy and overall utility but are often difficult to interpret, and their large sizes lead to high storage, inference, and evaluation costs. First, using decision tree ensembles (e.g., random forests and gradient boosting) as an example, we discuss extracting simple tree substructures (rule collections) from large ensembles. Our estimator is defined as the solution to a discrete optimization problem (an integer program) that balances multiple considerations, including prediction accuracy, the number of rules, rule depth, feature usage, and stability constraints. We then discuss how related discrete optimization formulations arise, and can be highly effective, in compressing large neural networks and LLMs, where the main goal is improving model efficiency. We present different algorithms, some amenable to GPU acceleration, for solving these large-scale discrete optimization problems; some of these algorithmic tools may be of independent interest in other applications.
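To make the rule-selection idea concrete, here is a minimal sketch, not the actual estimator or integer program from the work above: given candidate rules (as might be harvested from the trees of an ensemble), choose a subset minimizing misclassifications plus a per-rule sparsity penalty. The toy dataset, candidate rules, and penalty weight `lam` are all invented for illustration, and the tiny search space is enumerated directly where a real instance would call a MILP solver.

```python
from itertools import combinations

# Toy dataset of (x1, x2) points with binary labels (invented for illustration).
X = [(0.2, 1.0), (0.9, 0.8), (0.7, 0.1), (0.1, 0.3),
     (0.8, 0.9), (0.3, 0.2), (0.6, 0.7), (0.4, 0.9)]
y = [1, 1, 0, 0, 1, 0, 1, 1]

# Candidate rules, as might be extracted from ensemble trees:
# (readable name, predicate on a point, label predicted when the rule fires).
rules = [
    ("x2 > 0.6 -> 1", lambda p: p[1] > 0.6, 1),
    ("x1 > 0.5 -> 1", lambda p: p[0] > 0.5, 1),
    ("x1 <= 0.5 and x2 <= 0.6 -> 0",
     lambda p: p[0] <= 0.5 and p[1] <= 0.6, 0),
]

def predict(subset, p, default=0):
    """Majority vote among the rules in `subset` that fire on point p."""
    votes = [label for _, fires, label in subset if fires(p)]
    return max(set(votes), key=votes.count) if votes else default

def rule_set_objective(subset, X, y, lam):
    """Misclassification count plus a per-rule sparsity penalty lam."""
    errors = sum(predict(subset, p) != t for p, t in zip(X, y))
    return errors + lam * len(subset)

# Exhaustive search over all rule subsets: a stand-in for the integer
# program, feasible only because there are just three candidate rules.
best = min(
    (s for r in range(len(rules) + 1) for s in combinations(rules, r)),
    key=lambda s: rule_set_objective(s, X, y, lam=0.5),
)
print([name for name, _, _ in best])  # the selected rule collection
```

In this toy instance the single rule `x2 > 0.6 -> 1` already classifies every point correctly, so the sparsity penalty drives the optimizer to select it alone. Additional terms in the abstract's formulation (rule depth, feature usage, stability) would enter as further penalties or constraints on the same binary selection variables.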