With the advent of modern technologies, many scientific fields collect and analyze increasingly large datasets. The complexity and heterogeneity of these datasets cannot be properly captured by classical statistical models. Mixture models offer a broad framework that helps alleviate this issue: their key assumption is that the overall population consists of several subpopulations, each of which can be described by a simpler, classical model. In this thesis, we study mixture models defined through three classes of distributions, one for each of three data types.

The first is the class of bi-$s^*$-concave distributions for continuous data. We propose this class as a generalization of two popular classes in shape-constrained estimation, the $s$-concave and bi-log-concave distributions, that accommodates multimodal and heavy-tailed densities. Although its definition does not refer to mixtures directly, the class contains several important mixture distributions (e.g., mixtures of Student-$t$ distributions and mixtures of Gaussian distributions) under suitable conditions.

The second is the nonparametric Poisson mixture distribution for count data, which generalizes the Poisson distribution by letting its parameter follow a completely unknown mixing distribution. We establish a minimax-optimal convergence rate for the nonparametric maximum likelihood estimator of the mixing distribution and apply the method to single-cell RNA-sequencing data.

The third is the log-linear mixture distribution for multivariate categorical data, a generalization of the classical log-linear model. We are currently working on general sufficient or necessary conditions that guarantee its identifiability, and we apply the model to infer dependence structures among categorical variables.
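To give a concrete feel for the second model, the sketch below simulates counts from a Poisson mixture and approximates the nonparametric maximum likelihood estimator of the mixing distribution by an EM iteration over a fixed grid of candidate rates. This is only an illustrative approximation, not the estimator analyzed in the thesis; the mixture rates, grid, and iteration count here are all hypothetical choices.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)

# Hypothetical example: counts drawn from a two-point Poisson mixture
# with rates 2 and 10 and weights 0.6 and 0.4.
rates = rng.choice([2.0, 10.0], size=2000, p=[0.6, 0.4])
counts = rng.poisson(rates)

# Fixed-grid approximation to the NPMLE of the mixing distribution:
# keep the support fixed and update the mixing weights by EM.
grid = np.linspace(0.1, 20.0, 200)              # candidate Poisson rates
weights = np.full(grid.size, 1.0 / grid.size)   # uniform initial weights

# Likelihood matrix: L[i, j] = P(counts[i] | rate grid[j])
L = poisson.pmf(counts[:, None], grid[None, :])

for _ in range(500):                             # EM iterations
    post = L * weights                           # E-step: unnormalized posteriors
    post /= post.sum(axis=1, keepdims=True)      # normalize over the grid
    weights = post.mean(axis=0)                  # M-step: new mixing weights

# The fitted mixing distribution concentrates near the true rates, and its
# first moment should roughly match the sample mean of the counts.
fitted_mean = float((grid * weights).sum())
```

The grid discretization is a standard computational device: the true NPMLE has finitely many support points, and a fine grid recovers a close approximation while keeping each EM step a simple reweighting.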