QLS Seminar Series - Alex Diaz-Papkovich
Topological analysis of high-dimensional human genetic data in biobanks
Alex Diaz-Papkovich, Brown University
Tuesday, February 6, 12-1pm
Zoom Link:
In Person: 550 Sherbrooke, Room 189
Abstract: Now storing the genetic data of millions of individuals, biobanks have become rich repositories regularly used for scientific study and discovery. With the human genome spanning some three billion base pairs, any statistical analysis of a biobank is inherently a high-dimensional problem. To say nothing of the complexity of human genetics, we encounter challenges both in the scale of the data and in their composition.
We develop a tractable approach to study biobanks using uniform manifold approximation and projection (UMAP), a form of non-linear dimensionality reduction based in topological data analysis, and HDBSCAN, a density-based clustering algorithm. Using these tools, we visualize the data contained in biobanks and illustrate the relationships between population structure—the phenomenon of non-random genetic variation—and variables like geography, demographic history, migration, social structure, and environmental measures. We identify population structure at a variety of scales, ranging from a handful to hundreds of thousands of individuals, uncover subtle relationships between our data, and discuss applications to exploratory data analysis, data QC, and polygenic scoring.