Center for Statistical Sciences, Center for Computational Molecular Biology, Brown University
Statistical Pipeline for Identifying Features that Differentiate Classes of 3D Shapes
The recent curation of large-scale databases with 3D surface scans of shapes has motivated the development of tools that better detect global patterns in morphological variation. Studies that focus on identifying differences between shapes have been limited to simple pairwise comparisons and rely on pre-specified landmarks (that are often known). In this talk, we present SINATRA: a statistical pipeline for analyzing collections of shapes without requiring any correspondences. Our method takes in two classes of shapes and highlights the physical features that best describe the variation between them.
The SINATRA pipeline implements four key steps. First, SINATRA summarizes the geometry of 3D shapes (represented as triangular meshes) by a collection of vectors (or curves) that encode changes in their topology. Second, a nonlinear Gaussian process model, with the topological summaries as input, classifies the shapes. Third, an effect size analog and a corresponding association metric are computed for each topological feature used in the classification model. These quantities provide evidence that a given topological feature is associated with a particular class. Fourth, the pipeline iteratively maps the topological features back onto the original shapes (in rank order according to their association measures) via a reconstruction algorithm. This highlights the physical (spatial) locations that best explain the variation between the two groups.
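To make the first step concrete, here is a minimal sketch of one possible topological summary for a triangular mesh. The abstract does not name the summary it uses, so the sketch assumes an Euler characteristic curve computed over a height filtration in a single direction. The function name `euler_characteristic_curve` and its parameters are hypothetical and are not taken from the SINATRA software.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]


def euler_characteristic_curve(
    vertices: Sequence[Vertex],
    faces: Sequence[Face],
    direction: Vertex,
    thresholds: Sequence[float],
) -> List[int]:
    """Euler characteristic (V - E + F) of the sublevel sets of a mesh.

    Illustrative assumption: the mesh is filtered by vertex height along
    `direction`, and a simplex enters the sublevel set once its highest
    vertex does.
    """
    # Height of each vertex = dot product with the chosen direction.
    heights = [sum(c * d for c, d in zip(v, direction)) for v in vertices]

    # Collect each undirected edge exactly once.
    edges = set()
    for a, b, c in faces:
        for u, w in ((a, b), (b, c), (a, c)):
            edges.add((min(u, w), max(u, w)))

    edge_heights = [max(heights[u], heights[w]) for u, w in edges]
    face_heights = [max(heights[a], heights[b], heights[c]) for a, b, c in faces]

    curve = []
    for t in thresholds:
        n_vertices = sum(h <= t for h in heights)
        n_edges = sum(h <= t for h in edge_heights)
        n_faces = sum(h <= t for h in face_heights)
        curve.append(n_vertices - n_edges + n_faces)
    return curve


# Toy example: the surface of a tetrahedron, filtered along the z-axis.
tetra_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tetra_faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
tetra_curve = euler_characteristic_curve(
    tetra_vertices, tetra_faces, (0.0, 0.0, 1.0), [-0.5, 0.0, 1.0]
)
```

Under this assumption, repeating the computation over many directions and concatenating the resulting curves would give one fixed-length vector per shape, which is the kind of input the classification step takes.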
We assess our approach with a rigorous simulation framework, which is itself a novel contribution to 3D image analysis. Lastly, as a case study, we use SINATRA to analyze mandibular molars from four different suborders of primates and demonstrate its ability to recover known morphometric variation across phylogenies.
Crawford Lab, Brown University
Primer: Integrating Topological Data Analysis (TDA) with Statistical Learning Techniques
Topological data analysis (TDA) has emerged as a scalable way to extract key information from large data sets, while not depending on metrics or geodesics. As data sets have grown in size and complexity, TDA has progressed beyond the original persistence diagram to techniques that offer better interpretability, higher computational efficiency, and compatibility with a wide range of frequently used statistical and machine learning techniques, all while maintaining robustness and stability to noise. In this talk, we explore the development of topological invariants since the persistence diagram, such as the persistence landscape and the smooth Euler characteristic transform. We will discuss where these topological techniques have been applied and posit how early results in these areas have motivated novel methods we may see in the future.
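As a small illustration of how a persistence diagram can be turned into a summary that ordinary statistical tools accept, the sketch below evaluates a persistence landscape on a grid. It uses the standard definition, in which the k-th landscape at t is the k-th largest value of max(0, min(t - birth, death - t)) over the points in the diagram. The function name `persistence_landscape` and the toy diagram are hypothetical and are not drawn from the talk.

```python
from typing import List, Sequence, Tuple


def persistence_landscape(
    diagram: Sequence[Tuple[float, float]],
    k: int,
    grid: Sequence[float],
) -> List[float]:
    """Evaluate the k-th persistence landscape (k = 1, 2, ...) on `grid`.

    Each (birth, death) pair contributes a "tent" function that peaks at
    the midpoint of its interval; the k-th landscape takes the k-th
    largest tent value at every grid point.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")

    values = []
    for t in grid:
        tents = sorted(
            (max(0.0, min(t - birth, death - t)) for birth, death in diagram),
            reverse=True,
        )
        # Fewer than k points in the diagram means the landscape is zero.
        values.append(tents[k - 1] if k <= len(tents) else 0.0)
    return values


# Toy diagram with two nested intervals (hypothetical data).
toy_diagram = [(0.0, 4.0), (1.0, 3.0)]
toy_grid = [0.0, 1.0, 2.0, 3.0, 4.0]
first_landscape = persistence_landscape(toy_diagram, 1, toy_grid)
second_landscape = persistence_landscape(toy_diagram, 2, toy_grid)
```

Because every landscape is sampled on the same grid, the output is a fixed-length vector, so different diagrams can be averaged or passed to a regression or classification model.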