Poincaré maps is a new method for representing hierarchical relationships to determine how cells develop over time. It is the first technique to use hyperbolic geometry to uncover hierarchical relationships from pairwise similarities between cells. Poincaré maps takes high-dimensional measurements of gene features and produces a two-dimensional representation that reveals these relationships. Whereas the performance of previous methods varies with the dataset, Poincaré maps consistently produces high-quality embeddings across a range of datasets. Moreover, our method offers improved visualization, clustering, hierarchy identification, and reconstruction of the natural order of a stochastic process in a single embedding.
By creating Poincaré maps, Facebook AI has been able to test and refine new applications for hyperbolic embeddings using biologists’ extensive real-world datasets. This work will both help advance computational analysis of cellular development and help AI researchers find new applications for hyperbolic embeddings.
In our experiments, we found that Poincaré maps can reveal complex cell developmental processes that prior methods had not uncovered. We are sharing our code with the research community to help others develop new hypotheses about biological processes.
The discovery of continuous hierarchies — as proposed, for instance, in Waddington’s epigenetic landscape — is a central task in developmental biology and single-cell analysis.
Hyperbolic space, which can be thought of as a continuous version of trees, offers promising advantages for this task, including representational efficiency, small distortion of hierarchical relationships, and easy interpretability. To exploit these properties for the discovery of hierarchies from noisy measurements (as are common in current single-cell data), we developed an algorithm that connects hyperbolic embeddings with manifold learning and pseudo-temporal ordering. Our approach builds on previous work from Facebook AI that proposed hyperbolic embeddings for learning hierarchical representations.
Given various feature representations of cells (such as their gene expressions), Poincaré maps uses a three-step process to estimate the structure of the underlying treelike manifold. First, we compute a connected k-nearest neighbor graph (kNNG), where each node corresponds to an individual cell and each edge has a weight proportional to the Euclidean distance between the features of the two connected cells. This allows us to estimate the local geometry of the underlying manifold, where Euclidean distances remain a good approximation.
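To make the first step concrete, here is a minimal sketch of a k-nearest neighbor graph in plain Python. It is an illustration under stated assumptions, not our released implementation: the function name `knn_graph`, the dictionary-based graph layout, and the toy two-dimensional "cells" are all hypothetical, and the sketch does not enforce that the resulting graph is connected, which the method requires.

```python
import math

def knn_graph(features, k):
    """Build a symmetric kNN graph as {node: {neighbor: edge_weight}}.

    Edge weights are Euclidean distances between feature vectors.
    Connectivity of the result is NOT checked here (an assumption of this sketch).
    """
    n = len(features)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Distances from cell i to every other cell, nearest first.
        dists = sorted(
            (math.dist(features[i], features[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            # Store each edge in both directions so the graph is undirected.
            graph[i][j] = d
            graph[j][i] = d
    return graph

# Toy "cells" with two features each (hypothetical data).
cells = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
graph = knn_graph(cells, k=1)
```

With real single-cell data, each feature vector would have thousands of entries and a brute-force neighbor search like this one would be too slow; it is used here only to keep the example self-contained.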
Second, we compute geodesic distances on the kNN graph (using measures such as “all pairs shortest paths” or the “relative forest accessibilities” index) to estimate the intrinsic geometry of the underlying manifold. While Euclidean distances, as used in the first step, provide a good approximation of distances between nearby points on the manifold, this second step provides an approximation for the distances between all points (including faraway pairs).
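The "all pairs shortest paths" variant of this second step can be sketched with Dijkstra's algorithm run from every node. This is a simplified illustration, assuming a graph stored as nested dictionaries; the names `shortest_paths_from`, `all_pairs_shortest_paths`, and `toy_graph` are hypothetical, and the relative forest accessibilities index is not shown.

```python
import heapq

def shortest_paths_from(graph, source):
    """Dijkstra's algorithm: shortest path length from source to each reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # Stale heap entry; a shorter path to u was already found.
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def all_pairs_shortest_paths(graph):
    """Approximate geodesic distances between all pairs of nodes."""
    return {u: shortest_paths_from(graph, u) for u in graph}

# Toy weighted kNN graph forming a chain 0 - 1 - 2 - 3 (hypothetical data).
toy_graph = {
    0: {1: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {1: 1.0, 3: 3.0},
    3: {2: 3.0},
}
geodesic = all_pairs_shortest_paths(toy_graph)
```

In this toy chain, the distance between the two end nodes is the sum of the edge weights along the path, which is how faraway pairs obtain a distance even though they share no direct edge.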
Finally, we compute a two-dimensional embedding for each cell in the Poincaré disk, such that their hyperbolic distances reflect the geodesic distances inferred from the previous step. The geometry of the Poincaré disk allows us to model continuous hierarchies efficiently in just two dimensions. In our approach, embeddings that are close to the origin of the disk will have a relatively small distance to all other points, representing the root of the hierarchy, or the beginning of a developmental process. On the other hand, embeddings that are close to the boundary of the disk will have a relatively large distance to all other points and are well suited to represent leaf nodes. Due to these properties, Poincaré maps allows us to represent in as few as two dimensions the developmental processes where cells undergo a continuous differentiation into more specialized types.
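The behavior of points near the origin and near the boundary follows from the standard distance function of the Poincaré disk, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). The sketch below only evaluates this distance on hand-picked points; it does not perform the embedding optimization itself, and the point names `root`, `leaf_a`, and `leaf_b` are hypothetical.

```python
import math

def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit disk."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(
        1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    )

root = (0.0, 0.0)     # At the origin of the disk.
leaf_a = (0.9, 0.0)   # Close to the boundary.
leaf_b = (-0.9, 0.0)  # Close to the boundary, on the opposite side.

root_to_leaf = poincare_distance(root, leaf_a)
leaf_to_leaf = poincare_distance(leaf_a, leaf_b)
```

For these particular points, the leaf-to-leaf distance equals twice the root-to-leaf distance, because the shortest path between the two leaves passes through the origin. This mirrors distances in a tree, where the path between two leaves runs through their common ancestor.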
Many complex biological processes are hierarchical in nature. Recent advances in single-cell RNA-sequencing technologies have enabled researchers to collect quantitative data on these biological processes, such as whole organism atlases for Planaria and C. elegans, which are commonly used in biological research. Likewise, the ongoing Human Cell Atlas project seeks to create a comprehensive map of all human cells. Using datasets such as these, Poincaré maps can help researchers better represent and analyze cellular development.
For example, when analyzing C. elegans, Poincaré maps produced a comprehensive two-dimensional representation of the cellular development of a highly complex organism using a completely unsupervised approach. This representation not only agrees with common assumptions about the hierarchy of cell development but also produces pseudotime estimates that closely align with the actual age of the cells. Other researchers have recently followed up on and extended our idea of using hyperbolic geometry to analyze single-cell data, exploring generative models and applications such as batch-effect correction. We are excited to see how biologists will use these new methods and tools, and we invite others in this field to review our paper and code.
More generally, hyperbolic representations are an active area of research in machine learning and artificial intelligence. Following our initial work, researchers from Facebook AI and other institutions have explored their applications for graph neural networks (for example, here and here), generative models, learning semantics from text corpora (for example, here and here), and the analysis of sociological data, among other areas.