Utah State University Researchers Release RF-PHATE, a Supervised Tool for Visualizing Complex Biological Data
The paper, co‑authored by Kevin Moon, director of USU’s Data Science and Artificial Intelligence (DSAI) Center, with collaborators from USU and partner institutions, introduces a technique that preserves the structure of high‑dimensional biological datasets while allowing expert labels to steer the embedding. Unlike conventional unsupervised methods such as PHATE, t‑SNE, and UMAP, RF‑PHATE incorporates supervised information through a random‑forest proximity matrix, according to the authors. As a result, the embedding is less prone to over‑emphasizing differences between groups and better represents the relationships among them.
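For readers unfamiliar with random‑forest proximities, the idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it uses the classic definition (the fraction of trees in which two samples land in the same leaf), and the paper's proximity measure may differ. The function names and toy leaf assignments are hypothetical. The row normalization at the end is an assumption about how a diffusion‑based method such as PHATE might consume such a matrix.

```python
import numpy as np


def leaf_proximity(leaves: np.ndarray) -> np.ndarray:
    """Classic random-forest proximity (illustrative, not the paper's code).

    `leaves` has shape (n_samples, n_trees); entry [i, t] is the id of the
    leaf that sample i reaches in tree t. The result [i, j] is the fraction
    of trees in which samples i and j share a leaf.
    """
    leaves = np.asarray(leaves)
    # same_leaf[i, j, t] is True when samples i and j share a leaf in tree t.
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)


def diffusion_operator(proximity: np.ndarray) -> np.ndarray:
    """Row-normalize a proximity matrix into a Markov transition matrix.

    Assumption: this mirrors how a diffusion-based embedding could use the
    proximities; it is not taken from the RF-PHATE paper.
    """
    return proximity / proximity.sum(axis=1, keepdims=True)


# Hypothetical leaf assignments for 4 samples across 3 trees.
toy_leaves = np.array([
    [0, 2, 1],
    [0, 2, 3],
    [1, 2, 3],
    [1, 5, 4],
])

prox = leaf_proximity(toy_leaves)
transitions = diffusion_operator(prox)
```

In practice the leaf assignments would come from a forest trained on the expert labels, for example via scikit‑learn's `RandomForestClassifier.apply`, which returns one leaf index per sample and tree. Because the trees split on features that help predict the labels, samples end up close when they are similar in label‑relevant ways, which is how supervision enters the embedding.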
The authors demonstrated the method on several biomedical datasets. The flagship example involved clinical data from patients with multiple sclerosis (MS). The dataset contained hundreds of thousands of cellular‑level measurements, treatment records, and clinical outcomes. RF‑PHATE revealed a distinct cluster that aligns with a previously suspected MS subtype, a finding that could inform personalized treatment strategies. Additional analyses applied the method to COVID‑19 patient plasma data and antioxidant‑treated lung‑cancer cell data, showing that RF‑PHATE can uncover biologically meaningful patterns across diverse contexts.
The publication lists a broad set of collaborators. Lead author Jake Rhodes, a USU alumnus and assistant professor at Brigham Young University, contributed to the statistical development of the method. Other co‑authors include Adele Cutler, professor emerita of USU’s Department of Mathematics and Statistics; Anhong Zhou, professor in USU’s Department of Biological and Chemical Engineering; and Wei Zhang, a USU alumnus and researcher at the University of Utah. The research was supported by the National Institutes of Health and the IVADO Visiting Scholar Program.
In addition to the USU team, the paper acknowledges contributions from researchers at Université de Montréal, the Mila‑Québec AI Institute, Centre Hospitalier de l’Université de Montréal, the University of California, San Francisco, Charles LeMoyne Hospital, the University of Lausanne, and McGill University.
Moon emphasized that RF‑PHATE is not limited to biological data. "The method can be applied to many other disciplines and can also be used to develop more interpretable AI models, as well as to analyze the models themselves," he said. The authors released an open‑source implementation on GitHub, allowing other researchers to apply the technique to their own datasets.
The DSAI Center promotes the “AI for Science” movement, which encourages the use of machine learning to accelerate research and analyze large datasets. Moon encouraged students at all levels to explore interdisciplinary opportunities facilitated by the center.
While the paper presents promising results, the authors note that further validation and refinement are needed before RF‑PHATE can be adopted in clinical workflows. The open‑source release invites the broader community to test the method on additional datasets and to contribute improvements.
At present, RF‑PHATE remains a research tool rather than a commercial product. The authors plan to continue exploring its applicability to other high‑dimensional data domains and to investigate how the supervised embedding can aid in the interpretability of complex AI models.
The release of RF‑PHATE adds a new option to the growing toolkit of dimensionality‑reduction techniques and highlights the ongoing collaboration between USU and a network of academic partners in advancing machine‑learning methods for scientific discovery.