Detection of Non-Melanoma Skin Cancer
We assess the effectiveness of general-purpose pathology foundation models (FMs) for the diagnosis and annotation of non-melanoma skin cancer (NMSC) in resource-limited settings. We evaluated three pathology FMs using deidentified NMSC histology images from the Bangladesh Vitamin E and Selenium Trial to predict cancer subtype based on zero-shot whole-slide embeddings. Our study highlights the important role FMs may play in confronting public health challenges and demonstrates the real-world potential of machine learning–aided cancer diagnosis.
Ellis, S., Song, S., Reiman, D., Hui, X., Zhang, R., Shahriar, M. H., … & Ahsan, H. (2025). Improved Diagnosis of Non-Melanoma Skin Cancer in Resource-Limited Settings. Cancer Epidemiology, Biomarkers & Prevention.
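As a rough illustration of predicting a subtype from precomputed whole-slide embeddings, the sketch below fits one centroid per class and assigns a new slide to the most similar centroid by cosine similarity. This is a minimal sketch under stated assumptions, not the classifier used in the study: the function names, the two-dimensional toy vectors, and the placeholder labels `subtype_a` and `subtype_b` are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fit_centroids(embeddings, labels):
    """Average the embeddings of each class into a single centroid."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


# Toy stand-ins for slide embeddings that an FM would produce (hypothetical).
train_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
train_labels = ["subtype_a", "subtype_a", "subtype_b", "subtype_b"]

centroids = fit_centroids(train_embeddings, train_labels)
print(predict(centroids, [0.95, 0.05]))
```

Because the FM is used zero-shot, only the small classifier on top of the frozen embeddings needs to be fit, which is one reason this style of pipeline may suit settings with limited compute.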
Multimodal TCGA Survival Model
Here, we investigate the feasibility of training classical, multimodal survival models over zero-shot embeddings extracted by foundation models (FMs). We show that multimodal fusion is easy to apply and additive in effect, outperforming unimodal models. We demonstrate the benefit of including pathology report text and rigorously evaluate the effect of model-based text summarization and hallucination. Overall, we modernize survival modeling by leveraging FMs and information extraction from pathology reports.
Song, S., Borjigin-Wang, M., Madejski, I., & Grossman, R. L. (2025). Multimodal Survival Modeling in the Age of Foundation Models. arXiv [cs.LG]. Retrieved from http://arxiv.org/abs/2505.07683
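One simple way to combine per-modality embeddings is early fusion: normalize each modality's vector and concatenate them into a single feature vector for a classical survival model. The sketch below shows that step together with Harrell's concordance index, a standard metric for ranking-based survival evaluation. It assumes concatenation as the fusion strategy, which the abstract does not specify; the function names and toy values are hypothetical, and fitting the survival model itself is left out.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so no modality dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def fuse(*modalities):
    """Early fusion: concatenate the normalized embedding of each modality."""
    fused = []
    for vec in modalities:
        fused.extend(l2_normalize(vec))
    return fused


def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i had an observed event and
    a shorter follow-up time than subject j. Ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Toy embeddings for two modalities of one patient (hypothetical values).
features = fuse([3.0, 4.0], [0.0, 2.0])

# Toy evaluation: higher predicted risk should mean earlier observed event.
c_index = concordance_index(
    times=[2, 4, 6], events=[1, 1, 0], risks=[0.9, 0.5, 0.1]
)
print(features, c_index)
```

Comparing the concordance index of a model trained on the fused vector against models trained on each modality alone is one way to check whether fusion is additive.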
LaB-RAG Radiology Report Generation Model
Here we propose Label Boosted Retrieval Augmented Generation (LaB-RAG), a text-based approach to image captioning that leverages image descriptors in the form of categorical labels to boost standard retrieval augmented generation (RAG) with pretrained large language models (LLMs). We study our method in the context of radiology report generation (RRG), where the task is to generate a clinician’s report detailing their observations from a set of radiological images, such as X-rays.
Song, S., Subramanyam, A., Madejski, I., & Grossman, R. L. (2024). LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation. arXiv preprint arXiv:2411.16523.
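To make the idea of label-boosted retrieval concrete, the sketch below ranks candidate reports first by overlap between categorical labels and then by embedding similarity, and formats the retrieved examples and the target labels into a text prompt for an LLM. This is only one plausible reading of how labels could boost RAG, not the paper's implementation: the ranking rule, the prompt wording, the label names, and every function name are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_overlap(a, b):
    """Jaccard overlap between two label sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def retrieve(query_labels, query_vec, corpus, k=2):
    """Rank by label overlap first, then by embedding similarity."""
    ranked = sorted(
        corpus,
        key=lambda ex: (
            label_overlap(query_labels, ex["labels"]),
            cosine(query_vec, ex["embedding"]),
        ),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query_labels, examples):
    """Format retrieved reports and target labels as a text-only prompt."""
    lines = ["Write a radiology report for the target labels below.", ""]
    for i, ex in enumerate(examples, 1):
        lines.append(f"Example {i} labels: {', '.join(sorted(ex['labels']))}")
        lines.append(f"Example {i} report: {ex['report']}")
    lines.append(f"Target labels: {', '.join(sorted(query_labels))}")
    lines.append("Target report:")
    return "\n".join(lines)


# Hypothetical retrieval corpus: image embedding, labels, and report text.
corpus = [
    {"labels": {"cardiomegaly"}, "embedding": [1.0, 0.0],
     "report": "The heart is enlarged."},
    {"labels": {"edema"}, "embedding": [0.9, 0.1],
     "report": "Pulmonary edema is present."},
    {"labels": {"cardiomegaly", "edema"}, "embedding": [0.0, 1.0],
     "report": "Enlarged heart with pulmonary edema."},
]

query_labels = {"cardiomegaly"}
examples = retrieve(query_labels, [0.9, 0.1], corpus, k=2)
prompt = build_prompt(query_labels, examples)
print(prompt)
```

In this sketch the image reaches the LLM only through its labels and its retrieved neighbors, which is what would keep the approach text-based.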
AI Data Curation Models and Tools
Data Model Curator – The overall goal of this project is to create AI tools to assist with data curation and harmonization, which are common bottlenecks in data sharing projects. This particular model is a fine-tuned LoRA layer based on Llama-3.1-8B that is optimized for the specific task of generating structured data models in JSON format from a dump of tabular data files.
- Hugging Face collection
- Generation of Synthetic Data Models and Contributions
- Creation of Serialized File From AI Model Output
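To show the shape of the task the Data Model Curator addresses (tabular files in, a JSON data model out), the sketch below uses plain type inference over a tab-separated file. It is a rule-based stand-in, not the fine-tuned model, and the JSON layout (`nodes`, `name`, `properties`, `type`) along with the column names are hypothetical; the model's actual output schema is not described here.

```python
import csv
import io
import json


def infer_type(values):
    """Guess a JSON-style type name from a column's string values."""
    non_empty = [v for v in values if v not in ("", None)]
    if not non_empty:
        return "string"
    for caster, name in ((int, "integer"), (float, "number")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue  # at least one value failed; try the next type
    return "string"


def infer_data_model(node_name, table_text, delimiter="\t"):
    """Build a small JSON-serializable data model for one tabular file."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    rows = list(reader)
    properties = {
        col: {"type": infer_type([row[col] for row in rows])}
        for col in reader.fieldnames
    }
    return {"nodes": [{"name": node_name, "properties": properties}]}


# Toy tabular dump with hypothetical column names.
table_text = "case_id\tage\tweight_kg\nc1\t54\t70.5\nc2\t61\t\n"

data_model = infer_data_model("case", table_text)
print(json.dumps(data_model, indent=2))
```

A language model can go beyond this kind of baseline by proposing descriptions, relationships between tables, and harmonized property names, which simple type inference cannot do.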