
Memorial Sloan Kettering Cancer Center Research Shows How Realyze Intelligence Solves the Scalability Problem of Manual Curation


A poster presented today, Tuesday, April 9, 2024, at the American Association for Cancer Research (AACR) Annual Meeting demonstrated that researchers at Memorial Sloan Kettering Cancer Center (MSKCC) successfully applied machine learning (ML) and large language models (LLMs) to augment the manual curation of cancer data elements using the Realyze Intelligence platform.

MSKCC, one of the leading cancer centers in the United States, is pioneering precision medicine for cancer patients. Currently, more than 100,000 cancer patients are participating in genomic testing through MSKCC, but clinician researchers still face major hurdles. Researchers struggle to synthesize structured genomic data with unstructured data that is locked within clinician notes, pathology and radiation reports, patient history information, and referral records.  

The current solution to this accessibility issue requires an outside vendor to manually curate each cancer patient’s core clinical data elements (CCDE), a set of 122 data elements that make up a patient’s full cancer history. Manual curation through chart review takes one working day per patient and creates a research bottleneck, leaving many large institutions multiple years behind in their curation efforts. It cannot be scaled and is unsustainable in the rapidly evolving world of precision oncology.

To find a scalable solution, MSKCC worked with Realyze Intelligence, using its artificial intelligence (AI) technology to systematically curate a patient dataset and test the results for accuracy against a manually curated dataset. Researchers used Realyze Intelligence to extract CCDE from patient records, including histology, pathology site, mismatch repair (MMR) test results, TNM staging, and Eastern Cooperative Oncology Group (ECOG) and Karnofsky Performance Status (KPS) scores, for a pilot lung cancer cohort of 150 patients. The team then manually validated the generated data for 74 of the 150 patients.

The results indicate that the Realyze Intelligence platform is poised to efficiently and effectively augment the current manual curation process, enabling more up-to-date datasets and accelerating oncology clinical research.

The Realyze Intelligence platform delivered the following results: 

  • Concordant values for MMR, KPS, and TNM staging in 100% of instances.
  • For MMR, all of these were null values, reported as false negative (FN) with 100% accuracy.
  • Pathology site was extracted with 92.15% accuracy and histology with 97.5% accuracy.
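
In its simplest form, a concordance figure of this kind is the share of patients for whom the machine-extracted value matches the manually curated one. The Python sketch below illustrates that calculation only; it is not the method used in the study, and the function name `concordance`, the record layout, and the toy records are all hypothetical. It assumes that two null values count as a match, which is one way a data element that is null for every patient can reach 100% concordance.

```python
from typing import Optional

# One record per patient: data element name -> value (None when no value was found).
Record = dict[str, Optional[str]]


def concordance(extracted: list[Record], curated: list[Record], element: str) -> float:
    """Return the fraction of patients whose extracted value equals the curated value.

    Assumption for this sketch: a pair of None values counts as concordant.
    """
    if len(extracted) != len(curated):
        raise ValueError("both datasets must cover the same patients in the same order")
    if not extracted:
        raise ValueError("no patients to compare")
    matches = sum(
        1 for e, c in zip(extracted, curated) if e.get(element) == c.get(element)
    )
    return matches / len(extracted)


# Made-up toy records for illustration; these are NOT data from the study.
extracted = [
    {"histology": "adenocarcinoma", "mmr": None},
    {"histology": "squamous cell carcinoma", "mmr": None},
    {"histology": "adenocarcinoma", "mmr": None},
    {"histology": "adenocarcinoma", "mmr": None},
]
curated = [
    {"histology": "adenocarcinoma", "mmr": None},
    {"histology": "squamous cell carcinoma", "mmr": None},
    {"histology": "small cell carcinoma", "mmr": None},
    {"histology": "adenocarcinoma", "mmr": None},
]

for element in ("histology", "mmr"):
    print(f"{element}: {concordance(extracted, curated, element):.1%}")
```

On these toy records, histology matches for three of four patients, while the all-null MMR column matches for every patient. A real evaluation would also need to decide how to normalize values (for example, synonyms or casing) before comparing them.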

The poster, titled “Machine learning and large language model approach to pancancer data elements,” was presented at the AACR Annual Meeting during a session on Artificial Intelligence and Data Science on Real World Data. The AACR Annual Meeting highlights the best cancer science and medicine from institutions around the world.

Work is already underway to refine the model, and the team at MSKCC plans to run the same analysis on a larger cohort of cancer patients to measure its accuracy. Results will be presented later this year.

One of MSKCC’s key missions is to change the way the world treats cancer through research; this partnership with Realyze Intelligence is contributing to making that goal a reality. 

To learn more, read the poster summary here and reach out to the Realyze Intelligence team here