Pan-cancer analysis aims to examine the similarities and differences among the genomic and cellular alterations found across diverse tumor types.[1][2] International efforts have performed pan-cancer analysis on the exomes and whole genomes of cancers, the latter including their non-coding regions. In 2018, The Cancer Genome Atlas (TCGA) Research Network used exome, transcriptome, and DNA methylome data to develop an integrated picture of commonalities, differences, and emergent themes across tumor types.
In 2020, the International Cancer Genome Consortium (ICGC)/TCGA Pan-Cancer Analysis of Whole Genomes project published a set of 24 papers analyzing whole cancer genomes and transcriptomic data from 38 tumor types. A comprehensive overview of the project is provided in its flagship paper.[3]
Another project, pan-cancer analysis of RNA-binding proteins (RBPs) across human cancers,[4] explored the expression, somatic copy number alteration, and mutation profiles of 1,542 RBPs in ~7,000 clinical specimens across 15 cancer types. This study characterized the oncogenic properties of six RBPs—NSUN6, ZC3H13, BYSL, ELAC1, RBMS3, and ZGPAT—in colorectal and liver cancer cell lines.
Several studies have found a causal, predictable connection between genomic alterations (single-nucleotide variants or large copy number variants) and gene expression across all tumor types. Because of this pan-cancer relationship between genomic status and quantitative transcriptomic data, a specific genomic alteration can be predicted from gene expression profiles alone;[5] the relationship can also serve as the basis for machine learning approaches.
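As a minimal sketch of how an alteration might be predicted from expression profiles alone, the following Python example uses a nearest-centroid classifier. This is an illustrative technique chosen for its simplicity, not the method of the cited studies; the expression values, the three-gene profiles, and the function names `fit_centroids` and `predict` are all hypothetical.

```python
from math import dist
from statistics import fmean


def fit_centroids(profiles, labels):
    """Return the mean expression profile (centroid) for each label."""
    centroids = {}
    for label in set(labels):
        rows = [p for p, lab in zip(profiles, labels) if lab == label]
        # zip(*rows) iterates gene by gene across the samples of this class.
        centroids[label] = [fmean(gene_values) for gene_values in zip(*rows)]
    return centroids


def predict(centroids, profile):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], profile))


# Hypothetical toy data: each row is one tumor sample, each column one gene.
train_profiles = [
    [8.1, 2.0, 5.0],
    [7.9, 2.2, 5.1],
    [8.3, 1.8, 4.9],
    [3.0, 6.1, 5.0],
    [3.2, 5.9, 5.2],
    [2.8, 6.0, 4.8],
]
train_labels = ["mutated"] * 3 + ["wild_type"] * 3

centroids = fit_centroids(train_profiles, train_labels)
new_sample_status = predict(centroids, [7.5, 2.5, 5.0])
```

With this toy data, the new sample lies closest to the centroid of the samples labeled as mutated. A real analysis would involve thousands of genes, held-out validation, and a model suited to high-dimensional data.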
Pan-cancer studies aim to detect genes whose mutation contributes to oncogenesis, as well as genomic events or aberrations that recur across different tumors. These studies require standardizing data across multiple platforms and establishing common criteria so that different researchers can work on the data and present the results consistently. Omics data allow the rapid identification and quantification of thousands of molecules in a single experiment. Genomics addresses the potential for certain genes to be expressed, proteomics addresses which genes are in fact being expressed, and metabolomics addresses what has happened in the tissue being studied. Combined, they provide information about the biological system.
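One simple way to illustrate the cross-platform standardization mentioned above is to z-score the measurements of a gene within each platform, so that values recorded on different scales become comparable. The Python sketch below assumes this approach purely for illustration; it is not the harmonization procedure of any project described here, and the platform names, values, and the function name `zscore_within_platform` are hypothetical.

```python
from statistics import fmean, pstdev


def zscore_within_platform(values_by_platform):
    """Center and scale one gene's measurements separately for each platform."""
    standardized = {}
    for platform, values in values_by_platform.items():
        mean = fmean(values)
        sd = pstdev(values)  # population standard deviation
        # A constant series has no spread; map it to zeros instead of dividing by 0.
        standardized[platform] = [
            (v - mean) / sd if sd > 0 else 0.0 for v in values
        ]
    return standardized


# Hypothetical measurements of one gene in three samples on two platforms.
expression = {
    "microarray": [6.0, 8.0, 10.0],
    "rna_seq": [100.0, 200.0, 300.0],
}

scaled = zscore_within_platform(expression)
```

After scaling, both platforms express the three samples on the same unitless scale, each with mean 0 and standard deviation 1. Per-platform z-scoring removes only differences in location and scale, so dedicated batch-correction methods are generally needed for more complex platform effects.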
Pan-cancer Whole-Genome Comparison of Primary and Metastatic Solid Tumours is a research study published in Nature that explores genomic differences between untreated early-stage primary tumors and treated late-stage metastatic tumors. Through a harmonized analysis of 7,108 whole-genome-sequenced tumors across 23 cancer types, the study aimed to understand the impact of genomic changes on disease progression and therapy resistance.[6]
Metastatic tumors exhibited lower intratumor heterogeneity and conserved karyotypes, displaying modest increases in mutations but elevated frequencies of structural variants. The study highlighted the variable contributions of mutational footprints and identified specific genomic differences between primary and metastatic stages across various cancer types.
The study demonstrated substantial genomic differences between primary and metastatic tumors, although these differences varied considerably among cancer types, influencing the genomic landscape and potential therapeutic responses. Further research and larger datasets are necessary to fully understand tumor evolution, metastasis, and therapy resistance.
The findings offer valuable insights into tumor progression and therapy resistance mechanisms, laying the groundwork for potential personalized treatment strategies across various cancers.
The nearly 800 terabytes of data from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes project have been made available through various portals and repositories, including those at the Ontario Institute for Cancer Research, the European Molecular Biology Laboratory's European Bioinformatics Institute, and the National Center for Biotechnology Information. All data obtained from the TCGA efforts are available at the US National Cancer Institute's TARGET Data Matrix and the web portal ProteinPaint.[7]
StarBase pan-cancer resources[8] were created for the networks of long noncoding RNAs, microRNAs, competing endogenous RNAs and RBPs.