Using long-read CAGE sequencing to profile cryptic-promoter-derived transcripts and their contribution to the immunopeptidome

  1. Ting Wang1,2,4
  1. 1Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA;
  2. 2Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA;
  3. 3Donald Danforth Plant Science Center, St. Louis, Missouri 63132, USA;
  4. 4McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63108, USA
  • 5 Present address: Department of Epigenetics, Van Andel Institute, Grand Rapids, MI 49503, USA

  • Corresponding author: twang{at}wustl.edu
  • Abstract

    Recent studies have shown that the noncoding genome can produce unannotated proteins as antigens that induce immune response. One major source of this activity is the aberrant epigenetic reactivation of transposable elements (TEs). In tumors, TEs often provide cryptic or alternate promoters, which can generate transcripts that encode tumor-specific unannotated proteins. Thus, TE-derived transcripts (TE transcripts) have the potential to produce tumor-specific, but recurrent, antigens shared among many tumors. Identification of TE-derived tumor antigens holds the promise to improve cancer immunotherapy approaches; however, current genomics and computational tools are not optimized for their detection. Here we combined CAGE technology with full-length long-read transcriptome sequencing (long-read CAGE, or LRCAGE) and developed a suite of computational tools to significantly improve immunopeptidome detection by incorporating TE and other tumor transcripts into the proteome database. By applying our methods to human lung cancer cell line H1299 data, we show that long-read technology significantly improves mapping of promoters with low mappability scores and that LRCAGE guarantees accurate construction of uncharacterized 5′ transcript structure. Augmenting a reference proteome database with newly characterized transcripts enabled us to detect noncanonical antigens from HLA-pulldown LC-MS/MS data. Lastly, we show that epigenetic treatment increased the number of noncanonical antigens, particularly those encoded by TE transcripts, which might expand the pool of targetable antigens for cancers with low mutational burden.

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.277061.122.

    • Freely available online through the Genome Research Open Access option.

    • Received February 19, 2023.
    • Accepted November 13, 2023.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.

    | Table of Contents
    OPEN ACCESS ARTICLE

    Preprint Server