CAGE processing (from BAM to expression values per Gene/Isoform)
7.2 years ago
Tobias ▴ 150

Currently, I am trying to analyze some CAGE data from the FANTOM5 consortium ( and for the data

Unfortunately, I am not that experienced in analyzing CAGE data. They do provide some processed data for both TSS expression and enhancers, along with some processing software. However, I want to reprocess these data myself, starting at least from the BAM files. So, given CAGE BAM files, what would you suggest using to get a sensible expression value for each TSS (and isoform), and the same for the enhancers?
For the former, I would usually just use cuffdiff with a GENCODE annotation, but I am unsure whether this is sensible only for RNA-Seq and not for CAGE. What would you use instead? I also still have no idea how to do the equivalent for enhancers (assuming I have their annotation): should I count the number of tags falling into these regions? Also, do you know how the "activity" of an enhancer is related to eRNA abundance?

Generally speaking, should I consider the mRNA abundance for a gene measured with CAGE in FANTOM5 as the level of transcription initiation or as the steady-state mRNA level after post-transcriptional regulation (e.g. by miRNAs)? Also, are other RNAs such as miRNAs or lncRNAs included?

It would be great if you can help me here.

RNA-Seq next-gen Assembly sequencing CAGE • 2.6k views
7.2 years ago
Chirag Nepal ★ 2.3k

CAGE detects the promoters of protein-coding genes, lncRNAs and miRNAs, as well as post-transcriptionally processed RNAs, eRNAs and so on.

  1. If you want to process CAGE BAM files yourself, here is the tool:

  2. CAGE detects post-transcriptionally processed RNA signals spread across exons, and they have very different characteristic features, here is the paper:

  3. CAGE detects miRNA promoters along with Drosha processing events, described here:

In general, the number of CAGE tags detected at a promoter reflects mRNA abundance and is used to quantify the gene expression level. CAGE tag counts at promoters correlate well with RNA-seq. You can also read the FANTOM5 papers themselves.
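To make the "count tags at the promoter" idea concrete, here is a minimal sketch in plain Python. It assumes the alignments have already been extracted from the BAM file as (chrom, start, end, strand) tuples with 0-based, half-open coordinates (for example with a BAM-reading library; that step is not shown). The function names, the region tuples and the simple tags-per-million scaling are illustrative assumptions, not the FANTOM5 pipeline. Only the 5' end of each tag is counted, and the strand can be ignored for regions such as enhancers where tags from both strands are of interest.

```python
from bisect import bisect_left
from collections import defaultdict


def five_prime_end(start, end, strand):
    """0-based position of the tag's 5' end for a half-open alignment [start, end)."""
    return start if strand == "+" else end - 1


def count_tags(alignments, regions, stranded=True):
    """Count CAGE tag 5' ends per region and scale to tags per million (TPM).

    alignments: iterable of (chrom, start, end, strand)       -- hypothetical input format
    regions:    iterable of (name, chrom, start, end, strand) -- hypothetical annotation
    stranded:   if False, tags from both strands are pooled (e.g. for enhancers)
    """
    positions = defaultdict(list)
    total = 0
    for chrom, start, end, strand in alignments:
        key = (chrom, strand if stranded else ".")
        positions[key].append(five_prime_end(start, end, strand))
        total += 1
    for pos in positions.values():
        pos.sort()

    result = {}
    for name, chrom, start, end, strand in regions:
        key = (chrom, strand if stranded else ".")
        pos = positions.get(key, [])
        # number of 5' ends p with start <= p < end
        n = bisect_left(pos, end) - bisect_left(pos, start)
        tpm = n * 1e6 / total if total else 0.0
        result[name] = (n, tpm)
    return result


# Toy data, invented for illustration only
alignments = [
    ("chr1", 100, 127, "+"),
    ("chr1", 102, 129, "+"),
    ("chr1", 80, 105, "-"),
    ("chr1", 500, 527, "+"),
]
promoters = [("promoter_A", "chr1", 95, 110, "+")]
enhancers = [("enhancer_X", "chr1", 90, 120, ".")]

promoter_counts = count_tags(alignments, promoters, stranded=True)
enhancer_counts = count_tags(alignments, enhancers, stranded=False)
```

Note that the TPM here is just the raw count divided by the total number of tags passed in; any further normalization across libraries would have to be added separately.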


Many thanks for your answer. I am still wondering, however, whether I can get isoform expression levels out of the CAGE data. That would be very helpful to me! However, many different isoforms (in GENCODE) share the same TSS, so is it possible to distinguish between these isoforms, and if so, how?
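To see how much of a problem the shared-TSS situation is for a given annotation, one option is to group the annotated isoforms by their TSS. The sketch below assumes transcripts are given as (transcript_id, chrom, start, end, strand) tuples with 0-based, half-open coordinates; the function name and the toy transcripts are hypothetical. Because CAGE tags mark 5' ends, isoforms that end up in the same group cannot be told apart from the tag positions alone.

```python
from collections import defaultdict


def group_isoforms_by_tss(transcripts):
    """Group transcript IDs by (chrom, strand, TSS position).

    transcripts: iterable of (transcript_id, chrom, start, end, strand),
    0-based half-open coordinates (hypothetical input format).
    """
    groups = defaultdict(list)
    for tx_id, chrom, start, end, strand in transcripts:
        # The TSS is the 5' end of the transcript on its own strand
        tss = start if strand == "+" else end - 1
        groups[(chrom, strand, tss)].append(tx_id)
    return dict(groups)


# Toy annotation, invented for illustration only
transcripts = [
    ("tx1", "chr1", 1000, 5000, "+"),
    ("tx2", "chr1", 1000, 4200, "+"),
    ("tx3", "chr1", 1500, 5000, "+"),
    ("tx4", "chr2", 200, 900, "-"),
]
tss_groups = group_isoforms_by_tss(transcripts)
shared = {tss: ids for tss, ids in tss_groups.items() if len(ids) > 1}
```

In this toy example tx1 and tx2 share a TSS and would receive a single CAGE-based value, while tx3 has its own TSS and could be quantified separately.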

