Question: Understanding FANTOM5 CAGE fields
23 months ago by
simplitia40 wrote:

Hi, I recently downloaded data from FANTOM5 Phase 2.0 with the goal of identifying transcription start sites (TSSs), but I cannot find any documentation for the file, and I'm a bit confused by its fields.

[image: excerpt of the downloaded file]

For example, the FANTOM site has human CAGE data for many samples from different cell lines and tissues, but this file seems to suggest that it is combined from phase 1 and phase 2. Does this mean that all the data were somehow averaged and these are the peaks of the averages? Also, why does each row seem to be based on a different transcript?

Thanks in advance.

tss rna-seq transcription • 606 views
23 months ago by
European Union
kristoffer.vittingseerup3.5k wrote:

Q1) "Combined from phase 1 and 2" simply means that the peak calls are based on the data sets from both this article (phase 1, stationary) and this article (phase 2, dynamic). It does not mean the peaks are averages: all the data are pooled before the peaks are called, and the peaks are afterwards quantified in the individual samples.
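To make the difference between pooling and averaging concrete, here is a minimal Python sketch. The sample names, the counts, and the simple thresholded peak call are all made up for illustration; the actual FANTOM5 peak calling is more involved than this.

```python
# Toy CAGE tag counts per genomic position for three samples (made-up numbers).
samples = {
    "sample_A": [0, 2, 9, 3, 0, 0],
    "sample_B": [0, 0, 4, 1, 0, 0],
    "sample_C": [0, 1, 0, 0, 0, 0],
}

def pool(samples):
    """Sum the counts position by position across all samples."""
    return [sum(column) for column in zip(*samples.values())]

def call_peak(pooled, min_count=1):
    """Return (start, end) of the first contiguous run with signal; end is exclusive.

    This is a stand-in for a real peak caller, not the FANTOM5 method.
    """
    start = None
    for i, count in enumerate(pooled):
        if count >= min_count and start is None:
            start = i
        elif count < min_count and start is not None:
            return start, i
    return (start, len(pooled)) if start is not None else None

def quantify(samples, peak):
    """Count the tags each individual sample contributes inside the peak."""
    start, end = peak
    return {name: sum(counts[start:end]) for name, counts in samples.items()}

pooled = pool(samples)               # one pooled signal, used only to define the peak
peak = call_peak(pooled)             # peak boundaries shared by every sample
per_sample = quantify(samples, peak) # expression is still reported per sample
print(peak, per_sample)
```

The point of the sketch is that the peak boundaries come from the pooled signal, while each sample keeps its own count within those boundaries.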

Q2) Each row is not a different transcript but a different transcription start site (TSS), with an associated id of the form pX@geneName and the genomic coordinates indicated in column 1. TSS detection is what CAGE (the method we used) is good for. For each TSS we have also annotated how it overlaps with known transcripts (that is what you see in column 4).
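As a rough illustration of how these two fields could be pulled apart, here is a small Python sketch. It assumes the id looks like pX@geneName and the column 4 entry looks like Nbp_to_transcriptName; the function names and the example strings are hypothetical, and real entries may carry extra suffixes or several transcripts that this does not handle.

```python
import re

def parse_tss_id(tss_id):
    """Split an id of the assumed form pX@geneName into (X, geneName)."""
    match = re.fullmatch(r"p(\d+)@(.+)", tss_id)
    if match is None:
        raise ValueError(f"unexpected TSS id: {tss_id!r}")
    return int(match.group(1)), match.group(2)

def parse_transcript_distance(annotation):
    """Split an entry of the assumed form Nbp_to_NAME into (N, NAME)."""
    match = re.fullmatch(r"(-?\d+)bp_to_(.+)", annotation)
    if match is None:
        raise ValueError(f"unexpected annotation: {annotation!r}")
    return int(match.group(1)), match.group(2)

# Hypothetical example values, not taken from the real file.
rank, gene = parse_tss_id("p1@GENE1")
distance, transcript = parse_transcript_distance("0bp_to_ENST00000000000")
print(rank, gene, distance, transcript)
```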

Hope this answers your question.


Great, thanks, that is super helpful.

Here are a couple of follow-up questions.

  1. Using the example above, p1@LINC200277 is about 29 bp wide. Does that mean the TSS for this gene can be at any of the 29 bp in this range?
  2. Moreover, if it is a range, how can the distance in column 4 be a single number (0bp_to_ENST00xx)?
modified 23 months ago • written 23 months ago by simplitia40

1) It means we found evidence of transcription start sites at all of those positions.

2) To make it easier for you, we take the peak (the position with the most TSS signal) and use that to calculate distances.
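A small Python sketch of that idea follows: a region that is 29 bp wide can still yield a single distance, because only the position with the most signal is compared to the transcript start. The coordinates, the counts, and the `summarize_peak` helper are made up for illustration and are not the code used to build the file.

```python
def summarize_peak(region_start, counts, transcript_start):
    """Return (width, summit position, distance from summit to transcript start).

    counts[i] is the TSS signal at genomic position region_start + i.
    On ties, the first position with the maximum signal is used.
    """
    width = len(counts)
    summit_offset = max(range(width), key=lambda i: counts[i])
    summit = region_start + summit_offset
    return width, summit, abs(summit - transcript_start)

# Hypothetical 29 bp region: weak signal everywhere, one dominant position.
counts = [1] * 29
counts[12] = 40

width, summit, distance = summarize_peak(1000, counts, transcript_start=1012)
print(f"{width} bp wide, summit at {summit}, {distance}bp to the transcript start")
```

Here every one of the 29 positions has some evidence of transcription initiation, but the reported distance is computed from the single dominant position only.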

written 23 months ago by kristoffer.vittingseerup3.5k