Question: Ambiguous fields in FANTOM 5 Enhancer_TSS_association.bed file
8 months ago by
rohitsatyam102190 wrote:

Hi all

I downloaded a file named "enhancer tss associations" from the enhancer database (SlideBase). However, I am having trouble working out what the coordinates in the first three columns of this BED file represent. I understand that the fourth column, which contains entries like "chr1:167440766-167441089; NM_052862; RCSD1; R:0.319; FDR:0", holds the enhancer coordinates, transcript accession number, gene symbol, some score, and false discovery rate. I am not sure what kind of score the R score is. I went through the Andersson et al. paper that the website points to, but I couldn't find anything. Also, the last two columns don't make sense to me.

rna-seq genome gene • 238 views
8 months ago by
Corentin450 wrote:

Hi,

The file is in the BED12 format: http://genome.ucsc.edu/FAQ/FAQformat.html#format1 . This format is used to display tracks on a Genome Browser.

The last two columns describe where blocks are drawn on the Genome Browser. In my understanding, one block represents the enhancer and the other the TSS. Column 11 (blockSizes) gives the length of each block, and column 12 (blockStarts) gives the start of each block relative to the start position on the chromosome (chromStart, the second column).
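The block arithmetic can be sketched in a few lines of Python. This is a minimal sketch based on the UCSC BED12 column layout; the function name `bed12_blocks` and the example record are hypothetical and are not taken from the FANTOM5 file.

```python
def bed12_blocks(line):
    """Return absolute (chrom, start, end) tuples for each block of a BED12 line."""
    fields = line.rstrip("\n").split("\t")
    chrom = fields[0]
    chrom_start = int(fields[1])          # column 2: chromStart
    block_count = int(fields[9])          # column 10: blockCount
    # Columns 11 and 12 are comma-separated lists; a trailing comma is allowed.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if not (len(sizes) == len(starts) == block_count):
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    # blockStarts are offsets from chromStart, so add chromStart back.
    return [(chrom, chrom_start + s, chrom_start + s + n)
            for s, n in zip(starts, sizes)]


# Hypothetical record with made-up coordinates, for illustration only.
example = "\t".join([
    "chr1", "1000", "5000", "example_association", "500", ".",
    "1000", "5000", "0,0,0", "2", "200,300", "0,3700",
])
print(bed12_blocks(example))
```

In this made-up record the two blocks sit at the two ends of the chromStart-chromEnd span, which is what BED12 requires: the first block starts at offset 0 and the last block ends at chromEnd.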

You can see an example of the two blocks here (notice how the line name corresponds to the 4th column of your file):

http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr1%3A858252%2D861621&hgsid=791359097_TnfoVJubF5SaAM0recpdIpTpvsGI

The R score (calculated as a Pearson correlation coefficient) represents the strength of the association between an enhancer and a TSS: the higher it is, the stronger the association. As you can see, the higher the R score, the higher the value in the score column (the fifth column). This is because the score column is used to draw the blocks in different shades of grey.
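To filter associations by R or FDR, the fourth column can be split into its parts. The sketch below assumes every name follows the five-part, semicolon-separated layout of the example quoted in the question; the function name `parse_association_name` and the dictionary keys are hypothetical.

```python
def parse_association_name(name):
    """Split a name like 'chr1:100-200; NM_x; GENE; R:0.3; FDR:0' into fields."""
    parts = [p.strip() for p in name.split(";")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 ';'-separated parts, got {len(parts)}")
    region, transcript, gene, r_field, fdr_field = parts
    chrom, span = region.split(":")
    start, end = (int(x) for x in span.split("-"))
    return {
        "enhancer_chrom": chrom,
        "enhancer_start": start,
        "enhancer_end": end,
        "transcript": transcript,
        "gene": gene,
        "r": float(r_field.split(":", 1)[1]),      # text after "R:"
        "fdr": float(fdr_field.split(":", 1)[1]),  # text after "FDR:"
    }


# The name quoted in the question above.
record = parse_association_name(
    "chr1:167440766-167441089; NM_052862; RCSD1; R:0.319; FDR:0"
)
print(record["gene"], record["r"], record["fdr"])
```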

For more information you can also read the FANTOM5 paper: https://www.nature.com/articles/nature12787


Thanks, Corentin, for your explanation. However, it is still unclear to me what the first three columns of the BED file represent. I am sure they aren't the enhancer coordinates, and they aren't the TSS either. I would like to understand what the start and end coordinates refer to in this case.

Reply written 8 months ago by rohitsatyam102190

I did some testing on the UCSC Genome Browser, and it seems that the first three columns correspond to the whole feature (the enhancer + TSS). This is probably so that the Genome Browser displays everything.

The coordinates do not exactly match the features: the region seems to start before and end after the actual enhancer and TSS, which is probably to make the view better.

But since I have not found any documentation for it, I am not 100% sure. Let us know if you manage to find an answer.
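A check of this kind can also be scripted rather than done by eye in the browser. The sketch below compares columns 2 and 3 with the enhancer coordinates embedded in the name field. The function name `span_vs_enhancer` is hypothetical, and the chromStart and chromEnd values in the example record are made up for illustration; only the name field comes from the question.

```python
def span_vs_enhancer(bed12_line):
    """Compare chromStart/chromEnd with the enhancer range in the name field."""
    f = bed12_line.rstrip("\n").split("\t")
    chrom_start, chrom_end = int(f[1]), int(f[2])
    # The first ';'-separated part of column 4 is assumed to be "chr:start-end".
    region = f[3].split(";")[0].strip()
    _, span = region.split(":")
    enh_start, enh_end = (int(x) for x in span.split("-"))
    return {
        "enhancer_inside_span": chrom_start <= enh_start and enh_end <= chrom_end,
        "left_margin": enh_start - chrom_start,   # bases before the enhancer
        "right_margin": chrom_end - enh_end,      # bases after the enhancer
    }


# Name field from the question; chromStart and chromEnd are made-up values.
line = "\t".join([
    "chr1", "167440500", "167600500",
    "chr1:167440766-167441089; NM_052862; RCSD1; R:0.319; FDR:0",
])
print(span_vs_enhancer(line))
```

Note that one of the two margins will normally include the distance to the TSS, so only the margin on the enhancer's own side says anything about padding around the enhancer.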

Reply written 7 months ago by Corentin450