Question: Tophat Junctions.Bed File
6.9 years ago by
Varun Gupta1.1k
United States

Hi Everyone

I am working on RNA-seq data. I have a single-end FASTQ file. After running TopHat on it, I am interested in the junctions.bed file that is produced. Basically, I am interested in reads that start at a particular position, span the junction, and end at a different position. I view my SAM/BAM file in the IGV browser and load the junctions.bed file there as well. When I hover the mouse cursor over the junctions track in IGV, I can see the junction coordinates. For example, see the figure below:

http://www.freeimagehosting.net/qkidz

In the image, rpl3 has two junctions: one with depth 872 and coordinates 1030-2032, with flanking widths (40,67), and another with depth 16 and coordinates 1030-2058, with flanking widths (20,63).

When I look at the junctions.bed file for the rpl3 gene, I expect to find these two coordinate pairs, but instead I get the following:

rpl3    990     2099    JUNC00001560    872     +       990     2099    255,0,0 2       40,67   0,1042
rpl3    1010    2121    JUNC00001561    16      +       1010    2121    255,0,0 2       20,63   0,1048

So the start coordinate is 990 and the end coordinate is 2099, but IGV shows a start of 1030 and an end of 2032 for the junction with depth 872. The same is true for the other junction.

So how can I get the real coordinates as I see them in IGV?

Hope to hear from you soon

Regards

6.2 years ago by
Kanne400
Australia

I don't know how long ago you posted this but here's my answer anyhow.

As you mentioned, the coordinates from IGV are:

left: 1030; right: 2032

And the junctions.bed file contains:

rpl3 990 2099 JUNC00001560 872 + 990 2099 255,0,0 2 40,67 0,1042

The coordinates in the junctions.bed file are not the junction itself: they are the junction coordinates extended outward by the 'maximal_overhang' size on each side (i.e. the two numbers in column 11 of the BED line, 40,67).

When you look at the junctions.bed track in IGV, these overhangs are the little 'feet' on either end of the red arches. Put simply, they represent the left and right portions of the reads that spanned the junction.

So, to calculate the true coordinates from the junctions.bed file:

bedfile left coord (990) + left maximal overhang (40) = True left position of junction (1030)

bedfile right coord (2099) - right maximal overhang (67) = True right position of junction (2032).

So each coordinate in the bed file is the outer tip of one of the 'feet' (i.e. the overhang) in the IGV graphic, and if you adjust these by the length of the feet, you get the junction coordinates!
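If you want to do this for every line of the file, the same arithmetic can be sketched in a few lines of Python. This is only a minimal sketch, not part of TopHat: the function name `junction_coords` is made up for illustration, and it assumes each line is whitespace-separated BED12 with the two overhang lengths in column 11, as in the lines quoted above.

```python
def junction_coords(bed_line):
    """Return (chrom, left, right, depth) for one junctions.bed line.

    Hypothetical helper: assumes whitespace-separated BED12 fields.
    """
    fields = bed_line.split()
    chrom = fields[0]
    start, end = int(fields[1]), int(fields[2])
    depth = int(fields[4])
    # Column 11 holds the two overhang lengths, e.g. "40,67".
    sizes = fields[10].rstrip(",").split(",")
    left_overhang, right_overhang = int(sizes[0]), int(sizes[1])
    # Move each outer coordinate inward by its overhang.
    return chrom, start + left_overhang, end - right_overhang, depth


lines = [
    "rpl3 990 2099 JUNC00001560 872 + 990 2099 255,0,0 2 40,67 0,1042",
    "rpl3 1010 2121 JUNC00001561 16 + 1010 2121 255,0,0 2 20,63 0,1048",
]
for line in lines:
    print(junction_coords(line))
# ('rpl3', 1030, 2032, 872)
# ('rpl3', 1030, 2058, 16)
```

Both results match the coordinates and depths that IGV reports for the two rpl3 junctions.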

Hope this helps.


Hi @Kanne, in this case, the left exon should end at 1030 and the right exon should start at 2033, is that right?

written 3.9 years ago by pengchy410