Tophat Junctions.Bed File
12.0 years ago
Varun Gupta ★ 1.3k

Hi Everyone

I am working on RNA-seq data. I have a single-end FASTQ file. After running TopHat on it, I am interested in the junctions.bed file that is produced. Basically, I am interested in reads that start at a particular position, span the junction, and end at a different position. I view my SAM/BAM file in the IGV browser and load the junctions.bed file into the browser. When I hover the mouse cursor over the junctions track in IGV, I can see the junction coordinates. For example, see the figure below:

http://www.freeimagehosting.net/qkidz

In the image, rpl3 has two junctions: one with depth 872 and coordinates 1030-2032, with flanking widths (40,67), and another with depth 16 and coordinates 1030-2058, with flanking widths (20,63).

When I look at the junctions.bed file for the rpl3 gene, I expect to find these two sets of coordinates, but instead I get the following:

rpl3    990     2099    JUNC00001560    872     +       990     2099    255,0,0 2       40,67   0,1042
rpl3    1010    2121    JUNC00001561    16      +       1010    2121    255,0,0 2       20,63   0,1048

So the start coordinate is 990 and the end coordinate is 2099, but IGV shows a start of 1030 and an end of 2032 for the junction with depth 872. The same goes for the other junction.

So how can I get the real coordinates as I see them in IGV?

Hope to hear from you soon

Regards

tophat • 10k views
11.3 years ago
Kanne ▴ 450

I don't know how long ago you posted this but here's my answer anyhow.

As you mentioned, the coordinates from IGV are:

left: 1030; right: 2032

And the junctions.bed file contains:

rpl3 990 2099 JUNC00001560 872 + 990 2099 255,0,0 2 40,67 0,1042

The coordinates in the junctions.bed file are not the junction coordinates themselves: they are the junction coordinates extended on each side by the 'maximal_overhang' size (i.e. the two numbers in column 11, 40,67).

When you look at the junctions.bed track in IGV, these overhangs are the little 'feet' on either end of the red arches. Simplified, they represent the left and right portions of the reads that spanned the junction.

So, to calculate the true coordinates from the junctions.bed file:

bedfile left coord (990) + left maximal overhang (40) = True left position of junction (1030)

bedfile right coord (2099) - right maximal overhang (67) = True right position of junction (2032).

So the coordinates in the bed file are the tips of the 'feet' (i.e. the overhangs) in the IGV graphic, and if you adjust them by the length of the feet, you get the junction coordinates!
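If you want to do this for every line of the file rather than by hand, here is a minimal Python sketch of the same arithmetic. The function name `junction_coords` is just something I made up for illustration, and it assumes the usual 12-column layout shown above, with the two overhang sizes in column 11.

```python
def junction_coords(bed_line):
    """Return (chrom, left, right, depth) for one junctions.bed line.

    Hypothetical helper: assumes 12 whitespace-separated columns, with the
    left and right overhang sizes in column 11 (e.g. "40,67").
    """
    fields = bed_line.split()
    chrom = fields[0]
    start = int(fields[1])   # column 2: tip of the left 'foot'
    end = int(fields[2])     # column 3: tip of the right 'foot'
    depth = int(fields[4])   # column 5: number of reads spanning the junction
    sizes = [int(x) for x in fields[10].split(",") if x]
    left_overhang, right_overhang = sizes[0], sizes[1]
    # Move each end inwards by the length of its foot.
    return chrom, start + left_overhang, end - right_overhang, depth


lines = [
    "rpl3 990 2099 JUNC00001560 872 + 990 2099 255,0,0 2 40,67 0,1042",
    "rpl3 1010 2121 JUNC00001561 16 + 1010 2121 255,0,0 2 20,63 0,1048",
]
for line in lines:
    if line.startswith("track"):
        continue  # skip a header line if the file has one
    print(junction_coords(line))
# ('rpl3', 1030, 2032, 872)
# ('rpl3', 1030, 2058, 16)
```

Both results match the coordinates reported in IGV for the two rpl3 junctions in the question.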

Hope this helps.


Hi @Kanne, in this case the left exon should end at 1030 and the right exon should start at 2033, is that right?
