Junctions.Bed File Produced From Tophat
9.7 years ago
Varun Gupta ★ 1.2k

Hi everyone

I am using TopHat to analyse RNA-seq data. After I ran TopHat against my reference genome (which contains only 80 ribosomal protein genes), I got the accepted_hits.bam and junctions.bed files as output, which are the ones I am interested in.

I ran TopHat on single-end data. Below are the commands I used. First, I built the index with Bowtie using the bowtie-build command:

bowtie-build genomic_seq.fa GenomicRPGenome


(genomic_seq.fa is my FASTA file for the 80 ribosomal protein genes.) Then I ran the tophat command:

tophat /Genomic_bowtie_index_files/GenomicRPGenome read1.fastq


In accepted_hits.bam, the mapping quality (column 5) is 255, which means that the mapping quality score is not available. Does this mean that I did something wrong?

A junction file is also produced, but its contents do not agree with the known splice sites. I don't know why junctions.bed contains this odd result. It would be great if someone could help me with this. Am I missing something or doing something wrong? For the time being I am only dealing with single-end RNA-seq data.

Regards Varun


What do you mean by "result for the junction file is not in accordance with the known splice sites"? Can you provide a slice of your junctions.bed file for us to look at?


Hi. Let me explain what I think about the junctions.bed file, and then you can tell me the solution to my query. As I understand it, the 2nd and 3rd columns of the junctions.bed file produced by TopHat are the coordinates of the beginning and the end of the intron, respectively (am I correct on this?). If so, when I looked at the junctions.bed output for a particular gene, say rpl3, and compared it with the genomic sequence of rpl3, the coordinates in junctions.bed did not correspond at all to where the intron should begin and end.
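
For reference, here is a minimal sketch of how intron boundaries could be derived from one full junctions.bed line. It assumes the 12-column BED layout described in the TopHat manual, in which each junction is drawn as two blocks (the maximal read overhang on each side), so columns 2 and 3 span the overhangs as well as the intron. The function name and the example line are hypothetical and are not taken from the file discussed here.

```python
def intron_from_junction(line):
    """Return (chrom, first_intron_base, last_intron_base), 1-based inclusive.

    Assumes a 12-column TopHat-style junction line with exactly two blocks,
    where column 11 (blockSizes) holds the left and right overhang lengths.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected a 12-column BED line")

    chrom = fields[0]
    chrom_start = int(fields[1])   # 0-based start of the left overhang block
    chrom_end = int(fields[2])     # end (exclusive) of the right overhang block

    # blockSizes may carry a trailing comma in BED files
    block_sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    if len(block_sizes) != 2:
        raise ValueError("expected exactly two blocks per junction")

    # The intron is what lies between the two overhang blocks
    intron_start_0based = chrom_start + block_sizes[0]
    intron_end_exclusive = chrom_end - block_sizes[1]

    # Convert to 1-based inclusive coordinates for comparison with a sequence
    return chrom, intron_start_0based + 1, intron_end_exclusive


# Hypothetical example line (all values invented for illustration)
example = "geneX\t100\t500\tJUNC1\t10\t+\t100\t500\t255,0,0\t2\t30,20\t0,380"
print(intron_from_junction(example))
```

If the real file follows this layout, the values in columns 2 and 3 would be expected to sit upstream and downstream of the annotated intron by the overhang lengths rather than on the splice sites themselves.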


BTW, here is the section of the junction file whose coordinates do not match:

rpl3 990 2099 JUNC00001560 872
rpl3 1010 2121 JUNC00001561 16
rpl3 1024 2127 JUNC00001562 2
rpl3 1167 2089 JUNC00001563 15
rpl3 1206 2104 JUNC00001564 1
rpl3 2032 2225 JUNC00001565 2
rpl3 2153 3066 JUNC00001566 12172
rpl3 3123 3662 JUNC00001567 8
rpl3 3093 3854 JUNC00001568 7966
rpl3 3848 5139 JUNC00001569 4425
rpl3 5186 5850 JUNC00001570 4712
rpl3 5867 6486 JUNC00001571 11699
rpl3 6446 6967 JUNC00001572 11518
rpl3 6478 7379 JUNC00001573 20
rpl3 6919 7386 JUNC00001574 13776
rpl3 6955 7344 JUNC00001575 3
rpl3 7363 7712 JUNC00001576 7722


I don't know how to show you a section of the junctions.bed file here. :( Can we paste it, or is there a word limit?

9.7 years ago

TopHat gives a mapping quality score of 255 to all reads which map uniquely to the genome, regardless of how unique the read actually is. This is nothing to worry about. For example, in the following IGV screenshot, every read has a mapping quality of 255.
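
If you want to check this on your own file, here is a rough sketch that tallies the values in column 5 (MAPQ). It assumes the alignments are supplied as SAM text lines, for example from `samtools view accepted_hits.bam`. The function name `mapq_counts` is made up for illustration.

```python
from collections import Counter


def mapq_counts(sam_lines):
    """Count how often each MAPQ value (SAM column 5) occurs."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        counts[int(fields[4])] += 1
    return counts


# Tiny invented records, only to show the call
demo = [
    "@HD\tVN:1.0",
    "r1\t0\trpl3\t100\t255\t50M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\trpl3\t200\t3\t50M\t*\t0\t0\tACGT\tIIII",
    "r3\t16\trpl3\t300\t255\t50M\t*\t0\t0\tACGT\tIIII",
]
print(mapq_counts(demo))
```

To run it on real data, you could pass `sys.stdin` to the function and pipe the `samtools view` output into the script.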