What does the TopHat file named accepted_hits.bam include?
8.9 years ago
narges

Hi all,

I wanted to ask about accepted_hits.bam file from TopHat. Is it correct that it does not contain all the valid alignments?

If that is true, then I conclude that my input fasta file should have some alignments that are not included in this file (maybe the ones whose reads are not unique are excluded?). So, if I need all the alignments for my fasta file, not only the ones in accepted_hits.bam, should I merge it with other TopHat output files?

Any suggestion would be appreciated.


What makes you think it doesn't contain all valid alignments?


Because I used it as the input for the BitSeq R package and got an error saying that the number of alignments differs from its fasta file (BitSeq takes the BAM file and its related fasta file together as input). But I am still not sure whether this is the problem or not.

8.9 years ago

FASTQ files do not contain alignments but rather raw read sequences to be aligned. Therefore you may have reads in your FASTQ that are not represented in accepted_hits.bam because those reads do not align. You will also have reads in your FASTQ that correspond to more than one alignment in accepted_hits.bam when the placement of a read sequence in your genome is ambiguous, that is, when it matches equally well to multiple places. By default, TopHat allows up to 20 'multi-hits' per read in such cases. You can control this behavior with the option -g/--max-multihits <int>. From the docs:
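To see both effects in your own data, one rough check is to compare the read names in the FASTQ with the read names in accepted_hits.bam. The Python sketch below assumes the BAM has first been converted to SAM text (for example with `samtools view accepted_hits.bam > hits.sam`), that the FASTQ uses the plain four-lines-per-record layout, and that FASTQ read names match the SAM QNAME exactly (suffixes such as /1 and /2 may need stripping). The function names and toy reads are hypothetical, made up for illustration.

```python
from collections import Counter


def fastq_read_names(lines):
    """Yield read names from FASTQ text (assumes 4 lines per record, no wrapping)."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            # Header line looks like '@name optional description'; keep only the name.
            yield line[1:].split()[0]


def alignments_per_read(sam_lines):
    """Count alignment records per read name (QNAME, first SAM column)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x4:
            continue  # flag 0x4 = unmapped record, not an alignment
        counts[fields[0]] += 1
    return counts


def summarize(fastq_lines, sam_lines):
    """Compare reads in the FASTQ against alignment records in the SAM text."""
    reads = set(fastq_read_names(fastq_lines))
    per_read = alignments_per_read(sam_lines)
    return {
        "reads_in_fastq": len(reads),
        "alignment_records": sum(per_read.values()),
        "reads_aligned": len(per_read),
        "reads_unaligned": len(reads - set(per_read)),
        "reads_multi_mapped": sum(1 for n in per_read.values() if n > 1),
    }


# Toy data (hypothetical): r1 aligns once, r2 aligns to three places, r3 does not align.
example_fastq = ["@r1", "ACGT", "+", "IIII",
                 "@r2", "GGCC", "+", "IIII",
                 "@r3", "TTAA", "+", "IIII"]
example_sam = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    "r2\t0\tchr1\t200\t1\t4M\t*\t0\t0\tGGCC\tIIII\tNH:i:3",
    "r2\t256\tchr2\t300\t1\t4M\t*\t0\t0\tGGCC\tIIII\tNH:i:3",
    "r2\t256\tchr3\t400\t1\t4M\t*\t0\t0\tGGCC\tIIII\tNH:i:3",
]
print(summarize(example_fastq, example_sam))
# {'reads_in_fastq': 3, 'alignment_records': 4, 'reads_aligned': 2, 'reads_unaligned': 1, 'reads_multi_mapped': 1}
```

In this toy case there are 3 reads but 4 alignment records, so a tool that expects one record per read would report a mismatch, which is the kind of discrepancy described above.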

-g/--max-multihits

Instructs TopHat to allow up to this many alignments to the reference for a given read, and choose the alignments based on their alignment scores if there are more than this number. The default is 20 for read mapping. Unless you use --report-secondary-alignments, TopHat will report the alignments with the best alignment score. If there are more alignments with the same score than this number, TopHat will randomly report only this many alignments. In case of using --report-secondary-alignments, TopHat will try to report alignments up to this option value, and TopHat may randomly output some of the alignments with the same score to meet this number.
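The selection rule quoted from the docs can be paraphrased as a small Python sketch. This is only an illustration of the documented behavior, not TopHat's actual code; the function `select_alignments`, the `(id, score)` tuples, and the assumption that a higher score is better are all hypothetical.

```python
import random


def select_alignments(candidates, max_multihits=20, report_secondary=False, rng=random):
    """Paraphrase of the documented -g/--max-multihits rule (illustrative only).

    candidates: list of (alignment_id, score) tuples; a higher score is assumed better.
    """
    if max_multihits < 1:
        raise ValueError("max_multihits must be at least 1")
    if not candidates:
        return []
    if not report_secondary:
        # Default: keep only the alignments with the best score...
        best = max(score for _, score in candidates)
        pool = [c for c in candidates if c[1] == best]
        # ...and if there are too many ties, keep a random subset of max_multihits.
        if len(pool) > max_multihits:
            pool = rng.sample(pool, max_multihits)
        return pool
    # With --report-secondary-alignments: fill up to max_multihits in score order,
    # choosing randomly among alignments tied at the cut-off score.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if len(ranked) <= max_multihits:
        return ranked
    cutoff = ranked[max_multihits - 1][1]
    above = [c for c in ranked if c[1] > cutoff]
    tied = [c for c in ranked if c[1] == cutoff]
    return above + rng.sample(tied, max_multihits - len(above))


hits = [("a", -2), ("b", 0), ("c", 0), ("d", 0), ("e", -5)]
print(select_alignments(hits))                            # [('b', 0), ('c', 0), ('d', 0)]
print(select_alignments(hits, 4, report_secondary=True))  # [('b', 0), ('c', 0), ('d', 0), ('a', -2)]
```

The point the sketch tries to make concrete: without --report-secondary-alignments, lower-scoring hits are dropped entirely, and with either setting a read never contributes more than max_multihits records.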


Thanks, this is exactly what I wanted to know.


Do the files unmapped.bam plus accepted_hits.bam contain most of the information that the original fastq file contained? If not, what information is missing?
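One way to check this empirically is to rebuild FASTQ records from the two BAM files and compare them with the original. The Python sketch below assumes single-end reads, SAM text produced with `samtools view` from both unmapped.bam and accepted_hits.bam, no hard clipping, and that SEQ and QUAL were stored for every read; `sam_to_fastq` and the toy records are hypothetical. It relies on the SAM convention that reads aligned to the reverse strand (flag 0x10) store SEQ reverse-complemented and QUAL reversed. Things it cannot recover include the original record order and, typically, any FASTQ header text after the first whitespace, since SAM keeps only the read name.

```python
# Complement table for DNA bases (IUPAC ambiguity codes other than N are left as-is).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fastq(sam_lines):
    """Rebuild one FASTQ record per read name from SAM text (single-end reads assumed)."""
    seen = set()
    records = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        f = line.rstrip("\n").split("\t")
        name, flag, seq, qual = f[0], int(f[1]), f[9], f[10]
        if name in seen or seq == "*":
            continue  # one record per read; '*' means SEQ was not stored on this line
        seen.add(name)
        if flag & 0x10:
            # Reverse-strand records store SEQ reverse-complemented and QUAL reversed,
            # so undo both to recover the read as sequenced.
            seq = seq.translate(COMPLEMENT)[::-1]
            qual = qual[::-1]
        records.append(f"@{name}\n{seq}\n+\n{qual}")
    return records


# Toy records (hypothetical): r1 forward, r2 reverse strand plus a secondary hit, r3 unmapped.
example = [
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tABCD",
    "r2\t16\tchr1\t200\t50\t4M\t*\t0\t0\tAACG\tABCD",
    "r2\t272\tchr2\t500\t1\t4M\t*\t0\t0\tAACG\tABCD",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
]
print("\n".join(sam_to_fastq(example)))
```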