Question: What does the TopHat file named 'accepted_hits.bam' include?
narges (Finland) wrote, 6.6 years ago:

Hi all,

I wanted to ask about the accepted_hits.bam file from TopHat. Is it correct that it does not contain all the valid alignments?

If so, I conclude that my input fasta file must have some alignments that are not included in this file (maybe those whose reads are not unique are excluded?). So, if I need all the alignments for my fasta file, not only the ones in accepted_hits.bam, should I merge some of the other TopHat output files?

Any suggestion would be appreciated.

tophat • written 6.6 years ago by narges • modified 6.6 years ago by Malachi Griffith

What makes you think it doesn't contain all valid alignments?

written 6.6 years ago by Mikael Huss

Because I used it as the input for the BitSeq R package and got an error saying that the number of alignments differs from its fasta file (BitSeq takes the BAM file and its related fasta file together as input). But I am still not sure whether this is the problem.

written 6.6 years ago by narges
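A minimal sketch of why the two numbers can differ: the Python below counts mapped records and distinct read names in SAM text, such as the output of `samtools view accepted_hits.bam`. It assumes plain tab-separated SAM lines and treats any record with flag bit 0x4 as unmapped. The function name and the toy records are hypothetical, and for paired-end data each mate is counted as a separate record under the same read name.

```python
from collections import Counter


def count_reads_and_alignments(sam_lines):
    """Count mapped records and distinct read names in SAM text lines.

    Returns (n_alignments, n_distinct_reads, per_read_counter).
    Header lines ('@...') and unmapped records (flag bit 0x4) are skipped.
    """
    per_read = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # unmapped record: not an alignment
        per_read[qname] += 1
    return sum(per_read.values()), len(per_read), per_read


if __name__ == "__main__":
    toy = [
        "@HD\tVN:1.0",
        "read1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
        "read2\t0\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
        # a second (secondary, flag 256) record for the same read
        "read2\t256\tchr2\t300\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
    ]
    n_aln, n_reads, _ = count_reads_and_alignments(toy)
    print(f"{n_aln} alignment records from {n_reads} distinct reads")
    # prints: 3 alignment records from 2 distinct reads
```

A multi-mapping read contributes several records while a read that fails to align contributes none, so neither count has to match the number of sequences in the input file.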
Malachi Griffith (Washington University School of Medicine, St. Louis, USA) wrote, 6.6 years ago:

Fastq files do not contain alignments but rather raw read sequences to be aligned. Therefore your fastq may contain reads that are not represented in accepted_hits.bam, because those reads do not align. Your fastq will also contain reads that correspond to more than one alignment in accepted_hits.bam, in cases where the placement of a read sequence in your genome is ambiguous, i.e. it matches equally well to multiple places. By default, TopHat allows up to 20 such 'multi-hits' per read. You can control this behavior with the option '-g/--max-multihits <int>'. From the docs:

-g/--max-multihits

Instructs TopHat to allow up to this many alignments to the reference for a given read, and choose the alignments based on their alignment scores if there are more than this number. The default is 20 for read mapping. Unless you use --report-secondary-alignments, TopHat will report the alignments with the best alignment score. If there are more alignments with the same score than this number, TopHat will randomly report only this many alignments. In case of using --report-secondary-alignments, TopHat will try to report alignments up to this option value, and TopHat may randomly output some of the alignments with the same score to meet this number.

written 6.6 years ago by Malachi Griffith
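To make the quoted rules concrete, here is a toy Python model of how a per-read cap like -g could be applied. It illustrates the documented behavior under simplifying assumptions (each alignment is a hypothetical (location, score) pair and a higher score is better); it is not TopHat's actual code, and the function name is made up for this example.

```python
import random


def select_multihits(alignments, max_hits, report_secondary=False, rng=None):
    """Toy model of a per-read cap like TopHat's -g/--max-multihits.

    alignments: list of (location, score) tuples for ONE read (hypothetical
    representation); a higher score is a better alignment.
    Not TopHat's implementation, only an illustration of the quoted rules.
    """
    if max_hits < 1:
        raise ValueError("max_hits must be at least 1")
    if not alignments:
        return []
    rng = rng or random.Random()
    if report_secondary:
        # Candidates are all alignments, best score first.
        pool = sorted(alignments, key=lambda a: a[1], reverse=True)
    else:
        # Without secondary reporting, only best-scoring alignments are candidates.
        best = max(score for _, score in alignments)
        pool = [a for a in alignments if a[1] == best]
    if len(pool) <= max_hits:
        return pool
    # Too many candidates: keep everything scoring above the score at the
    # last allowed slot, then fill the remaining slots at random from the ties.
    ranked = sorted(pool, key=lambda a: a[1], reverse=True)
    cutoff = ranked[max_hits - 1][1]
    above = [a for a in ranked if a[1] > cutoff]
    tied = [a for a in ranked if a[1] == cutoff]
    return above + rng.sample(tied, max_hits - len(above))


if __name__ == "__main__":
    hits = [("chr1:100", 50), ("chr2:200", 50), ("chr3:300", 40)]
    print(select_multihits(hits, 20))  # [('chr1:100', 50), ('chr2:200', 50)]
    print(select_multihits(hits, 20, report_secondary=True))  # all three, best first
    print(select_multihits(hits, 1, rng=random.Random(0)))  # one of the two score-50 hits
```

With `-g 1` in this model, a read with two equally good placements still yields exactly one record, chosen at random, which is one way the record count can shrink toward the read count.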

Thanks, this is exactly what I wanted to know.

written 6.6 years ago by narges

Do the files unmapped.bam plus accepted_hits.bam contain most of the information that the original fastq file contained? If not, what information is missing?

written 3.2 years ago by nemko
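One way to start checking this is to test whether every read name in the FASTQ shows up in one of the two BAM files. The sketch below works on SAM text (e.g. from `samtools view -h`), assumes unwrapped 4-line FASTQ records, and strips a trailing /1 or /2 from FASTQ names, which is an assumption about how mate suffixes appear in the BAM. It compares names only, not sequences or quality strings, and all function names are hypothetical.

```python
def fastq_read_names(fastq_lines):
    """Collect read names from 4-line FASTQ records (no line wrapping assumed)."""
    lines = [line for line in fastq_lines if line.strip()]
    names = set()
    for header in lines[0::4]:
        name = header[1:].split()[0]  # drop the leading '@' and any description
        if name.endswith(("/1", "/2")):
            name = name[:-2]  # assumption: mate suffixes are absent from the BAM names
        names.add(name)
    return names


def sam_read_names(sam_lines):
    """Collect QNAMEs from SAM text lines, skipping header lines."""
    return {
        line.split("\t", 1)[0]
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    }


def reads_missing_from_bams(fastq_lines, accepted_sam, unmapped_sam):
    """Read names present in the FASTQ but in neither SAM input."""
    seen = sam_read_names(accepted_sam) | sam_read_names(unmapped_sam)
    return fastq_read_names(fastq_lines) - seen


if __name__ == "__main__":
    fastq = ["@r1 extra", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "IIII",
             "@r3/1", "ACGT", "+", "IIII"]
    accepted = ["r1\t0\tchr1\t1\t50\t4M\t*\t0\t0\tACGT\tIIII"]
    unmapped = ["r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII"]
    print(reads_missing_from_bams(fastq, accepted, unmapped))  # {'r2'}
```

An empty result would only show that no read name went missing; whether sequences or qualities were altered would need a separate comparison.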