Question

BAM Files Obtained From the Same Samples Have Different Sizes

0

12.5 years ago

catarina.fa • 0

I'm trying to align three .fastq files with TopHat, and it does align them. The problem is that I know the size of the BAM file I'm supposed to obtain (my supervisor gave it to me), and I can't seem to reproduce it. Does anyone have any suggestions?

bam • 2.7k views

updated 12.5 years ago by matted 7.8k • written 12.5 years ago by catarina.fa • 0

1

You should give some more information. But if you use the same TopHat version, call it with the same parameters, and use the same reference genome and the same input files, you should end up with the same output file as your supervisor.

12.5 years ago by David Langenberger 11k
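One way to confirm that the inputs really are the same on both sides is to compare checksums of the FASTQ files and the index files. The following is only a minimal sketch: the helper names `sha256_of` and `same_content` are made up for illustration, and it assumes both copies of each file can be read from the machine where the script runs.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large FASTQ files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files have byte-identical content."""
    return sha256_of(path_a) == sha256_of(path_b)


# Hypothetical usage (the second path is an assumption, not from the thread):
# same_content("Ctrl/SRR035585.fastq", "/path/to/supervisor/SRR035585.fastq")
```

If the two copies cannot be read from one place, each person can run `sha256_of` locally and compare the printed digests instead.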

score 1 · Answer 1 · 2013-01-06

Comparing file sizes of BAMs is a very brittle way to check results.

One problem that will occur even if everything else is perfectly matched is that TopHat is non-deterministic. With -g 1, TopHat reports one randomly chosen alignment for each multi-mapping read. From the documentation for -g:

If there are more alignments with the same score than this number, TopHat will randomly report only this many alignments.

And since BAM is a compressed format, these minor differences will affect how well the file compresses, and therefore the final file size.
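To illustrate the compression point, here is a toy sketch that has nothing to do with real BAM files. It builds two sets of made-up, fixed-width "alignment" lines that differ only in a randomly chosen position, then compresses each with gzip. The record layout, the seeds, and the function names are all assumptions made for the example.

```python
import gzip
import random


def fake_alignments(n_reads, seed):
    """Build toy records; only the zero-padded position is random."""
    rng = random.Random(seed)
    return [
        f"read{i}\tchr1\t{rng.randint(1, 10_000_000):08d}"
        for i in range(n_reads)
    ]


def raw_size(lines):
    """Uncompressed size in bytes of the newline-joined lines."""
    return len("\n".join(lines).encode("ascii"))


def compressed_size(lines):
    """Gzip-compressed size in bytes (mtime fixed so the result is repeatable)."""
    data = "\n".join(lines).encode("ascii")
    return len(gzip.compress(data, mtime=0))


run_a = fake_alignments(5000, seed=1)
run_b = fake_alignments(5000, seed=2)

# Same number of records and same uncompressed size in both runs.
print(len(run_a), len(run_b))
print(raw_size(run_a), raw_size(run_b))
# The compressed sizes will typically differ by some bytes.
print(compressed_size(run_a), compressed_size(run_b))
```

The two runs hold the same number of records and the same number of raw bytes, yet the compressed byte counts usually do not match, which is why file size alone says little about whether two alignments are equivalent.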

score 0 · Answer 2 · 2013-01-05

0

12.5 years ago

catarina.fa • 0

Thanks for answering! And you're right, I gave very little info. I'm trying to align 3 control samples: SRR035585.fastq, SRR035586.fastq, and SRR035587.fastq. (I work on the command line and have very little experience; the project just started.) In my latest attempt to align them properly (where "properly" means getting the same BAM files as my supervisor), I moved the samples to a folder called "Ctrl" in my user area. Then I ran this command in that folder:

tophat2 -g 1 /GenoStorage/Genomas/hg19/Genome_indexFiles/Bowtie2/hg19 SRR035585.fastq,SRR035586.fastq,SRR035587.fastq

(where /GenoStorage/Genomas/hg19/Genome_indexFiles/Bowtie2/hg19 is the path to the Bowtie2 index for hg19). I know the path is correct, and we're using the same TopHat version and the same files...

1

What is the difference you get? Just a different file size? How does the number of mapping loci differ? Do you have a different number of mappable reads? If there are reads your supervisor was able to map but you missed, take one such read, write it to a separate FASTQ file, give it to your supervisor, and ask him to run TopHat on his machine with exactly the same parameters and indexes you used, to see whether he can map it. If he uses different parameters to call TopHat, use his set of parameters.
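The comparison suggested above could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested pipeline: it works on SAM text lines (for example the output of `samtools view` on each BAM), it assumes four-line FASTQ records, and it assumes the read name in the alignment matches the FASTQ header up to the first whitespace. The function names and the toy data are made up.

```python
def mapped_read_names(sam_lines):
    """Collect names of reads with at least one mapped alignment in SAM text."""
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            names.add(qname)
    return names


def extract_read(fastq_lines, read_name):
    """Return the four FASTQ lines of the record named read_name, or None."""
    lines = [line.rstrip("\n") for line in fastq_lines]
    for i in range(0, len(lines) - 3, 4):
        header = lines[i]
        if header.startswith("@") and header[1:].split()[:1] == [read_name]:
            return lines[i:i + 4]
    return None


# Toy data standing in for the two alignments and the input FASTQ.
mine = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
]
theirs = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr2\t500\t50\t4M\t*\t0\t0\tTCAA\tIIII",
]
fastq = ["@r1 len=4", "ACGT", "+", "IIII", "@r2 len=4", "TTGA", "+", "IIII"]

# Reads mapped in the supervisor's result but not in mine.
missing = sorted(mapped_read_names(theirs) - mapped_read_names(mine))
print(len(mapped_read_names(mine)), len(mapped_read_names(theirs)))
print(missing)
if missing:
    print("\n".join(extract_read(fastq, missing[0])))
```

The four lines printed at the end can be saved as a one-read FASTQ file and run through TopHat on both machines with identical parameters.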