Question: BAM to FASTQ, Picard or samtools?
anoops wrote (10 weeks ago):

Hello,

I am trying to convert a batch of BAM files to FASTQ. I started by testing samtools (collate/bam2fq) and Picard (SamToFastq). At the outset the numbers seemed OK, but the statistics suggest that the samtools output has twice as many duplicates as the Picard output.

Has anyone experienced this? I am not sure whether it is a samtools problem or whether I am misreading the QC stats.

Any advice, recommendations or comments are welcome.

Thanks!

PS: In both cases I am writing the first and the second end of each pair to separate files.

UPDATED: The commands used were:

Samtools

samtools collate -o name-collate.bam sample.bam
samtools fastq -1 sample_1.fastq.gz -2 sample_2.fastq.gz -0 sample_0.fastq.gz name-collate.bam
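To make the -1/-2/-0 split concrete, here is a simplified Python model of how records are routed by their FLAG bits. It assumes only the READ1 (0x40) and READ2 (0x80) bits matter; singleton handling (-s), reverse-complementing of reverse-strand reads and any FLAG-based filtering are left out, and the function name output_file is hypothetical.

```python
# Simplified model (assumption): how `samtools fastq` routes records between
# its -1, -2 and -0 outputs using only the READ1/READ2 FLAG bits.
# Singleton handling (-s), reverse-complementing and FLAG filtering are ignored.

READ1 = 0x40  # "first segment in the template"
READ2 = 0x80  # "last segment in the template"


def output_file(flag: int) -> str:
    """Return the option (-1, -2 or -0) whose file would receive this record."""
    is_r1 = bool(flag & READ1)
    is_r2 = bool(flag & READ2)
    if is_r1 and not is_r2:
        return "-1"
    if is_r2 and not is_r1:
        return "-2"
    # Both bits set, or neither set: goes to the -0 file
    return "-0"


if __name__ == "__main__":
    # 99/147: a typical properly paired R1/R2; 4: an unpaired, unmapped read
    for flag in (99, 147, 4):
        print(flag, output_file(flag))
```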

Picard

java -Xmx2g -jar picard.jar SamToFastq I=sample.bam FASTQ=sample_1p.fastq.gz SECOND_END_FASTQ=sample_2p.fastq.gz UNPAIRED_FASTQ=sample_0p.fastq.gz

FastQC check

fastqc -o fastqc_out/ sample_1p.fastq.gz

Picard QC: [FastQC report screenshot not preserved]

Samtools QC: [FastQC report screenshot not preserved]


It would be a big help if you could provide the command lines you used.

(reply by swbarnes, 10 weeks ago)

I didn't include them because I used the default settings. They are included now. Thanks in advance!

(reply by anoops, 10 weeks ago)

FYI, you do not need collate. A simple sort by name with the -n option of samtools sort will "restore" the read order as it was obtained from the sequencer, so essentially random with respect to genomic position. You can pipe this directly into samtools fastq:

samtools sort -n in.bam | samtools fastq -1 sample_1.fastq.gz -2 sample_2.fastq.gz -0 sample_0.fastq.gz -
(reply by ATpoint, 10 weeks ago)

Good tip, thanks ATpoint

(reply by anoops, 9 weeks ago)

Actually, collate is faster than samtools sort, and works fine for your purpose. From man samtools:

A faster alternative to a full query name sort, collate ensures that reads of the same name are grouped together in contiguous groups, but doesn't make any guarantees about the order of read names between groups.

(reply by h.mon, 9 weeks ago)
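A toy model of the collate versus sort -n difference discussed above: both put reads with the same name next to each other, but only a name sort also orders the groups. This sketches the output guarantee, not how samtools implements either command; the helper names are hypothetical, and Python's plain lexicographic sorted() stands in for samtools' own name comparison, which may order names differently.

```python
# Toy model (assumption): the output guarantee of collate versus a name sort.
# Not samtools' implementation; sorted() is plain lexicographic ordering.


def collate(names: list[str]) -> list[str]:
    """Group identical names contiguously. Groups appear in first-seen order here,
    standing in for 'no guarantee about the order between groups'."""
    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(name, []).append(name)
    return [name for group in groups.values() for name in group]


def name_sort(names: list[str]) -> list[str]:
    """Full sort by name: groups are contiguous AND ordered."""
    return sorted(names)


def is_grouped(names: list[str]) -> bool:
    """True if every name occupies a single contiguous run."""
    seen: set[str] = set()
    previous = None
    for name in names:
        if name != previous:
            if name in seen:
                return False  # this name reappears after a different one
            seen.add(name)
            previous = name
    return True


if __name__ == "__main__":
    # Read names as they might appear in a coordinate-sorted BAM (mates apart)
    reads = ["r3", "r1", "r3", "r2", "r1", "r2"]
    print(collate(reads))    # ['r3', 'r3', 'r1', 'r1', 'r2', 'r2']
    print(name_sort(reads))  # ['r1', 'r1', 'r2', 'r2', 'r3', 'r3']
```

For FASTQ conversion only the grouping matters, which is why the cheaper guarantee is enough.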

Hello anoops!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=83934

This is typically not recommended as it runs the risk of annoying people in both communities.

(reply by Pierre Lindenbaum, 9 weeks ago)

Sorry, I did not realize. I will keep it in mind.

(reply by anoops, 9 weeks ago)
Answer by h.mon (Brazil), 10 weeks ago:

By default, Picard does not output non-primary alignments, while samtools does. The secondary alignments that samtools fastq writes out should have two effects: an increase in duplication rate, as you noticed, and a larger number of reads. Can you confirm this?

Picard's behavior is probably what you want. If you read the samtools manual carefully, you will see how to avoid outputting non-primary alignments.
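A sketch of why non-primary records inflate both read count and duplication, assuming a secondary record carries the same sequence as its primary (aligners do not always store it). The bit values are from the SAM specification: 0x100 marks a secondary alignment and 0x800 a supplementary one, so 0x900 covers both. In samtools this mask can be given to the -F option of samtools fastq or samtools view to exclude such records; check your version's manual for the default. The records below are made up for illustration.

```python
# Sketch (assumption): effect of keeping vs. dropping non-primary records
# before writing FASTQ. FLAG bit values are from the SAM specification.

SECONDARY = 0x100
SUPPLEMENTARY = 0x800
NON_PRIMARY = SECONDARY | SUPPLEMENTARY  # 0x900


def is_primary(flag: int) -> bool:
    """True if neither the secondary nor the supplementary bit is set."""
    return (flag & NON_PRIMARY) == 0


# Made-up records: (read name, FLAG, sequence)
RECORDS = [
    ("readA", 99, "ACGTACGT"),
    ("readA", 147, "TTGCAAGC"),
    ("readA", 355, "ACGTACGT"),  # 355 = 99 | 0x100: secondary copy of readA/1
    ("readB", 65, "GGGCCCAA"),
    ("readB", 129, "CATGCATG"),
]


def emitted_sequences(records, primary_only: bool) -> list[str]:
    """Sequences that would be written out, with or without the primary-only filter."""
    return [seq for _, flag, seq in records if is_primary(flag) or not primary_only]


def duplicate_count(seqs: list[str]) -> int:
    """Number of reads whose sequence already occurred earlier in the list."""
    return len(seqs) - len(set(seqs))


if __name__ == "__main__":
    for primary_only in (False, True):
        seqs = emitted_sequences(RECORDS, primary_only)
        print(primary_only, len(seqs), duplicate_count(seqs))
```

Without the filter this toy input yields 5 reads with 1 duplicate; with it, 4 reads and none. Identical read counts from the two tools therefore argue against non-primary records being the cause, as the next reply points out.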


Thank you h.mon. I see that the collate routine has the option to output primary alignments only. It seems like Picard is preferable for this purpose.

Actually, the read count is what made me suspicious: according to FastQC, both outputs contain exactly the same number of reads ("Total Sequences: 49148031" in this particular case). So the higher duplication in the samtools output made me doubt the results.

(reply by anoops, 10 weeks ago)

Then I guess this is just an artifact: samtools collate changes the order of the reads, and FastQC measures duplication in a way that depends on that order:

To cut down on the memory requirements for this module only sequences which first appear in the first 100,000 sequences in each file are analysed

You can sort the fastq files and repeat the FastQC analysis.

(reply by h.mon, 9 weeks ago)
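The ordering effect can be reproduced with a deliberately simplified model of the FastQC rule quoted above: only sequences first seen within the first `window` reads are tracked, and their occurrences are then counted across the whole file. This is not FastQC's actual implementation; the function name and the tiny window standing in for 100,000 are assumptions.

```python
# Simplified model (assumption) of the rule quoted from the FastQC docs: only
# sequences first seen within the first `window` reads are tracked, but their
# occurrences are counted over the whole file. Not FastQC's real implementation.


def tracked_duplication(seqs: list[str], window: int) -> float:
    """Fraction of tracked reads that repeat an earlier read."""
    tracked = set(seqs[:window])
    counts: dict[str, int] = {}
    for seq in seqs:
        if seq in tracked:
            counts[seq] = counts.get(seq, 0) + 1
    total = sum(counts.values())
    repeats = sum(n - 1 for n in counts.values())
    return repeats / total if total else 0.0


if __name__ == "__main__":
    # The same six reads in two different orders; window=2 stands in for 100,000
    order_a = ["x", "x", "y", "z", "w", "v"]
    order_b = ["y", "z", "x", "x", "w", "v"]
    print(tracked_duplication(order_a, 2))  # 0.5
    print(tracked_duplication(order_b, 2))  # 0.0
    # After sorting both the same way, the estimates agree
    print(tracked_duplication(sorted(order_a), 2) == tracked_duplication(sorted(order_b), 2))  # True
```

Both orders contain exactly the same reads; only which sequences land in the tracked window changes. Sorting both FASTQ files the same way puts the same sequences in that window, so the two FastQC reports become comparable.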

That makes more sense now, I will try the sorting. Thanks!

(reply by anoops, 9 weeks ago)