Question: BAM to FASTQ: Picard or samtools?
anoops wrote (4 months ago):

Hello,

I am trying to convert a batch of BAM files to FASTQs. I started by testing samtools (collate/bam2fq) and Picard (SamToFastq). At first glance the numbers seemed OK, but the QC statistics suggest that the samtools output has twice as many duplicates as the Picard output.

Has anyone experienced this? I am not sure whether it is a samtools problem or whether I am misreading the QC stats.

Any advice/recommendation/comments are welcome.

Thanks!

PS: In both cases I write the first and the second end of each pair to separate files.

UPDATE: The commands I used were:

Samtools

samtools collate -o name-collate.bam sample.bam
samtools fastq -1 sample_1.fastq.gz -2 sample_2.fastq.gz -0 sample_0.fastq.gz name-collate.bam

Picard

java -Xmx2g -jar picard.jar SamToFastq I=sample.bam FASTQ=sample_1p.fastq.gz SECOND_END_FASTQ=sample_2p.fastq.gz UNPAIRED_FASTQ=sample_0p.fastq.gz

FastQC check

fastqc -o fastqc_out/ sample_1p.fastq.gz

Picard QC: [FastQC screenshot not preserved]

Samtools QC: [FastQC screenshot not preserved]

Tags: sequencing, next-gen, assembly

It would be a big help if you could provide the command lines you used.

(reply by swbarnes2, 4 months ago)

I didn't include them because I used the default settings. They are included now. Thanks in advance!

(reply by anoops, 4 months ago)

FYI, you do not need collate. A simple name sort with the -n option of samtools sort will "restore" the read order roughly as it came off the sequencer, i.e. essentially random with respect to genomic position. You can pipe this directly into samtools fastq:

samtools sort -n in.bam | samtools fastq -1 sample_1.fastq.gz -2 sample_2.fastq.gz -0 sample_0.fastq.gz -

(reply by ATpoint, 4 months ago)

Good tip, thanks ATpoint

(reply by anoops, 4 months ago)

Actually, collate is faster than samtools sort and works fine for your purpose. From man samtools:

A faster alternative to a full query name sort, collate ensures that reads of the same name are grouped together in contiguous groups, but doesn't make any guarantees about the order of read names between groups.

(reply by h.mon, 4 months ago)
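
To make the collate guarantee concrete, here is a minimal sketch that checks whether every read name in a SAM text stream (for example from `samtools view`) forms one contiguous block, without caring about the order between blocks. That grouping is the property the samtools fastq step relies on to pair mates. The helper name `names_are_grouped` is hypothetical and only for illustration.

```shell
# Exit 0 if every read name (SAM column 1) forms one contiguous block,
# exit 1 if some name reappears after a different name was seen.
# Header lines starting with @ are ignored. Order between blocks is not checked,
# mirroring what collate promises (grouped, but not sorted).
names_are_grouped() {
  awk -F '\t' '
    /^@/ { next }
    $1 != prev {
      if ($1 in seen) { bad = 1; exit }
      seen[$1] = 1
      prev = $1
    }
    END { exit bad }' "$@"
}
```

For example, `samtools view name-collate.bam | names_are_grouped && echo grouped` would check the collated file from the commands above.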

Hello anoops!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=83934

This is typically not recommended as it runs the risk of annoying people in both communities.

(reply by Pierre Lindenbaum, 4 months ago)

Sorry, I did not realize. I will keep that in mind.

(reply by anoops, 4 months ago)

h.mon wrote (4 months ago):

By default, Picard doesn't output non-primary alignments, but samtools does. The secondary alignments that samtools fastq writes out should have two effects: a higher duplication rate, as you noticed, and a larger number of reads. Can you confirm this?
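
One quick way to check the read count without FastQC is to count 4-line FASTQ records directly. This is a minimal sketch that assumes standard 4-line records (no wrapped sequence lines). The helper name `count_fastq_reads` is hypothetical.

```shell
# Print the number of records in a plain or gzipped FASTQ file.
# Assumes exactly 4 lines per record.
count_fastq_reads() {
  local lines
  # gzip -cdf decompresses .gz input and passes plain text through unchanged
  lines=$(gzip -cdf -- "$1" | wc -l)
  echo $(( lines / 4 ))
}
```

For example: `for f in sample_1.fastq.gz sample_1p.fastq.gz; do echo "$f $(count_fastq_reads "$f")"; done`.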

Picard's behavior is probably what you want. If you read the samtools manual carefully, you will see how to avoid outputting non-primary alignments.
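
The logic behind "primary alignments only" can be illustrated on SAM text. A record is non-primary when FLAG bit 0x100 (secondary) or 0x800 (supplementary) is set, i.e. when FLAG & 0x900 is non-zero. The awk helper below (hypothetical name `primary_only_sam`) is only an illustration of that bit test. With samtools itself the equivalent filter is `samtools view -F 0x900`. Recent versions of samtools fastq also accept a `-F` flag filter, but its default may differ between versions, so check the usage message of your version.

```shell
# Keep SAM header lines and primary alignments only: drop records whose FLAG
# (column 2) has bit 0x100 (secondary) or 0x800 (supplementary) set.
# POSIX awk has no bitwise AND, so each bit is tested with division and modulo.
primary_only_sam() {
  awk -F '\t' '
    /^@/ { print; next }
    {
      secondary     = int($2 / 256)  % 2   # bit 0x100
      supplementary = int($2 / 2048) % 2   # bit 0x800
      if (!secondary && !supplementary) print
    }' "$@"
}
```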


Thank you h.mon. I see that the collate routine has the option to output primary alignments only. It seems like Picard is preferable for this purpose.

Actually, the read count is what puzzled me: according to FastQC, both tools output exactly the same number of reads ("Total Sequences : 49148031" in this particular case). So the higher duplication in the samtools output made me doubt the results.

(reply by anoops, 4 months ago)
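
To check whether the two FASTQs really differ in duplication, independently of read order, you can count exact duplicate sequences over the whole file. This is a minimal sketch. The hypothetical helper `duplicate_fraction` reports the fraction of reads whose exact sequence already occurred elsewhere in the file (total minus distinct, divided by total). That is not FastQC's exact metric, only an order-independent cross-check.

```shell
# Print (total reads - distinct sequences) / total reads for a plain or gzipped
# FASTQ with 4-line records. Pipeline: take line 2 of each record (the sequence),
# sort with the C locale (external sort, spills to disk for large files),
# count each distinct sequence, then sum count-1 over all distinct sequences.
duplicate_fraction() {
  gzip -cdf -- "$1" |
    awk 'NR % 4 == 2' |
    LC_ALL=C sort |
    uniq -c |
    awk '{ total += $1; dup += $1 - 1 }
         END { if (total > 0) printf "%.4f\n", dup / total; else print "0" }'
}
```

Because every read is counted, the result does not depend on read order, so two files containing the same reads give the same value. Using `sort` rather than an in-memory hash keeps memory bounded for files of this size, at the cost of run time.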

Then I guess this is just an artifact of how FastQC measures duplication, combined with the fact that samtools collate changed the order of the reads:

To cut down on the memory requirements for this module only sequences which first appear in the first 100,000 sequences in each file are analysed

You can sort the fastq files and repeat the FastQC analysis.

(reply by h.mon, 4 months ago)
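
Here is a minimal sketch of sorting a FASTQ by read name with standard Unix tools. It assumes 4-line records and no tab characters inside records, because tabs are used as a temporary field separator. The helper name `sort_fastq_by_name` is hypothetical.

```shell
# Sort a plain or gzipped FASTQ by its header line, keeping each record intact:
# paste joins every 4 lines into one tab-separated line, sort orders on the
# first field (the header) in the C locale, and tr splits the records back.
sort_fastq_by_name() {
  gzip -cdf -- "$1" |
    paste - - - - |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 |
    tr '\t' '\n'
}
```

For example, `sort_fastq_by_name sample_1.fastq.gz | gzip > sample_1.sorted.fastq.gz` followed by `fastqc -o fastqc_out/ sample_1.sorted.fastq.gz`. Sorting both files of a pair the same way keeps mates in the same order, as long as both files contain the same read names.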

That makes more sense now; I will try the sorting. Thanks!

(reply by anoops, 4 months ago)