Question: BAM to FASTQ conversion, number of reads does not match
2.7 years ago by
Anushka20
France
Anushka20 wrote:

Hello, I am trying to convert some publicly available .bam files to FASTQ format using the Picard tool SamToFastq, as follows:

 $ java -Xmx4g  -jar picard.jar SamToFastq NON_PF=true INPUT=input.bam F=input_1.fastq.gz F2=input_2.fastq.gz FU=unpaired_input.fastq.gz

The resulting FASTQ files contain fewer reads than the original BAM file. I am checking like this:

samtools view input.bam | wc -l returns 62193989,

whereas zcat input_1.fastq.gz | wc -l reports 103572960 (51786480).

Why are there fewer reads in the FASTQ file? I have also tried the UNPAIRED_FASTQ=File option in Picard, and that file contains zero reads.

I would appreciate it if someone could explain why this is happening. Either my way of comparing the numbers above is wrong, or something is going wrong during the conversion. What is the best way to check whether the .bam to .fastq conversion went well?

rna-seq bam sam picard • 1.3k views
modified 2.7 years ago by Pierre Lindenbaum • written 2.7 years ago by Anushka

Do not use wc -l with samtools. Use samtools view -c in.bam, which is much faster. Also be aware that a FASTQ file contains 4 lines per read. For your specific problem, set INCLUDE_NON_PRIMARY_ALIGNMENTS=true, as Pierre suggested, to include non-primary alignments. After doing that, count again.

written 2.7 years ago by ATpoint
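A minimal sketch of the counting advice above, in case it helps. The helper name fastq_read_count is hypothetical (not part of samtools or Picard), and it assumes standard 4-line FASTQ records with no wrapped sequence lines. The samtools call is guarded so the snippet still runs where samtools or input.bam is missing.

```shell
#!/bin/sh
# Hypothetical helper: count reads in a gzipped FASTQ file.
# Assumes every record spans exactly 4 lines (no wrapped sequences).
fastq_read_count() {
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# The line count reported in the question, converted to reads.
echo $(( 103572960 / 4 ))   # prints 25893240

# Count alignment records in the BAM without piping through wc -l.
# Runs only if samtools is installed and input.bam exists.
if command -v samtools >/dev/null 2>&1 && [ -f input.bam ]; then
    samtools view -c input.bam
fi
```

So the 103572960 lines in input_1.fastq.gz correspond to 25893240 reads in that one file. Running fastq_read_count on input_2.fastq.gz as well gives the second mate's count, and the two together are what should be compared against the BAM.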
2.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum128k wrote:

Why are there fewer reads in the FASTQ file?

Because the BAM might contain secondary and supplementary alignments, which SamToFastq does not write out unless INCLUDE_NON_PRIMARY_ALIGNMENTS=true is set.


modified 2.7 years ago • written 2.7 years ago by Pierre Lindenbaum

Thank you so much, Pierre. When I ran samtools view -F 256 input.bam | wc -l, it returned exactly the value that matches the FASTQ.

written 2.7 years ago by Anushka
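For reference, a small sketch of what the -F filter is doing. In the SAM flag field, bit 256 (0x100) marks a secondary alignment and bit 2048 (0x800) marks a supplementary alignment, so -F 256 drops only secondary records while -F 2304 drops both. The is_primary function below is a hypothetical helper for illustration, and whether this particular BAM contains supplementary records is not known from the thread.

```shell
#!/bin/sh
# SAM flag bits: 0x100 (256) = secondary, 0x800 (2048) = supplementary.
NON_PRIMARY=$(( 256 + 2048 ))   # 2304

# Hypothetical helper: succeeds (exit 0) when a FLAG value
# belongs to a primary alignment record.
is_primary() {
    [ $(( $1 & $NON_PRIMARY )) -eq 0 ]
}

is_primary 99   && echo "99: primary"
is_primary 355  || echo "355: secondary (99 + 256)"
is_primary 2147 || echo "2147: supplementary (99 + 2048)"

# Count only primary records, one per read end.
# Runs only if samtools is installed and input.bam exists.
if command -v samtools >/dev/null 2>&1 && [ -f input.bam ]; then
    samtools view -c -F 2304 input.bam
fi
```

If -F 256 and -F 2304 give the same count, the BAM has no supplementary records and either filter can be compared against the total number of FASTQ reads.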