BAM to FASTQ conversion: number of reads does not match
6.6 years ago
Anushka ▴ 20

Hello, I am trying to convert some publicly available .bam files to FASTQ format using the Picard tool SamToFastq, as follows:

 $ java -Xmx4g  -jar picard.jar SamToFastq NON_PF=true INPUT=input.bam F=input_1.fastq.gz F2=input_2.fastq.gz FU=unpaired_input.fastq.gz

The resulting FASTQ files contain fewer reads than the original BAM file. I am checking like this:

samtools view input.bam | wc -l returns 62193989

while zcat input_1.fastq.gz | wc -l reports 103572960 lines (51786480 reads counting both mate files).

Why are there fewer reads in the FASTQ files? I have also tried the UNPAIRED_FASTQ=File option in Picard, and it reports zero reads.

I would appreciate it if someone could explain why this is happening. Either I am comparing these numbers the wrong way, or something is going wrong during the conversion. What is the best way to check whether the .bam to .fastq conversion went well?

RNA-Seq bam sam picard • 2.7k views

Do not pipe samtools view into wc -l. Use samtools view -c in.bam instead, which is much faster. Also be aware that a FASTQ file contains 4 lines per read. Specific to your problem, set INCLUDE_NON_PRIMARY_ALIGNMENTS=true, as Pierre suggested, so that non-primary alignments are included. After doing that, count again.
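To make the 4-lines-per-read arithmetic concrete, here is a minimal sketch using the line count from the question. The helper name fastq_reads is made up for illustration and is not part of samtools or Picard.

```shell
# Convert a FASTQ line count into a read count (4 lines per record).
# fastq_reads is a hypothetical helper, not a samtools or Picard command.
fastq_reads() {
  echo $(( $1 / 4 ))
}

# Line count reported by wc -l for input_1.fastq.gz in the question.
fastq_reads 103572960                   # prints 25893240

# Both mate files together, assuming input_2.fastq.gz has the same length.
echo $(( 2 * $(fastq_reads 103572960) ))   # prints 51786480
```

On the real files, samtools view -c input.bam gives the BAM record count directly, and that number is the one to compare against the total above.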

6.6 years ago

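Why are there fewer reads in the fastq file?

Because the BAM might contain some secondary and supplementary alignments. These are extra records for reads that already have a primary alignment, so samtools view counts more records than there are reads, and SamToFastq skips them by default.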
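As a rough sketch of what "excluding secondary and supplementary alignments" means at the FLAG level: in the SAM format, bit 256 (0x100) marks a secondary alignment and bit 2048 (0x800) marks a supplementary one. The function name count_primary and the toy records below are invented for illustration; the filter runs on plain SAM text with awk, so it does not need samtools.

```shell
# Count primary records in SAM text read from stdin.
# count_primary is a hypothetical helper; the input below is toy data.
count_primary() {
  awk -F '\t' '
    /^@/ { next }                        # skip header lines
    int($2 / 256) % 2 == 1 { next }      # bit 256 set: secondary alignment
    int($2 / 2048) % 2 == 1 { next }     # bit 2048 set: supplementary alignment
    { n++ }
    END { print n + 0 }
  '
}

# Four toy records (name, FLAG): two primary mates, one secondary (99 + 256),
# one supplementary (65 + 2048).
printf 'r1\t99\nr1\t147\nr1\t355\nr2\t2113\n' | count_primary   # prints 2
```

With samtools installed, the same count should come from samtools view -c -F 2304 input.bam, since 2304 = 256 + 2048. Using -F 256 alone removes only the secondary alignments.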



Thank you so much, Pierre. When I ran samtools view -F 256 input.bam | wc -l, it returned exactly the value that matches the FASTQ files.
