Question

Extract reads from a bam file to have two fastq files with the same number of reads

0

Entering edit mode

5.3 years ago

macielrodriguez2 ▴ 50

Hi! I'm very new to the field of bioinformatics, in advance, sorry for the mistakes.

I'm working with a bam file called zm.trim.sorted.cp.bam and I ran:

samtools flagstat -@ 3 zm.trim.sorted.cp.bam

I got this:

5604169 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
5599573 + 0 mapped (99.92% : N/A)  
5604169 + 0 paired in sequencing
2804473 + 0 read1
2799696 + 0 read2
5256174 + 0 properly paired (93.79% : N/A)
5594977 + 0 with itself and mate mapped
4596 + 0 singletons (0.08% : N/A)
239951 + 0 with mate mapped to a different chr → ?
10623 + 0 with mate mapped to a different chr (mapQ>=5)

I want to extract all the reads from this bam file so I used:

samtools fastq -1 all.cp.reads_1.fq -2 all.cp.reads_2.fq zm.trim.sorted.cp.bam

I obtained two fastq files: all.cp.reads_1.fq and all.cp.reads_2.fq, and like the flagstat report said, there are 2804473 reads in all.cp.reads_1.fq and 2799696 in all.cp.reads_2.fq.

There are 4777 more reads in all.cp.reads_1.fq than in all.cp.reads_2.fq. I wanted to know what are those reads and how to eliminate them. I want to have the same number of reads in my two fastq files (each read with its mate).

Thank you in advance for your help :)

alignment genome sequencing assembly • 1.1k views

ADD COMMENT • link updated 5.3 years ago by GenoMax 147k • written 5.3 years ago by macielrodriguez2 ▴ 50

score 1 · Answer 1 · 2019-07-14

1

Entering edit mode

5.3 years ago

GenoMax 147k

You can use repair.sh from BBMap suite to bring the two files in sync. You will find a guide here.

Minimally you can do (remove .gz if your files are not compressed) :

$ repair.sh in1=broken1.fq.gz in2=broken2.fq.gz out1=fixed1.fq.gz out2=fixed2.fq.gz outs=singletons.fq repair

ADD COMMENT • link 5.3 years ago by GenoMax 147k

0

Entering edit mode

It worked great!!! Thank you so much genomax!!!

But what are these reads? According to my flagstat, all my reads should be paired. Why are these reads without its mates?

ADD REPLY • link 5.3 years ago by macielrodriguez2 ▴ 50

1

Entering edit mode

4596 + 0 singletons (0.08% : N/A)

Some are explained by those.

ADD REPLY • link 5.3 years ago by GenoMax 147k