Question: Extract reads from a bam file to have two fastq files with the same number of reads
0
gravatar for macielrodriguez2
5 weeks ago by
macielrodriguez220 wrote:

Hi! I'm very new to the field of bioinformatics, in advance, sorry for the mistakes.

I'm working with a bam file called zm.trim.sorted.cp.bam and I ran:

samtools flagstat -@ 3 zm.trim.sorted.cp.bam

I got this:

5604169 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
5599573 + 0 mapped (99.92% : N/A)  
5604169 + 0 paired in sequencing
2804473 + 0 read1
2799696 + 0 read2
5256174 + 0 properly paired (93.79% : N/A)
5594977 + 0 with itself and mate mapped
4596 + 0 singletons (0.08% : N/A)
239951 + 0 with mate mapped to a different chr → ?
10623 + 0 with mate mapped to a different chr (mapQ>=5)

I want to extract all the reads from this bam file so I used:

samtools fastq -1 all.cp.reads_1.fq -2 all.cp.reads_2.fq zm.trim.sorted.cp.bam

I obtained two fastq files: all.cp.reads_1.fq and all.cp.reads_2.fq, and like the flagstat report said, there are 2804473 reads in all.cp.reads_1.fq and 2799696 in all.cp.reads_2.fq.

There are 4777 more reads in all.cp.reads_1.fq than in all.cp.reads_2.fq. I wanted to know what are those reads and how to eliminate them. I want to have the same number of reads in my two fastq files (each read with its mate).

Thank you in advance for your help :)

ADD COMMENTlink modified 5 weeks ago by genomax70k • written 5 weeks ago by macielrodriguez220
1
gravatar for genomax
5 weeks ago by
genomax70k
United States
genomax70k wrote:

You can use repair.sh from BBMap suite to bring the two files in sync. You will find a guide here.

Minimally you can do (remove .gz if your files are not compressed) :

$ repair.sh in1=broken1.fq.gz in2=broken2.fq.gz out1=fixed1.fq.gz out2=fixed2.fq.gz outs=singletons.fq repair
ADD COMMENTlink modified 5 weeks ago • written 5 weeks ago by genomax70k

It worked great!!! Thank you so much genomax!!!

But what are these reads? According to my flagstat, all my reads should be paired. Why are these reads without its mates?

ADD REPLYlink written 5 weeks ago by macielrodriguez220
1
4596 + 0 singletons (0.08% : N/A)

Some are explained by those.

ADD REPLYlink written 5 weeks ago by genomax70k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 821 users visited in the last hour