Hi! I'm very new to the field of bioinformatics, in advance, sorry for the mistakes.
I'm working with a bam file called zm.trim.sorted.cp.bam and I ran:
samtools flagstat -@ 3 zm.trim.sorted.cp.bam
I got this:
5604169 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
5599573 + 0 mapped (99.92% : N/A)
5604169 + 0 paired in sequencing
2804473 + 0 read1
2799696 + 0 read2
5256174 + 0 properly paired (93.79% : N/A)
5594977 + 0 with itself and mate mapped
4596 + 0 singletons (0.08% : N/A)
239951 + 0 with mate mapped to a different chr → ?
10623 + 0 with mate mapped to a different chr (mapQ>=5)
I want to extract all the reads from this bam file so I used:
samtools fastq -1 all.cp.reads_1.fq -2 all.cp.reads_2.fq zm.trim.sorted.cp.bam
I obtained two fastq files: all.cp.reads_1.fq and all.cp.reads_2.fq, and like the flagstat report said, there are 2804473 reads in all.cp.reads_1.fq and 2799696 in all.cp.reads_2.fq.
There are 4777 more reads in all.cp.reads_1.fq than in all.cp.reads_2.fq. I wanted to know what are those reads and how to eliminate them. I want to have the same number of reads in my two fastq files (each read with its mate).
Thank you in advance for your help :)
It worked great!!! Thank you so much genomax!!!
But what are these reads? According to my flagstat, all my reads should be paired. Why are these reads without its mates?
Some are explained by those.