Hi! I'm very new to bioinformatics, so sorry in advance for any mistakes.
I'm working with a BAM file called zm.trim.sorted.cp.bam, and I ran:
samtools flagstat -@ 3 zm.trim.sorted.cp.bam
I got this:
5604169 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
5599573 + 0 mapped (99.92% : N/A)
5604169 + 0 paired in sequencing
2804473 + 0 read1
2799696 + 0 read2
5256174 + 0 properly paired (93.79% : N/A)
5594977 + 0 with itself and mate mapped
4596 + 0 singletons (0.08% : N/A)
239951 + 0 with mate mapped to a different chr
10623 + 0 with mate mapped to a different chr (mapQ>=5)
I wanted to extract all the reads from this BAM file, so I ran:
samtools fastq -1 all.cp.reads_1.fq -2 all.cp.reads_2.fq zm.trim.sorted.cp.bam
This gave me two FASTQ files, all.cp.reads_1.fq and all.cp.reads_2.fq. Matching the read1 and read2 counts in the flagstat report, all.cp.reads_1.fq contains 2804473 reads and all.cp.reads_2.fq contains 2799696.
So all.cp.reads_1.fq has 4777 more reads than all.cp.reads_2.fq. I would like to know what these extra reads are and how to remove them, so that both FASTQ files contain the same number of reads, with every read paired with its mate.
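To see which reads are the unpaired ones, here is a minimal sketch, assuming a POSIX shell with awk and plain 4-line FASTQ records (the layout samtools fastq writes). The helper names fastq_count and names_only_in_first are hypothetical, not part of samtools. The first counts records in a file; the second lists read names present in one file but missing from the other, ignoring the leading "@", any comment after the name, and a trailing /1 or /2 if present.

```shell
# Sketch only: assumes plain 4-line FASTQ records, as written by samtools fastq.
# fastq_count and names_only_in_first are hypothetical helper names.

# Print the number of records in a FASTQ file (4 lines per record).
fastq_count() {
  echo $(( $(wc -l < "$1") / 4 ))
}

# Print read names found in the first FASTQ file but not in the second.
# Names are compared without the leading "@", without any comment after
# the first whitespace, and without a trailing /1 or /2.
names_only_in_first() {
  awk '
    FNR % 4 != 1 { next }                          # keep only header lines
    { n = substr($1, 2); sub(/\/[12]$/, "", n) }   # normalise the read name
    FILENAME == ARGV[1] { seen[n] = 1; next }      # first awk argument: names to exclude
    !(n in seen) { print n }                       # second awk argument: report unmatched
  ' "$2" "$1"
}
```

For example, fastq_count all.cp.reads_1.fq should reproduce the 2804473 count above, and names_only_in_first all.cp.reads_1.fq all.cp.reads_2.fq > only_in_r1.txt writes the R1 reads without an R2 mate to a file (the name only_in_r1.txt is arbitrary). Swapping the two arguments lists R2 reads without an R1 mate, so the 4777 difference is the first count minus the second. If your samtools version supports view -N, samtools view -N only_in_r1.txt zm.trim.sorted.cp.bam shows those reads' FLAG values in the BAM.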
Thank you in advance for your help :)