Let me give you some background so you can understand my problem.
1- I produced a normal .BAM file.
2- I extracted ONLY the unmapped reads, which are 29M reads. It looks like this (first line):
HX6_24184:8:1205:14783:73264 141 * 0 0 55S22M74S * 0 0 TCAAGAAGTTTTAGCAGAAGAAATTCCAATGCTTTTATTATATGGAGAAATTGAAAATACAGTTTATAGACCAGAAAAATATGATTATTGGACAACTAGATATGACCATACTAAACTAGATCATCCTAAATTATCATATGTAATAAGACCA AAAAFJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJFFFFJJFJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAJFJJFJJJFFJJJJJJJJJFFJAJFJJJJJJJ<JJJJFJJJJJJFFJAAFAJFJJA RX:Z:AAACACCCAATGAAAC QX:Z:-AFFFJJJJFJJJJJJ BC:Z:CGTATCGG QT:Z:AAF<AFJJ XS:f:-138 XC:Z: AC:Z: AS:f:-137 XM:A:0 AM:A:0 XT:i:0 BX:Z:AAACACCCAATGAAAC-1 RG:Z:HAP6977146:LibraryNotSpecified:1:unknown_fc:0
3- I wanted to convert the unmapped.bam to paired-end fastq files (R1, R2 and singletons).
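For reference, the FLAG field of the example record in step 2 is 141. A minimal sketch of decoding it in bash, assuming the standard SAM flag bits and the names printed by samtools flags; decode_flag is a hypothetical helper, not part of samtools:

```shell
#!/usr/bin/env bash
# Decode a SAM FLAG value into the names of the bits that are set.
# decode_flag is a hypothetical helper written for this post.
decode_flag() {
  local flag=$1
  # Bit names in order, from 0x1 up to 0x800.
  local names=(PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE
               READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY)
  local out=()
  local i
  for i in "${!names[@]}"; do
    # Test bit i of the flag and keep its name if it is set.
    if (( flag & (1 << i) )); then
      out+=("${names[i]}")
    fi
  done
  local IFS=,
  echo "${out[*]}"
}

decode_flag 141   # prints PAIRED,UNMAP,MUNMAP,READ2
```

So the example record is paired, unmapped, has an unmapped mate, and is the second read of its pair. READ1 and READ2 are the bits samtools fastq uses to route a record to the -1 or -2 file.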
To convert the .BAM to .fastq I used
samtools fastq -T BX -s ./singletons.fastq ./phased_possorted_unmapped_bf0x4_bam.bam -1 phased_possorted_unmapped_bf0x4_R1.fastq -2 phased_possorted_unmapped_bf0x4_R2.fastq
From the output I have the following sizes and number of reads:
- phased_possorted_unmapped_bf0x4_R1.fastq --> 2.1GB and 7M reads
- phased_possorted_unmapped_bf0x4_R2.fastq --> 2.4GB and 7M reads
- singletons.fastq --> 4.9GB and 15M reads
Then I wanted to do exactly the same, but with my .BAM sorted by read name before doing this step.
I ran samtools sort -n on my .bam:
samtools sort -n ./phased_possorted_unmapped_bf0x4_bam.bam -o ./phased_possorted_unmapped_bf0x4_sorted_bam.bam
Then I ran samtools fastq again on the name-sorted .bam, and these are the results:
- phased_possorted_unmapped_bf0x4_sorted_R1.fastq --> 3.7GB and 12.5M reads
- phased_possorted_unmapped_bf0x4_sorted_R2.fastq --> 4.2GB and 12.5M reads
- sorted_singletons.fastq --> 1.4GB and 4M reads
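One sanity check on these numbers is to add up R1, R2 and singletons for each run. A minimal sketch using the rounded counts reported above (in millions); sum_millions is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sum a list of read counts given in millions and print the total
# with one decimal place. sum_millions is a hypothetical helper.
sum_millions() {
  LC_ALL=C awk 'BEGIN {
    t = 0
    for (i = 1; i < ARGC; i++) t += ARGV[i]
    printf "%.1f\n", t
  }' "$@"
}

unsorted_total=$(sum_millions 7 7 15)      # R1 + R2 + singletons, original .bam
sorted_total=$(sum_millions 12.5 12.5 4)   # R1 + R2 + singletons, name-sorted .bam

echo "unsorted: ${unsorted_total}M  sorted: ${sorted_total}M"
# prints: unsorted: 29.0M  sorted: 29.0M
```

With these rounded figures both runs add up to about 29M, the same as the number of unmapped reads extracted in step 2. No reads seem to be lost; they are only split differently between the paired files and the singletons file.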
I don't understand why the number of reads in each fastq file differs between the two runs. Sorting should not change the content of the .bam, so I expected the same number of reads in R1, R2 and singletons regardless of the sort order.
I would appreciate it if someone could explain what is happening.