Using BWA, I have mapped a paired-end file against a list of elements I'd like to filter out of my reads, and then tried to extract the unmapped reads with:
samtools view -f12 my_bam.bam | samtools fastq -1 reads1.fastq -2 reads2.fastq
However, I'm getting different numbers of reads in the two files. Looking at what is present in one but not the other, I came across this pair:
ST-E00192:703:HMV2VCCXY:4:2208:20202:1362 77 ENSG00000207113|ENST00000384385 114 37 9M1I8M = 114 0 TTTTAAAAGATGGGGTCT AAFFFJJJJJJJJJJAJJ XT:A:U NM:i:2 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:1 XO:i:1 XG:i:1 MD:Z:13C3
ST-E00192:703:HMV2VCCXY:4:2208:20202:1362 133 ENSG00000207113|ENST00000384385 114 0 * = 114 0 NGACCCCATCTTTTAAAA #A<AAA<J7FJFJJAJJJ
The flag for the first read (77) is read_paired, read_unmapped, mate_unmapped, first_in_pair.
The flag for the second read (133) is read_paired, read_unmapped, second_in_pair.
Note that "mate_unmapped" is missing for the second read, despite the fact that it has no mate information, while the first read is flagged as unmapped despite having a location, only 2 mismatches, and no other mapping locations.
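To double-check the flag decoding, here is a minimal Python sketch that splits a SAM flag into its bits, using the bit values from the SAM specification. The short names and the decode_flag helper are only illustrative, not part of samtools. It also shows why -f12 (which requires both 0x4 and 0x8) keeps the first read but drops the second, which would explain the unequal read counts.

```python
# SAM flag bits as defined in the SAM specification.
# The short names are illustrative labels, not official identifiers.
FLAG_BITS = [
    (0x1, "read_paired"),
    (0x2, "proper_pair"),
    (0x4, "read_unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "read_reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM flag (hypothetical helper)."""
    return [name for bit, name in FLAG_BITS if flag & bit]


def passes_f12(flag):
    """Mimic the -f 12 filter: both 0x4 and 0x8 must be set."""
    return flag & 12 == 12


for flag in (77, 133):
    print(flag, decode_flag(flag), "passes -f12:", passes_f12(flag))
```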
Any ideas?
So this is what is happening: the first read extends off the end of a contig. But how come the flags for the second read say that the first read is mapped?
Well, it pretty much is mapped. It has a mapping location, after all.
But I would expect the flags on the two reads to match. In any case, any ideas on how to extract unmapped pairs of reads from the BAM file?
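One possible approach, sketched here in Python rather than as a tested samtools recipe, is to ignore the mate_unmapped bit altogether and keep a pair only when each read's own read_unmapped bit (0x4) is set. The sketch assumes SAM text as input (for example, the output of samtools view), holds all records in memory, and uses a hypothetical helper named unmapped_pairs. It returns the raw SAM fields and does not reverse-complement reads flagged 0x10, so treat it as a starting point only.

```python
from collections import defaultdict

UNMAPPED = 0x4
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
NOT_PRIMARY = 0x100 | 0x800  # secondary or supplementary records


def unmapped_pairs(sam_lines):
    """Yield (name, read1_fields, read2_fields) for pairs in which BOTH
    reads carry their own read_unmapped bit, whatever the mate bit says.

    Hypothetical helper: assumes tab-separated SAM text lines.
    """
    mates = defaultdict(dict)
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & NOT_PRIMARY:
            continue
        if flag & FIRST_IN_PAIR:
            mates[name][1] = fields
        elif flag & SECOND_IN_PAIR:
            mates[name][2] = fields
    for name, pair in mates.items():
        if 1 in pair and 2 in pair and all(
            int(fields[1]) & UNMAPPED for fields in pair.values()
        ):
            yield name, pair[1], pair[2]


# The two records from the question, reduced to the 11 mandatory SAM fields.
name = "ST-E00192:703:HMV2VCCXY:4:2208:20202:1362"
ref = "ENSG00000207113|ENST00000384385"
records = [
    "\t".join([name, "77", ref, "114", "37", "9M1I8M", "=", "114", "0",
               "TTTTAAAAGATGGGGTCT", "AAFFFJJJJJJJJJJAJJ"]),
    "\t".join([name, "133", ref, "114", "0", "*", "=", "114", "0",
               "NGACCCCATCTTTTAAAA", "#A<AAA<J7FJFJJAJJJ"]),
]

for pair_name, read1, read2 in unmapped_pairs(records):
    print(pair_name, read1[1], read2[1])
```

With this rule the pair above is kept, because both 77 and 133 have 0x4 set even though only the first read has 0x8. Whether a read hanging off the end of a contig should really count as unmapped is a separate judgment call.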