First of all: I'm quite new to this field, so please excuse any imprecise wording or missing information.
I want to solve the following task: to save time, I want to align reads not to the whole genome, but only to chromosome 21.
My strategy so far consisted of the following steps:
1. Align NA12878 reads to the whole genome.
2. Filter the .bam file for read pairs where at least one read of the pair maps to chr21 (done using awk).
3. Filter the .bam file for unmapped reads (using samtools view -f 12).
4. Merge the two .bam files and do a bedtools bamtofastq conversion.
The problem that arises here (and is also described online) is that bamtofastq only converts a read to FASTQ if its mate is directly next to it in the .bam file.
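To make the two filtering steps described above concrete, here is a minimal Python sketch of the selection logic, applied to SAM text records. It is not the awk command that was actually used, which is not shown here. It assumes the contig is named chr21 in the reference, and the function names keep_record and filter_sam are purely illustrative.

```python
# Sketch only: select SAM records that touch one contig or belong to a fully unmapped pair.
# Assumption: the contig is called "chr21" (some references call it "21").

# 0x4 = read unmapped, 0x8 = mate unmapped; together 12, as in `samtools view -f 12`.
UNMAPPED_PAIR = 0x4 | 0x8


def keep_record(sam_line, contig="chr21"):
    """Return True for header lines and for records that belong to the subset."""
    if sam_line.startswith("@"):
        return True  # keep the header so the output is still a valid SAM file
    fields = sam_line.rstrip("\n").split("\t")
    flag, rname, rnext = int(fields[1]), fields[2], fields[6]
    # RNEXT is "=" when the mate maps to the same contig as the read, so
    # checking RNAME and RNEXT against the contig covers both mates.
    touches_contig = rname == contig or rnext == contig
    both_unmapped = (flag & UNMAPPED_PAIR) == UNMAPPED_PAIR
    return touches_contig or both_unmapped


def filter_sam(lines, contig="chr21"):
    """Yield only the lines that keep_record accepts."""
    for line in lines:
        if keep_record(line, contig):
            yield line
```

The records could come from the text output of samtools view -h. Note that a pair with exactly one unmapped mate is only kept here if the mapped mate lies on chr21.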
Note: I specifically need all reads to be retained, for technical reasons later on.
I tried to solve the problem by taking the reads that lack their mate from the original whole-genome alignment file (using picard filtersamreads includeReadList) and adding them to the file described above. This approach reduced the number of reads lacking a mate, but only by about one third, meaning I still lose a lot of reads.
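One possible way to check which reads still lack a mate is to count, per read name, which ends of the pair are present as primary records. The Python sketch below does this on SAM text records. It assumes the paired-end flag bits are set correctly, and the function name find_orphans is illustrative. Secondary (0x100) and supplementary (0x800) records are skipped because they are additional alignments of the same read, not its mate. Whether such records account for the remaining orphans in this case is not something the sketch can confirm.

```python
from collections import defaultdict

# SAM flag bits, as defined in the SAM specification.
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def find_orphans(sam_lines):
    """Return sorted names of reads for which only one end has a primary record."""
    ends_seen = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not a read
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # extra alignment of a read already counted elsewhere
        if flag & FIRST_IN_PAIR:
            ends_seen[name].add(1)
        elif flag & SECOND_IN_PAIR:
            ends_seen[name].add(2)
    return sorted(name for name, ends in ends_seen.items() if len(ends) == 1)
```

Running this on the merged file before the FASTQ conversion would give the list of read names whose mate still has to be fetched from the whole-genome alignment.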
My questions are: Shouldn't my approach solve the problem? Why are there still reads that lack their mate? Also, does anybody have an idea what I can do? Is there a bamtofastq conversion that does not require every read to be paired?
I'm really grateful for any input, thanks.