I would like to assemble the reads that did not align to my reference genome. I understand that I can gather the unmapped reads with
samtools view -b -f 12 -F 256 aligned-sorted-deduplicated.bam > unmapped.bam
where -f 12 keeps reads for which both the read and its mate are unmapped (flags 4 + 8), and
-F 256 skips non-primary alignments. I am essentially keeping only the pairs in which neither mate aligned. Then I need to re-sort the reads by name with
samtools sort -n -o unmapped-sorted.bam unmapped.bam
The manual says to use FASTQ files as input, so I need to extract them with
bamToFastq -i unmapped-sorted.bam -fq R1.fastq -fq2 R2.fastq
and then assemble with
spades.py -1 R1.fastq -2 R2.fastq -o someFolder
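To spell out how I understand the flag filter in the first step, here is a small sketch in plain shell arithmetic. `passes_filter` is a hypothetical helper of my own, not part of samtools; it only mimics the bitwise test that I believe `-f 12 -F 256` applies to each read's FLAG value.

```shell
#!/bin/sh
# Hypothetical helper (not a samtools command): mimics the FLAG test
# that `samtools view -f 12 -F 256` is assumed to perform.
passes_filter() {
    flag=$1
    require=12    # 4 (read unmapped) + 8 (mate unmapped): all bits must be set
    exclude=256   # not primary alignment: bit must be clear
    if [ $(( flag & require )) -eq "$require" ] && [ $(( flag & exclude )) -eq 0 ]; then
        echo keep
    else
        echo skip
    fi
}

passes_filter 77    # 1+4+8+64: paired, both unmapped, first in pair   -> keep
passes_filter 141   # 1+4+8+128: paired, both unmapped, second in pair -> keep
passes_filter 73    # 1+8+64: read mapped, only the mate unmapped      -> skip
passes_filter 333   # 77+256: both unmapped but flagged non-primary    -> skip
```

If this reading is right, pairs in which only one mate is unmapped (for example flag 73 and its partner) are dropped by the filter, which is what I intend.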
I would like to ask:
- Is this procedure correct?
- Can I compress the fastq files directly?
- Can I use unmapped-sorted.bam directly, without making the FASTQ files, and if so, how?