Hi everyone,
I am having many problems with some haplotype phasing. I have tried using variant calling followed by whatshap polyphase but it does not reliably get the correct answer. I think this is due to some issues with the fact that my PCR clearly has bias so my variant ratios are not 50:50 so when I say my ploidy is 2, it messes up.
My region of interest is quite small and as is shown in the screenshot below, I have two clear types present and yet I am struggling to find the right tools to split them up. Ultimately, I want a multi-fasta with those two types, or, as will be the case for other samples, up to 5 types might be present.
IGV example of my region of interest:
My most recent idea has been to extract those reads from my BAM and to use a stringent assembly tool to form what I hoped would be two contigs of 275bp with each type. I used the following commands:
samtools fasta -F 4 aligned_reads.bam > bam.fasta
spades.py --only-assembler -s bam.fasta -o spades_output -k 55,77,99 --careful
This has resulted in contigs which do not specifically span my region of interest, although some of the contigs do contain parts of my region of interest. I figured something is going wrong here because my extracted reads are whole reads, not reads clipped to my reference sequence.
I have two questions:
1) Does anyone have a better idea how I can 'extract' the two haplotypes which are clearly visible in my BAM? 2) If no better ideas, does anyone know how I can export those reads as a fasta but keep the soft-clipping so that no reads contain sequence outside of my 275bp window?
Thank you in advance!