We are studying viral evolution by tracking the mutations present at a specific genomic location and how they change over time. We are performing amplicon sequencing of this region, which is 222-228 bp, at intervals. There are two versions of the virus, and three point mutations can cause a shift from virus A to virus B. For each sample, I would like to create a chart that shows each variant sequence and the percentage of reads that carry it.
I used BWA to align the reads to a FASTA file containing only two reference sequences: the 222 bp of virus A and the 228 bp of virus B. I then calculated the percentage of reads that aligned to each reference.
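For concreteness, the per-reference percentage can be computed roughly as in the sketch below. It reads SAM-formatted text lines, skips header lines, and ignores unmapped, secondary, and supplementary records using the standard SAM flag bits. The function name `reference_percentages` and the reference names `virusA` and `virusB` are hypothetical, chosen only for illustration.

```python
from collections import Counter

def reference_percentages(sam_lines):
    """Percentage of primary mapped reads per reference name (SAM column 3)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):            # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        counts[fields[2]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ref: 100.0 * n / total for ref, n in counts.items()}

# Toy SAM records (hypothetical read and reference names, truncated sequences).
toy_sam = [
    "@SQ\tSN:virusA\tLN:222",
    "@SQ\tSN:virusB\tLN:228",
    "r1\t0\tvirusA\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tvirusA\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tvirusA\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tvirusB\t1\t60\t4M\t*\t0\t0\tACTT\tIIII",
    "r5\t4\t*\t0\t0\t*\t*\t0\t0\tGGGG\tIIII",
]
percentages = reference_percentages(toy_sam)
for ref, pct in sorted(percentages.items()):
    print(f"{ref}\t{pct:.1f}%")
```

With a real BAM file, the same text lines could be obtained by piping the output of `samtools view` into the script.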
I am now lost on how to get the output that I would like. (This image is from CRISPResso2, but I want something similar for non-CRISPR data.)
I used GATK HaplotypeCaller to call variants, but I don't know whether that was the correct method, or whether I should do a multiple sequence alignment or something much simpler. I now have the VCF file from GATK but don't know where to go from here.
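To illustrate what "something much simpler" might look like, here is a minimal sketch that skips variant calling and just counts identical read sequences, then reports each distinct sequence with its share of the reads. It assumes the reads have already been trimmed (and paired reads merged) so that each one spans the whole amplicon in the same orientation; that preprocessing is an assumption, not something done here. The function name `variant_table` and the toy reads are hypothetical.

```python
from collections import Counter

def variant_table(reads, top_n=10):
    """Return (sequence, count, percent) rows for the most common sequences."""
    counts = Counter(reads)
    total = sum(counts.values())
    rows = []
    for seq, n in counts.most_common(top_n):
        rows.append((seq, n, 100.0 * n / total))
    return rows

# Toy data standing in for full-length amplicon reads from one sample.
reads = ["ACGT"] * 6 + ["ACTT"] * 3 + ["AGTT"]
table = variant_table(reads)
for seq, n, pct in table:
    print(f"{seq}\t{n}\t{pct:.1f}%")
```

Exact-match counting treats every sequencing error as its own variant, so real data would probably need some minimum count or percentage cutoff before plotting.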
Thank you for any advice, Sara
Just wanted to say this is an interesting visualization that CRISPResso has. Can you trick CRISPResso into using your data? Also, does the visualization try to filter out any "spurious read errors", or is it just raw reads?