Good afternoon, group,
I'm looking for a way to graph base-call variability across the genome, expressed as a percentage of the coverage at each position. I'm hoping to use this plot to check whether SNPs across the genome occur at similar frequencies, which might point to distinct haplotypes in my virus population. I would also like to use it to find regions of higher variability, where I could then try one of the current viral haplotype reconstruction algorithms for some targeted haplotyping.
I know this should be possible. I have a good reference and a BAM file with all my sorted reads; I'm just not sure how to combine them into a graph. I have been reading the Rsamtools documentation, and it looks like I can use it to read in the file. From there I figured I could write a loop to tally all of the observed nucleotides at each reference position and get the percentage of each nucleotide at that position. Then I would like to make a line graph of all the non-reference nucleotides against the percentage at which they appear.
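The tally-and-percentage step described above might look something like the following sketch, written in Python purely for illustration (the same logic carries over to an R loop). It assumes you already have per-position counts of A, C, G and T across the reference as four lists in A, C, G, T order, which is the layout pysam's `count_coverage` returns; the function names `nonref_percentages` and `variable_positions` are hypothetical. Positions are 0-based here, so add 1 to match 1-based reference coordinates.

```python
from typing import Dict, List, Sequence

BASES = "ACGT"


def nonref_percentages(ref_seq: str, counts: Sequence[Sequence[int]]) -> Dict[str, List[float]]:
    """Percent of coverage for each base at every position.

    The reference base at each position is set to 0.0 so only the
    non-reference signal remains. `counts` is assumed to hold four
    per-position count sequences in A, C, G, T order (hypothetical input
    layout). Positions are 0-based indices into ref_seq.
    """
    series: Dict[str, List[float]] = {b: [] for b in BASES}
    for pos, ref_base in enumerate(ref_seq.upper()):
        # Coverage at this position = sum of the four base counts.
        depth = sum(counts[k][pos] for k in range(len(BASES)))
        for k, base in enumerate(BASES):
            if base == ref_base or depth == 0:
                series[base].append(0.0)
            else:
                series[base].append(100.0 * counts[k][pos] / depth)
    return series


def variable_positions(series: Dict[str, List[float]], min_pct: float) -> List[int]:
    """0-based positions where any non-reference base reaches min_pct."""
    length = len(series[BASES[0]])
    return [i for i in range(length) if max(series[b][i] for b in BASES) >= min_pct]
```

Each list in the returned dictionary can be drawn as one line against genome position with any plotting library, and `variable_positions` gives a quick shortlist of sites whose frequencies you can compare when looking for shared haplotype signal.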
I'm just not sure how to use the reference and the aln.bam file together to get started. Can someone point me in the right direction?
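One common way to combine a reference and a BAM file is a pileup. Running `samtools mpileup -f ref.fa aln.bam > aln.pileup` (where `ref.fa` stands in for your reference FASTA) writes one line per covered position containing the reference base and every read base observed there. The sketch below, with hypothetical helper names, parses that text format into per-base percentages. It assumes the standard mpileup column layout (chromosome, position, reference base, depth, read bases, qualities), counts `.`/`,` as reference matches, and skips read-start/end markers and indel annotations. For deep viral data you may need to raise samtools' per-file depth cap with `-d`. Rsamtools also provides a `pileup()` function that returns per-position nucleotide counts, if you would rather stay in R.

```python
import re
from collections import Counter

BASES = "ACGT"


def parse_pileup_bases(ref_base, bases):
    """Count A/C/G/T observations in one mpileup read-bases column."""
    ref = ref_base.upper()
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Read start: the next character is the mapping quality; skip both.
            i += 2
            continue
        if c == "$":
            # Read end marker.
            i += 1
            continue
        if c in "+-":
            # Indel following the previous base: +<len><seq> or -<len><seq>.
            m = re.match(r"\d+", bases[i + 1:])
            i += 1 + len(m.group()) + int(m.group())
            continue
        if c in ".,":
            # Match to the reference on the forward (.) or reverse (,) strand.
            if ref in BASES:
                counts[ref] += 1
        elif c.upper() in BASES:
            # Mismatch: upper case is forward strand, lower case reverse strand.
            counts[c.upper()] += 1
        # Anything else ('*', '#', '<', '>', 'N', ...) is not counted.
        i += 1
    return counts


def pileup_line_to_freqs(line):
    """Return (chrom, pos, ref_base, {base: percent}) for one mpileup line."""
    chrom, pos, ref_base, _depth, bases = line.rstrip("\n").split("\t")[:5]
    counts = parse_pileup_bases(ref_base, bases)
    total = sum(counts.values())
    freqs = {b: (100.0 * counts[b] / total if total else 0.0) for b in BASES}
    return chrom, int(pos), ref_base.upper(), freqs
```

Calling `pileup_line_to_freqs` on each line of the pileup file yields the per-position percentages; the entries for bases other than the returned reference base are the non-reference values you want to plot.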