Hi, I need some help automating and speeding up my data analysis.
For now, I track variants from BAM files using samtools mpileup and obtain a CSV (converted from VCF using bcftools) with a structure like this:
So it looks like this
What I want to do is add two more fields: one for CONTEXT (the base immediately before and the base immediately after the original chromosome position) and one saying whether or not the original position is a Common SNP.
So it would look like this
My problem is that for now, I have to look up each position manually in the UCSC Genome Browser (https://genome.ucsc.edu), zoom out 3x, and then manually copy the leading and trailing bases and check whether the position registers as a Common SNP. This gets me the context and tells me whether the original chromosome position is a SNP or not. I want this part automated, but I don't know the best way to go about it. I have my reference sequence (GRCh37-lite.fa), but I'm not sure how to extract individual bases from it.
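To make the goal concrete, here is a minimal sketch in Python of the kind of lookup I have in mind. It is not a tested pipeline, and it rests on several assumptions: the reference has been indexed with `samtools faidx` (so a `.fai` file sits next to the FASTA), the chromosome names in my CSV match the names in the reference, positions are 1-based as in VCF, and the Common SNPs are available locally as a BED file (0-based start, 1-based end, for example exported from a UCSC common SNP track). The function names `load_fai`, `fetch_base`, `get_context`, and `load_snp_positions` are made up for illustration.

```python
def load_fai(fai_path):
    """Read a samtools faidx index: name, length, offset, linebases, linewidth."""
    index = {}
    with open(fai_path) as fh:
        for line in fh:
            name, length, offset, linebases, linewidth = line.rstrip("\n").split("\t")[:5]
            index[name] = (int(length), int(offset), int(linebases), int(linewidth))
    return index


def fetch_base(fasta_handle, index, chrom, pos):
    """Return the base at a 1-based position, or 'N' if the position is off the contig.

    fasta_handle must be opened in binary mode ('rb') so byte offsets are exact.
    """
    length, offset, linebases, linewidth = index[chrom]
    if pos < 1 or pos > length:
        return "N"
    pos0 = pos - 1
    # Skip whole lines (including their newline bytes), then move within the line.
    fasta_handle.seek(offset + (pos0 // linebases) * linewidth + pos0 % linebases)
    return fasta_handle.read(1).decode("ascii").upper()


def get_context(fasta_handle, index, chrom, pos, flank=1):
    """Return the bases from pos - flank to pos + flank as one string."""
    return "".join(
        fetch_base(fasta_handle, index, chrom, p)
        for p in range(pos - flank, pos + flank + 1)
    )


def load_snp_positions(bed_path):
    """Load single-base SNPs from a BED file into a set of (chrom, 1-based pos).

    Assumption: a leading 'chr' is stripped so that 'chr1' matches '1',
    which is how GRCh37-lite names its chromosomes.
    """
    snps = set()
    with open(bed_path) as fh:
        for line in fh:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split("\t")[:3]
            start, end = int(start), int(end)
            if end - start == 1:  # keep single-nucleotide entries only
                if chrom.startswith("chr"):
                    chrom = chrom[3:]
                snps.add((chrom, end))
    return snps
```

With these, each row of the CSV would get `get_context(handle, index, chrom, pos)` as its CONTEXT field and `(chrom, pos) in snps` as its Common SNP field, where `handle` is the reference opened with `open("GRCh37-lite.fa", "rb")`. Holding every common SNP in a Python set takes a fair amount of memory, so this is only meant to show the logic.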