I have whole genome sequences of several clinical isolates and I want to get the nucleotide frequencies (% A,C,T,G) at a given position. Among novel SNPs we could discover, there are some very specific positions that I want to look at. There are enough clinical isolates that I don't wish to this manually using IGV.
I can do this, but it's extremely inelegant and I think that there is some more elegant way that I am missing. I have summarize that there are probably two approaches.
1 . Run the basic samtools SNP pipeline, outputting all positions without a VCF ( remove -V) and outputting all possible reference alleles.
2. Write a script that parses the info column in the VCF file, get the depth info for each nucleotide, calculate the frequencies (for my variants of interest only); luckily I already have this script.
1. Use bedtools and convert that bam to fastq
2. Use seqtk and convert fastq to fasta
3. Use bedtools nuc command, specifying position of interest
I feel like this is too common a task to require the use of so many tools and file format conversions and even custom scripts, so I ask : what's a better way to do this?