I am trying to characterize the depth of coverage for SNPs/INDELs/other variants in my BAM/SAM file. Using samtools mpileup generates a nice VCF that gives me information on this, but it's not quite complete. Entering the "-t AD" flag gives me the depth of the alternate alleles compared to the reference. That is what I want, though the actual depth is found in the column after the flag is indicated and it is colon delimited. It is readable, but not terribly concise.
However, I am trying to check specific positions along the genome for specific alternate alleles. Is there a convenient way to parse the VCF for the AD flag values by position and allele without complicated and format-specific if/else statements?
I can use bedtools intersect or a dozen other methods to reduce the file to specific positions, already, but I'm hoping there's a method that doesn't require piping a handful of tools together.
It would be straightforward to write an if statement of the kind:
if ALT column is T then return AD and if AD is > 20 return "likely to be target organism strain"
However, over entire genomes and many different variant calls, this won't be so trivial. I have a file that contains a list of positions along the genome and alleles that I'm specifically looking for. I guess I should better word my question to ask for some software that performs that level of analysis. I need to compare allele identity at a given position to a file containing many positions and alleles of interest and then return the coverage of those alleles in the sample that match the ones in the comparison file. After that, I need to review the depth to determine the significance of the variant - which is arbitrary.