I have a data set with matched adjacent-normal and tumour samples, for both DNA and RNA, from cancer patients. Somatic variants have been called from each patient's pair of DNA samples. I am interested in reporting only the variants that are supported by the RNA data.
Each variant has a consequence predicted by a tool such as Variant Effect Predictor (VEP). Support is then determined according to the consequence:
- Missense mutations should be seen in the tumour RNA sample, i.e. some reads should carry the alternate allele.
- Stop-gained mutations should have reads not containing the mutation in the normal RNA sample, because the mutant transcript could be subject to nonsense-mediated decay in the tumour RNA sample. In a different data set without matched normal RNA (adjacent normal RNA is extremely rare in data sets), I suppose I could instead look for reads not containing the mutation in the tumour RNA sample, assuming that the tumour purity is substantially below 100% and many normal cells were sequenced.
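The two rules above can be written down as a small decision function. This is only a sketch under stated assumptions: the `AlleleCounts` type, the `has_rna_support` name, and the `min_reads` threshold are hypothetical, and the consequence strings are the Sequence Ontology terms that VEP reports (`missense_variant`, `stop_gained`). Consequence types not covered by the rules are simply treated as unsupported here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlleleCounts:
    """Read counts at one variant site in one RNA sample (hypothetical type)."""
    ref: int  # reads carrying the reference allele
    alt: int  # reads carrying the alternate (mutant) allele


def has_rna_support(
    consequence: str,
    tumour_rna: AlleleCounts,
    normal_rna: Optional[AlleleCounts] = None,
    min_reads: int = 2,  # assumed threshold, not taken from any tool
) -> bool:
    """Apply the consequence-dependent RNA support rules."""
    if consequence == "missense_variant":
        # The mutation itself should be seen in the tumour RNA.
        return tumour_rna.alt >= min_reads
    if consequence == "stop_gained":
        # The mutant transcript may be degraded, so look for reads NOT
        # containing the mutation: in the normal RNA if available,
        # otherwise fall back to the tumour RNA.
        source = normal_rna if normal_rna is not None else tumour_rna
        return source.ref >= min_reads
    # Other consequence types are not covered by the rules above.
    return False
```

The fallback to the tumour RNA sample for stop-gained variants only makes sense under the purity assumption stated in the second rule.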
So, I am wondering whether a tool exists that takes a VCF file and one or more BAM files as input and, for each variant in the VCF file, calculates the AD (allelic depth) and DP (read depth) values in each BAM file provided. What software can I use?
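To make the requested output concrete, here is a minimal sketch of the per-variant summary for single-nucleotide variants only. The function names are hypothetical. `count_alleles` is plain Python; `pileup_bases` assumes the third-party `pysam` package is installed and the BAM file is indexed, and it relies on `pysam`'s default pileup filters, which may not match what a dedicated tool would do. Here DP counts every base call observed at the site, including bases that match neither allele.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def count_alleles(bases: Iterable[str], ref: str, alt: str) -> Tuple[Tuple[int, int], int]:
    """Return ((ref_count, alt_count), depth) from base calls at one site."""
    counts = Counter(b.upper() for b in bases)
    ad = (counts[ref.upper()], counts[alt.upper()])
    dp = sum(counts.values())  # all base calls, including other alleles
    return ad, dp


def pileup_bases(bam_path: str, contig: str, pos0: int) -> Iterator[str]:
    """Yield the base each read shows at a 0-based position (needs pysam)."""
    import pysam  # third-party dependency, assumed to be installed

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # truncate=True restricts the pileup to the requested column only.
        for column in bam.pileup(contig, pos0, pos0 + 1, truncate=True):
            for read in column.pileups:
                # Skip deletions and spliced-out (N) segments, which are
                # common in RNA-seq alignments and have no base to report.
                if read.is_del or read.is_refskip:
                    continue
                yield read.alignment.query_sequence[read.query_position]
```

For a variant at 1-based VCF position `pos`, a call such as `count_alleles(pileup_bases("tumour_rna.bam", "chr1", pos - 1), ref, alt)` would give the AD pair and DP for that BAM file; the file name and contig are placeholders.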