Question: How should I extract mutated genes from a VCF file?
Asked 5 months ago by jrleary (Lineberger Comprehensive Cancer Center):

I've written a new pipeline for my lab to process and call variants on whole exome sequencing data, built around bwa-mem, samtools, Picard, and GATK. Variant calling is done with Mutect2, and I've filtered and annotated the SNP/indel calls using FilterMutectCalls and Funcotator. Whole exome sequencing is absolutely not my specialty, so I'm somewhat at a loss as to what I should do next. I'd like to end up with some tables / visualizations detailing which genes are mutated across samples. I've been loosely following this 2017 paper, which has some great visualizations, such as this one, that show how genes of interest are mutated.

So, my main question is: how do I extract specifically which genes are mutated in my samples? I tried using VariantsToTable, which returned a table containing chromosome and position, as well as whether each mutation was a SNP or an indel. Could I use the genomic coordinates to obtain the gene name?

Also, the VCF files are a nightmare to read using less, so I haven't been able to inspect the annotations I added. Are there any programs other than IGV for inspecting VCFs? (I'm from a computational background, so manually inspecting a genome is somewhat out of my realm of expertise.)

Tags: wes, gatk, exome

What does the VCF look like after annotation with Funcotator?

written 5 months ago by Pierre Lindenbaum

Sorry, I'm a little unsure how to describe it. I could attach a screenshot of the file while viewing it with less, but I'm not sure how helpful that would be. Running head ${sample}.vcf returns:

##fileformat=VCFv4.2
##FILTER=<ID=base_qual,Description="alt median base quality">
##FILTER=<ID=clustered_events,Description="Clustered events observed in the tumor">
##FILTER=<ID=contamination,Description="contamination">
##FILTER=<ID=duplicate,Description="evidence for alt allele is overrepresented by apparent duplicates">
##FILTER=<ID=fragment,Description="abs(ref - alt) median fragment length">
##FILTER=<ID=germline,Description="Evidence indicates this site is germline, not somatic">
##FILTER=<ID=haplotype,Description="Variant near filtered variant on same haplotype.">
##FILTER=<ID=low_allele_frac,Description="Allele fraction is below specified threshold">
##FILTER=<ID=map_qual,Description="ref - alt median mapping quality">

Thanks much for the assistance, I'm aware that I'm not doing an excellent job of describing my problems.

written 5 months ago by jrleary
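
A note on the output above: head prints only the first ten lines, and in this file those are all "##" metadata lines, so none of the annotated records are visible yet. The actual variant records begin after the "#CHROM" column header. As a minimal sketch (not part of the original thread), the Python below skips the metadata and returns the column header plus the first few records; the function names and the file path in the comment are hypothetical, and it assumes an uncompressed, tab-separated VCF.

```python
def peek_records(lines, n=3):
    """Return the #CHROM column header plus the first n data records.

    `lines` is any iterable of VCF lines, e.g. an open file handle.
    """
    kept = []
    for line in lines:
        if line.startswith("##"):
            continue  # metadata lines (FILTER, INFO, FORMAT definitions, ...)
        kept.append(line.rstrip("\n"))
        if len(kept) == n + 1:  # column header + n records
            break
    return kept


def info_keys(record):
    """List the INFO keys present in one tab-separated VCF data line."""
    info = record.split("\t")[7]  # INFO is the 8th VCF column
    return [item.split("=", 1)[0] for item in info.split(";")]


# Example use (hypothetical path):
# with open("sample.vcf") as handle:
#     for row in peek_records(handle):
#         print(row)
```

If the records carry a FUNCOTATION key in the INFO column, the Funcotator annotations were written successfully.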

"Also, the VCF files are a nightmare to read using less, so I haven't been able to inspect the annotations I added. Are there any programs other than IGV used to inspect VCFs"

I wrote VcfToTable: http://lindenb.github.io/jvarkit/VcfToTable.html

written 5 months ago by Pierre Lindenbaum

I'll clone the repo and give it a shot.

written 5 months ago by jrleary
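
On the main question (which genes are mutated): since the calls were annotated with Funcotator, mapping coordinates back to genes by hand should not be necessary, because the gene symbol is already stored in the FUNCOTATION INFO field of each record. The sketch below is one possible way to pull it out in plain Python, not an established tool. It rests on several assumptions that should be checked against your own file: that Funcotator was run with VCF output, that the "##INFO=<ID=FUNCOTATION" header line lists the pipe-delimited field names after "Funcotation fields are:", that one of those fields ends in "hugoSymbol", and that each annotation is wrapped in square brackets. The function names are hypothetical.

```python
import re
from collections import Counter


def funcotation_fields(header_lines):
    """Read the pipe-delimited field names from the FUNCOTATION header line."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=FUNCOTATION"):
            match = re.search(r'Funcotation fields are:\s*([^"]+)"', line)
            if match:
                return match.group(1).strip().split("|")
    return []


def genes_in_info(info, fields):
    """Return the set of gene symbols annotated in one record's INFO column."""
    # Assumption: the gene symbol field is named like Gencode_<version>_hugoSymbol.
    gene_idx = next(
        (i for i, name in enumerate(fields) if name.endswith("hugoSymbol")), None
    )
    if gene_idx is None:
        return set()
    for entry in info.split(";"):
        if entry.startswith("FUNCOTATION="):
            # Assumption: each annotation block is enclosed in [ ... ].
            blocks = re.findall(r"\[([^\]]*)\]", entry[len("FUNCOTATION="):])
            genes = set()
            for block in blocks:
                values = block.split("|")
                if len(values) > gene_idx and values[gene_idx]:
                    genes.add(values[gene_idx])
            return genes
    return set()


def gene_counts(lines):
    """Count PASS records per gene symbol in an iterable of VCF lines."""
    header, fields, counts = [], None, Counter()
    for line in lines:
        if line.startswith("##"):
            header.append(line)
            continue
        if line.startswith("#"):
            continue  # the #CHROM column header
        if fields is None:
            fields = funcotation_fields(header)
        cols = line.rstrip("\n").split("\t")
        if cols[6] != "PASS":  # keep only calls that passed filtering
            continue
        counts.update(genes_in_info(cols[7], fields))
    return counts
```

Running gene_counts on each sample's VCF and collecting the results into a gene-by-sample table would give the starting point for the kind of summary plots described in the question.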