Count number of variants per gene from a .vcf annotated with ANNOVAR and SnpSift
7.3 years ago
User6891 ▴ 290

Hi everyone,

I have a multisample .vcf file that I annotated with ANNOVAR (gene annotation), and I ran SnpSift CaseControl on it to assign samples as cases and controls. For each variant, SnpSift reports (as extra columns in the .vcf) how many times the variant was found in cases and in controls, and it runs some statistical tests. However, instead of working at the variant level, I want to check whether certain GENES have more variants in the controls than in the cases. The result could again be an extra column in the .vcf, or something separate. Since I annotated the .vcf with ANNOVAR, the gene information is already present in the file.

Is there a tool (like SnpSift) to do this? Or would it be better to write my own script? (If so, what would be the easiest way?)
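
For the "own script" route, a minimal Python sketch could look like the following. It rests on assumptions that are not confirmed in this thread: that ANNOVAR wrote the gene name to an INFO key named `Gene.refGene`, and that SnpSift CaseControl added `Cases=` and `Controls=` INFO keys whose third comma-separated value is the variant allele count. Check the header of your own file and adjust the key names. The function names are made up for illustration.

```python
from collections import defaultdict

def parse_info(info):
    """Turn a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields

def count_variants_per_gene(lines, gene_key="Gene.refGene"):
    """Count, per gene, the variant sites observed in cases and in controls.

    Assumption: INFO carries Cases=hom,het,alleles and
    Controls=hom,het,alleles; a site counts for a group when its
    allele count (third value) is greater than zero.
    """
    counts = defaultdict(lambda: {"cases": 0, "controls": 0})
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])  # INFO is the 8th VCF column
        gene = info.get(gene_key)
        cases = info.get("Cases")
        controls = info.get("Controls")
        if not all(isinstance(v, str) for v in (gene, cases, controls)):
            continue  # annotation missing for this record
        counts[gene]["cases"] += int(int(cases.split(",")[2]) > 0)
        counts[gene]["controls"] += int(int(controls.split(",")[2]) > 0)
    return dict(counts)
```

To use it on a file, open the annotated .vcf and pass the file handle as `lines`. The sketch treats the gene value as a single name; records annotated with several genes would need to be split first.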

annovar snpsift vcf gene • 5.2k views

I think you will need to extract that information into a separate table and run an independent statistical test on that.
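
As a rough illustration of that suggestion, the sketch below takes a per-gene table of counts and runs Fisher's exact test for each gene against all other genes pooled. Both the choice of test and the 2x2 layout are assumptions made for the example, not something recommended in this thread, and the function names are hypothetical. The test is implemented with the standard library only.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at most as likely as the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

def gene_table(counts):
    """counts maps gene -> (variants_in_cases, variants_in_controls).

    Returns rows (gene, cases, controls, p_value), sorted by p-value,
    comparing each gene with the pooled counts of all other genes.
    """
    total_cases = sum(c for c, _ in counts.values())
    total_controls = sum(k for _, k in counts.values())
    rows = []
    for gene, (cases, controls) in counts.items():
        p = fisher_exact_two_sided(
            cases, controls, total_cases - cases, total_controls - controls
        )
        rows.append((gene, cases, controls, p))
    return sorted(rows, key=lambda r: r[3])
```

The rows can be written out as a tab-separated file for further analysis. With many genes, the p-values would also need a multiple-testing correction.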

7.3 years ago
Pablo ★ 1.9k

Short answer: Use "SnpEff count"

Long answer: Split your file into two, cases.vcf and controls.vcf, then run "SnpEff count" on each. It will tell you the number of variants and bases in each possible genomic interval (gene, transcript, exon, intron, etc.).

http://snpeff.sourceforge.net/SnpEff_manual.html#utils (scroll down to "SnpEff count")

Usage:

java -Xmx4g -jar snpEff.jar count -v hg19 test.vcf > test_count.txt
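
The splitting step is not spelled out above. One way to do it, sketched here in Python under stated assumptions, is to keep only the sample columns of one group and drop sites where none of the kept samples carries a non-reference allele. The function name `subset_vcf` and the sample names in the usage note are hypothetical, and INFO fields are copied unchanged rather than recomputed for the subset.

```python
def subset_vcf(lines, keep_samples):
    """Yield VCF lines restricted to keep_samples.

    Sites where no kept sample has a non-reference allele are skipped.
    Relies on GT being the first FORMAT subfield, as the VCF
    specification requires when GT is present.
    """
    keep_idx = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample columns start at the 10th field
            keep_idx = [9 + samples.index(s) for s in keep_samples]
            yield "\t".join(cols[:9] + [cols[i] for i in keep_idx])
            continue
        kept = [cols[i] for i in keep_idx]
        has_alt = False
        for call in kept:
            genotype = call.split(":")[0].replace("|", "/")
            if any(allele not in ("0", ".") for allele in genotype.split("/")):
                has_alt = True
                break
        if has_alt:
            yield "\t".join(cols[:9] + kept)
```

For example, writing the output of `subset_vcf(fh, ["case1", "case2"])` line by line to cases.vcf, and the same for the control samples to controls.vcf, gives the two files to pass to the command above.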

When I use snpEff count as you proposed, the output file reports the number of reads and bases for each possible genomic interval.

Does it really mean 'reads', or does it mean 'variants'?
