count number of variants per gene from .vcf annovar&snpsift
7.3 years ago
User6891 ▴ 290

Hi everyone,

I have a multisample .vcf file which I've annotated with ANNOVAR (gene annotation), and I've run SnpSift CaseControl on it to assign samples as cases and controls. For each variant, SnpSift reports (as extra fields in the .vcf) the number of times the variant was found in cases and in controls, and it runs some statistical tests. However, instead of doing this at the variant level, I want to check whether certain GENES have more variants in the controls than in the cases. The result could be another extra column in the .vcf, or something separate. Since I annotated the .vcf with ANNOVAR, the gene information is already present in the file.

Is there a tool (like SnpSift) to do this? Or would it be better to write my own script? (If so, what would be the easiest way?)

annovar snpsift vcf gene • 5.2k views

I think you will need to extract that information into a separate table and run an independent statistical test on that.
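As a rough illustration of the "separate table" idea, here is a minimal Python sketch that sums the per-variant counts by gene. It rests on several assumptions that are not confirmed in this thread and should be checked against your own file: that ANNOVAR wrote the gene name into an INFO field called `Gene.refGene`, that SnpSift CaseControl wrote `Cases=` and `Controls=` INFO fields with three comma-separated numbers each, and that the third number is the total variant count. The function names are made up for this example.

```python
import re
from collections import defaultdict


def info_to_dict(info):
    """Turn a VCF INFO string such as 'A=1;B=2;FLAG' into a dict."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


def count_per_gene(lines, gene_key="Gene.refGene"):
    """Sum case and control variant counts per gene.

    Assumes 'Cases' and 'Controls' INFO fields hold three comma-separated
    integers and that the third one is the total count (an assumption).
    Returns a dict: gene -> (case_total, control_total).
    """
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue
        info = info_to_dict(fields[7])
        if gene_key not in info or "Cases" not in info or "Controls" not in info:
            continue
        cases = int(info["Cases"].split(",")[2])
        controls = int(info["Controls"].split(",")[2])
        # A variant may be annotated with several genes; split on a comma
        # or on the escaped semicolon (\x3b) sometimes seen in VCF output.
        for gene in re.split(r",|\\x3b", info[gene_key]):
            totals[gene][0] += cases
            totals[gene][1] += controls
    return {gene: tuple(pair) for gene, pair in totals.items()}
```

To use it, something like `with open("annotated.vcf") as fh: table = count_per_gene(fh)` would give one row per gene, which can then be written out as a tab-separated table and passed to whatever per-gene test you choose. Note that a variant annotated with two genes is counted once for each of them.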

7.3 years ago
Pablo ★ 1.9k

Long answer: Split your file into two, cases.vcf and controls.vcf, then run "SnpEff count" on each. It will tell you the number of variants and bases in each possible genomic interval (gene, transcript, exon, intron, etc.).

http://snpeff.sourceforge.net/SnpEff_manual.html#utils (scroll down to "SnpEff count")

Usage:

java -Xmx4g -jar snpEff.jar count -v hg19 test.vcf > test_count.txt


When I run SnpEff count as you proposed, the output file reports the number of reads and bases for each possible genomic interval.

Does it really mean 'reads', or does it mean 'variants'?