Question: How to annotate a VCF with Entrez Gene IDs
Ward Weistra (Netherlands) wrote 4.2 years ago:

Dear Biostars,

I would like to annotate my VCF with Entrez Gene IDs. I have found ways to add the HGNC gene symbol and the Ensembl Gene ID (VEP, ANNOVAR), but not the Entrez Gene ID directly. I would prefer not to translate from HGNC or Ensembl to Entrez, because I'm afraid information gets lost in that extra translation step.

Maybe a BED file with all Entrez Gene IDs would help, since I've found tools in Galaxy that annotate VCF files from BED files.
Maybe I'm just using the wrong term for Entrez Gene IDs. I mean, for example, the 7157 in http://www.ncbi.nlm.nih.gov/gene/7157.

Thanks in advance,
Ward
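
On the BED idea, a rough sketch rather than a tested pipeline: NCBI's RefSeq GFF3 annotation files list the Entrez Gene ID in the Dbxref attribute of each gene feature (for example GeneID:7157), so the gene rows can be rewritten as BED lines with the ID in the name column. The function name gff_genes_to_bed and the sample coordinates below are made up for illustration.

```shell
# Hypothetical helper: read GFF3 on stdin, write one BED line per "gene" feature,
# using the Entrez Gene ID (Dbxref=GeneID:<id>) as the BED name column.
gff_genes_to_bed() {
  awk -F'\t' -v OFS='\t' '
    /^#/ || $3 != "gene" { next }            # skip comments and non-gene features
    {
      id = ""
      n = split($9, attrs, ";")              # column 9 holds key=value attributes
      for (i = 1; i <= n; i++) {
        if (attrs[i] ~ /^Dbxref=/) {
          m = split(substr(attrs[i], 8), refs, ",")
          for (j = 1; j <= m; j++)
            if (refs[j] ~ /^GeneID:/) id = substr(refs[j], 8)
        }
      }
      # GFF3 is 1-based inclusive, BED is 0-based half-open: shift the start only
      if (id != "") print $1, $4 - 1, $5, id, ".", $7
    }'
}

# Toy input (placeholder coordinates, not the real TP53 locus)
printf '##gff-version 3\n17\tBestRefSeq\tgene\t1001\t2000\t.\t-\t.\tID=gene-TP53;Dbxref=GeneID:7157;Name=TP53\n' |
  gff_genes_to_bed
# prints a single BED line whose name column is 7157
```

One caveat: the NCBI GFF3 files name sequences by RefSeq accession (NC_000017.11 rather than 17), so the first column would need renaming to match the contig names in the VCF before any overlap-based annotation.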

 

Tags: ncbi, entrez, vcf, annotation, gene

Comment by Kirill, 4.2 years ago:

@wardweistra sorry to comment with a question. I have kind of spent all day reading up on variant calling and how to get the causative gene(s) from VCF files. By causative gene I mean the gene that causes a particular phenotype. At this stage I'm just trying to understand the lingo. So by annotating a VCF, you mean that every SNP (variant) will be assigned to a gene (or other feature)? If that's the case, will that be a new file, or can the annotation be held in the VCF file itself? Any help is much appreciated.

P.S. I don't know how helpful this might be, but the tool SnpSift seems to do annotation: http://snpeff.sourceforge.net/SnpSift.html
EnsemblWill (United Kingdom) wrote 4.2 years ago:

If you use the RefSeq transcript set with VEP, you get the Entrez Gene IDs in the Gene column of the output:

> echo "17 7673573 . T C" | perl variant_effect_predictor.pl -refseq -database -force -o stdout -fields Gene,Feature,Consequence | grep -v "^##"
#Gene   Feature Consequence
7157    NM_001126115.1  missense_variant
7157    NM_001276696.1  missense_variant
7157    NM_001276697.1  missense_variant
7157    NM_001126113.2  missense_variant
7157    NM_001126118.1  missense_variant
7157    NM_001276699.1  missense_variant
7157    NM_000546.5     missense_variant
7157    NM_001276760.1  missense_variant
7157    NM_001126114.2  missense_variant
7157    NM_001276695.1  missense_variant
7157    NM_001276761.1  missense_variant
7157    NM_001126117.1  missense_variant
7157    NM_001276698.1  missense_variant
7157    NM_001126112.2  missense_variant
7157    NM_001126116.1  missense_variant

You can do the same in the VEP web interface by selecting the relevant transcript set when you submit your job.
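
If only the list of gene IDs is needed, the tab-delimited output above can be reduced to its distinct Gene values. A minimal sketch, assuming the three-column layout shown (Gene, Feature, Consequence); vep_gene_ids is a hypothetical helper name, not part of VEP.

```shell
# Hypothetical helper: read VEP tab-delimited output on stdin and
# print the distinct values of the first (Gene) column.
vep_gene_ids() {
  grep -v '^#' |   # drop the "##" meta lines and the "#Gene ..." header
    cut -f 1 |     # keep the Gene column only
    sort -u        # one line per distinct ID
}

# Two rows taken from the output above
printf '#Gene\tFeature\tConsequence\n7157\tNM_001126115.1\tmissense_variant\n7157\tNM_000546.5\tmissense_variant\n' |
  vep_gene_ids
# prints: 7157
```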

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote 4.2 years ago:

I wrote a tool that annotates a VCF from another, indexed VCF: https://github.com/lindenb/jvarkit/wiki/VcfPeekVcf

For example, to annotate a 1000 Genomes VCF with the VCF from NCBI dbSNP (http://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/), we peek the INFO field named GENEINFO (and add an NCBI_VCF_ prefix):

$  curl -s "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz" |\
gunzip -c |\
java -jar dist/vcfpeekvcf.jar  -f ncbi/snp/organisms/human_9606/VCF/00-All.vcf.gz  -t GENEINFO -p NCBI_VCF_ |\
cut -f 1-8 | grep NCBI_VCF_GENEINFO | head


##INFO=<ID=NCBI_VCF_GENEINFO,Number=1,Type=String,Description="Pairs each of gene symbol:gene id.  The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)">
22    16260678    rs5746333    G    A    100    PASS    AA=G|||;AC=3244;AF=0.647764;AFR_AF=0.3888;AMR_AF=0.5634;AN=5008;DP=8520;EAS_AF=0.9673;EUR_AF=0.6133;NCBI_VCF_GENEINFO=POTEH:23784;NS=2504;SAS_AF=0.7638;VT=SNP
22    16264717    rs148113506    TA    T    100    PASS    AA=A|A|-|deletion;AC=2066;AF=0.41254;AFR_AF=0.3858;AMR_AF=0.4265;AN=5008;DP=53564;EAS_AF=0.4196;EUR_AF=0.4274;NCBI_VCF_GENEINFO=POTEH:23784;NS=2504;SAS_AF=0.4162;VT=INDEL
22    16265110    rs2212121    C    T    100    PASS    AA=C|||;AC=416;AF=0.0830671;AFR_AF=0.0045;AMR_AF=0.1744;AN=5008;DP=22443;EAS_AF=0.1667;EUR_AF=0.0219;NCBI_VCF_GENEINFO=POTEH:23784;NS=2504;SAS_AF=0.1012;VT=SNP
22    16267558    rs2010682    T    C    100    PASS    AA=C|||;AC=4111;AF=0.820887;AFR_AF=0.8434;AMR_AF=0.6758;AN=5008;DP=10404;EAS_AF=0.9762;EUR_AF=0.7097;NCBI_VCF_GENEINFO=POTEH:23784;NS=2504;SAS_AF=0.8476;VT=SNP
22    16269466    rs2212127    T    C    100    PASS    AA=C|||;AC=3668;AF=0.732428;AFR_AF=0.6641;AMR_AF=0.6066;AN=5008;DP=2535;EAS_AF=0.9712;EUR_AF=0.6262;NCBI_VCF_GENEINFO=POTEH:23784;NS=2504;SAS_AF=0.7771;VT=SNP
22    16269829    rs114833654    T    A    100    PASS    AA=A|||;AC=4085;AF=0.815695;AFR_AF=0.7186;AMR_AF=0.768;AN=5008;DP=7907;EAS_AF=0.9484;EUR_AF=0.7992;NCBI_VCF_GENEINFO=POTEH:23784;NS=2504;SAS_AF=0.8609;VT=SNP
22    16277622    rs2845217    G    A    100    PASS    AA=A|||;AC=2911;AF=0.58127;AFR_AF=0.3298;AMR_AF=0.5216;AN=5008;DP=5436;EAS_AF=0.9167;EUR_AF=0.5467;NCBI_VCF_GENEINFO=POTEH:23784;NS=2504;SAS_AF=0.6534;VT=SNP
22    16285169    rs192723103    T    G    100    PASS    AA=T|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0.0014;AN=5008;DP=23204;EAS_AF=0;EUR_AF=0;NCBI_VCF_GENEINFO=POTEH:23784;NS=2504;SAS_AF=0;VT=SNP
22    16285178    rs184299536    G    C    100    PASS    AA=G|||;AC=1;AF=0.000199681;AFR_AF=0.0008;AMR_AF=0;AN=5008;DP=23166;EAS_AF=0;EUR_AF=0;NCBI_VCF_GENEINFO=POTEH:23784;NS=2504;SAS_AF=0;VT=SNP
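
To get from annotated records like these to a plain table of Entrez Gene IDs, the NCBI_VCF_GENEINFO value can be split on the delimiters named in its header line (a colon inside each pair, a vertical bar between pairs). A minimal sketch, assuming 8-column records as produced by the cut -f 1-8 above; geneinfo_to_table is a hypothetical helper name, not part of jvarkit.

```shell
# Hypothetical helper: read VCF records on stdin and print
# CHROM, POS, ID, gene symbol, Entrez Gene ID (one line per symbol:id pair).
geneinfo_to_table() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                                  # skip header lines
    {
      n = split($8, info, ";")                     # INFO is column 8
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^NCBI_VCF_GENEINFO=/) {
          sub(/^NCBI_VCF_GENEINFO=/, "", info[i])
          m = split(info[i], pairs, "[|]")         # pairs are separated by "|"
          for (j = 1; j <= m; j++) {
            split(pairs[j], kv, ":")               # symbol:id
            print $1, $2, $3, kv[1], kv[2]
          }
        }
      }
    }'
}

# First record from the output above, with the INFO column trimmed for brevity
printf '22\t16260678\trs5746333\tG\tA\t100\tPASS\tAA=G|||;NCBI_VCF_GENEINFO=POTEH:23784;VT=SNP\n' |
  geneinfo_to_table
# prints one line ending in the columns POTEH and 23784
```

Records without the tag produce no output, so the result only lists variants that dbSNP places in a gene.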

 

See also SnpSift annotate: http://snpeff.sourceforge.net/SnpSift.html#annotate
