Question: How to add protein ID transcript ("NP_")?
3.1 years ago, agata88 (Poland) wrote:

Hi all,

I would like to add the "NP_" (protein) ID to SNP variants. I am using SnpEff and SnpSift.

I used the command below to annotate variants against hg19:

java -jar snpEff.jar -v -canon hg19 test_2.vcf > test_3.vcf

The output VCF includes the NM number (transcript ID) but no protein ID (NP).

Any idea how to add this?

Best,

Agata

next-gen annotation • 1.2k views

Mapping refseq transcripts to encoded proteins (NM_ to NP_)

How To Transfer Gene Id Into Protein Id

written 3.1 years ago by Sukhdeep Singh

Thanks, but I was wondering how to do that using SnpEff or SnpSift ...

written 3.1 years ago by agata88

Ahh, wait for someone more experienced with that then :)

written 3.1 years ago by Sukhdeep Singh

3.1 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

First, check whether SnpEff itself is able to print the NP_* identifiers.

If it is not, the ugly way: extract the NMs from your VCF, get the NP_ with NCBI efetch, and create a sed file that appends the NP to the NM...

 # collect the unique NM_ accessions from the ANN fields, then ask NCBI efetch for each protein_id
 curl -Ls "https://raw.githubusercontent.com/arraytools/vc-annotation/master/snpeff/tmp/nonsyn_splicing.vcf" | grep -v "##" |\
cut -f 8 | tr ";" "\n" | grep "^ANN=" | tr "|" "\n" | grep "^NM_" | sort -u | while read -r S
do
       curl -Ls "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=${S}&rettype=gbc" |\
       xmllint --xpath '//INSDQualifier_name[.="protein_id"]/../INSDQualifier_value/text()' - |\
       awk -v NM="${S}" '{printf("s/|%s|/|%s|%s|/g\n",NM,NM,$1);}'
done > out.sed
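
The extraction half of that pipeline can be tried offline before sending any request to NCBI. The sketch below is only an illustration: `extract_nm` and `sample_vcf` are hypothetical helper names, and the two records are abbreviated versions of the ISG15 and RAB13 lines from the example VCF, with the INFO column trimmed.

```shell
# extract_nm: hypothetical helper. Reads a VCF on stdin and prints the
# unique NM_ accessions found in the ANN entry of the INFO column (field 8).
extract_nm() {
    grep -v '^##' | cut -f 8 | tr ';' '\n' | grep '^ANN=' |
        tr '|' '\n' | grep '^NM_' | sort -u
}

# sample_vcf: hypothetical helper. Prints one header line and two
# abbreviated, tab-separated records (ISG15 and RAB13).
sample_vcf() {
    printf '##fileformat=VCFv4.2\n'
    printf '1\t1014228\tCOSM3751464\tG\tA\t11.4963\tPASS\tDP=1;GENE=ISG15;ANN=A|missense_variant|MODERATE|ISG15|ISG15|transcript|NM_005101.3|protein_coding|2/2\n'
    printf '1\t153986128\t.\tA\tG\t10.9943\tPASS\tDP=2;ANN=G|missense_variant|MODERATE|RAB13|RAB13|transcript|NM_002870.3|protein_coding|1/8\n'
}

# Print the accessions, one per line.
sample_vcf | extract_nm
```

The `sort -u` step keeps each accession only once, so a transcript hit by many variants would be looked up a single time.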

Then use this sed file to append the NP to the NM:

$ curl -Ls "https://raw.githubusercontent.com/arraytools/vc-annotation/master/snpeff/tmp/nonsyn_splicing.vcf" | sed -f out.sed | grep NP_ | head


1   1014228 COSM3751464 G   A   11.4963 PASS    DP=1;SGB=-0.379885;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=0,0,1,0;MQ=50;GENE=ISG15;ANN=A|missense_variant|MODERATE|ISG15|ISG15|transcript|NM_005101.3|NP_005092.1|protein_coding|2/2|c.248G>A|p.Ser83Asn|355/666|248/498|83/165||   GT:PL   0/1:39,3,0
1   21850165    .   A   T   4.88476 PASS    DP=1;SGB=-0.379885;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=0,0,1,0;MQ=50;ANN=T|missense_variant|MODERATE|HSPG2|HSPG2|transcript|NM_001291860.1|NP_001278789.1|protein_coding|57/97|c.7325T>A|p.Ile2442Asn|7405/14343|7325/13179|2442/4392||  GT:PL   0/1:31,3,0
1   27793570    .   G   A   5.58414 PASS    DP=1;SGB=-0.379885;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=0,0,1,0;MQ=50;ANN=A|missense_variant|MODERATE|STX12|STX12|transcript|NM_177424.2|NP_803173.1|protein_coding|3/9|c.226G>A|p.Glu76Lys|351/3079|226/831|76/276|| GT:PL   0/1:32,3,0
1   114720652   COSM4218186 G   A   13.3811 PASS    DP=1;SGB=-0.379885;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=0,0,1,0;MQ=50;GENE=CSDE1;ANN=A|stop_gained|HIGH|CSDE1|CSDE1|transcript|NM_001242891.1|NP_001229820.1|protein_coding|18/21|c.2077C>T|p.Gln693*|2599/4313|2077/2535|693/844||;LOF=(CSDE1|CSDE1|1|1.00);NMD=(CSDE1|CSDE1|1|1.00) GT:PL   0/1:41,3,0
1   153986128   .   A   G   10.9943 PASS    DP=2;SGB=-0.379885;MQ0F=0;AC=2;AN=2;DP4=0,0,0,1;MQ=50;ANN=G|missense_variant|MODERATE|RAB13|RAB13|transcript|NM_002870.3|NP_002861.1|protein_coding|1/8|c.109T>C|p.Tyr37His|250/1235|109/612|37/203||   GT:PL   1/1:38,3,0
1   173485410   COSM5378456 C   T   6.32957 PASS    DP=1;SGB=-0.379885;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=0,0,1,0;MQ=50;GENE=PRDX6;ANN=T|missense_variant|MODERATE|PRDX6|PRDX6|transcript|NM_004905.2|NP_004896.1|protein_coding|3/5|c.302C>T|p.Pro101Leu|353/1670|302/675|101/224||    GT:PL   0/1:33,3,0
1   183116645   .   A   T   7.93884 PASS    DP=1;SGB=-0.379885;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=0,0,0,1;MQ=50;ANN=T|missense_variant|MODERATE|LAMC1|LAMC1|transcript|NM_002293.3|NP_002284.3|protein_coding|7/28|c.1397A>T|p.Lys466Ile|1654/7889|1397/4830|466/1609|| GT:PL   0/1:35,3,0
1   228097438   .   T   C   7.93884 PASS    DP=1;SGB=-0.379885;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=0,0,0,1;MQ=50;ANN=C|missense_variant|MODERATE|ARF1|ARF1|transcript|NM_001024226.1|NP_001019397.1|protein_coding|3/5|c.245T>C|p.Phe82Ser|473/1973|245/546|82/181|| GT:PL   0/1:35,3,0
1   235246483   .   G   C   6.32957 PASS    DP=1;SGB=-0.379885;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=0,0,1,0;MQ=50;ANN=C|missense_variant|MODERATE|ARID4B|ARID4B|transcript|NM_001206794.1|NP_001193723.1|protein_coding|7/24|c.383C>G|p.Pro128Arg|760/5946|383/3939|128/1312||    GT:PL   0/1:33,3,0
10  73913343    COSM4144861 T   C   9.6729  PASS    DP=1;SGB=-0.379885;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=0,0,1,0;MQ=50;GENE=PLAU;ANN=C|missense_variant|MODERATE|PLAU|PLAU|transcript|NM_002658.3|NP_002649.1|protein_coding|6/11|c.422T>C|p.Leu141Pro|568/2377|422/1296|141/431|| GT:PL   0/1:37,3,0
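
To see what a single rule in out.sed does, here is a small offline sketch. `make_rule` is a hypothetical helper, not part of the pipeline above. It writes the same kind of substitution as the awk step, with one assumed refinement: the dot in the accession version is escaped on the pattern side, because an unescaped `.` in a sed pattern matches any character.

```shell
# make_rule: hypothetical helper. Prints one sed rule that inserts the
# NP_ accession ($2) right after the NM_ accession ($1) in a "|"-delimited field.
make_rule() {
    # Escape dots so that "NM_005101.3" only matches itself.
    nm_escaped=$(printf '%s' "$1" | sed 's/\./\\./g')
    printf 's/|%s|/|%s|%s|/g\n' "$nm_escaped" "$1" "$2"
}

# Show the generated rule, then apply it to an abbreviated ANN string.
make_rule NM_005101.3 NP_005092.1
printf '%s\n' 'ANN=A|missense_variant|MODERATE|ISG15|ISG15|transcript|NM_005101.3|protein_coding|2/2' |
    sed "$(make_rule NM_005101.3 NP_005092.1)"
```

The second command should print the same ANN string with `NP_005092.1|` inserted after the NM field, which is the layout visible in the output above.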