Question: Add Phenotype OMIM number to vcf file
0
gravatar for psmirin
22 months ago by
psmirin0
Israel
psmirin0 wrote:

Hi,

How can I annotate or compare the variants I have got in my vcf file that include the gene name to known mutations in the same gene but different chr location for instance from OMIM? For example I have searched manually on OMIM for MED25 gene, this is the output:

Gene-Phenotype Relationships Location Phenotype Phenotype MIM number Inheritance Phenotype mapping key 19q13.33 ?Charcot-Marie-Tooth disease, type 2B2 605589 AR 3 Basel-Vanagait-Smirin-Yosef syndrome 616449 AR 3

And my file is a multi vcf :

chr19   50321587        rs114843375;138201      A       G       6630.07 PASS    AC=10;A_F=0.052;AN=192;BaseQRankSum=1.74;ClippingRankSum=0.069;DP=4747;FS=1.147;GQ_MEAN=141.68;GQ_STDDEV=124.09;InbreedingCoeff=0.1560;MLEAC=10;MLEA_F=0.052;MQ=57.41;MQ0=0;MQRankSum=0.789;NCC=0;POSITIVE_TRAIN_SITE;QD=15.07;ReadPosRankSum=0.548;SOR=0.593;VQSLOD=5.04;culprit=MQ;EFF=UTR_5_PRIME(MODIFIER||12||MED25|protein_coding|CODING|NM_030973.3|1);AF=4.645e-03;COMMON=1;AF_ESP=0.0112;AF_EXAC=0.00869;AF_TGP=0.0102;ALLELEID=141904;CLNDISDB=MedGen:CN169374;CLNDN=not_specified;CLNHGVS=NC_000019.9:g.50321587A>G;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Benign;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=MED25:81857;MC=SO:0001623|5_prime_UTR_variant;ORIGIN=1;RS=114843375        GT:AD:DP:GQ:PL  0/0:42,0:42:99:0,103,1320       0/0:48,0:48:99:0,120,1800

The output I want is to add the Phenotype MIM number in the end of the row:

chr19   50321587        rs114843375;138201      A       G       6630.07 PASS    AC=10;A_F=0.052;AN=192;BaseQRankSum=1.74;ClippingRankSum=0.069;DP=4747;FS=1.147;GQ_MEAN=141.68;GQ_STDDEV=124.09;InbreedingCoeff=0.1560;MLEAC=10;MLEA_F=0.052;MQ=57.41;MQ0=0;MQRankSum=0.789;NCC=0;POSITIVE_TRAIN_SITE;QD=15.07;ReadPosRankSum=0.548;SOR=0.593;VQSLOD=5.04;culprit=MQ;EFF=UTR_5_PRIME(MODIFIER||12||MED25|protein_coding|CODING|NM_030973.3|1);AF=4.645e-03;COMMON=1;AF_ESP=0.0112;AF_EXAC=0.00869;AF_TGP=0.0102;ALLELEID=141904;CLNDISDB=MedGen:CN169374;CLNDN=not_specified;CLNHGVS=NC_000019.9:g.50321587A>G;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Benign;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=MED25:81857;MC=SO:0001623|5_prime_UTR_variant;ORIGIN=1;RS=114843375        GT:AD:DP:GQ:PL  0/0:42,0:42:99:0,103,1320       0/0:48,0:48:99:0,120,1800  605589, 616449

How can I approach this?

Thank you,

Pola

sequencing gene • 745 views
ADD COMMENTlink modified 22 months ago • written 22 months ago by psmirin0

Hi @Pierre Lindenbaum,

I have generated a file through http://www.ensembl.org/biomart (like you suggested)

The file is :

19      49818279        49838816        MED25   610197  605589

19      49818279        49838816        MED25   610197  616449

My file:

chr19   50338843        rs193291405;220933      C       G       PASS    1       4.401e-03       0.602   MedGen:CN043576|MedGen:CN169374 Charcot-Marie-Tooth_disease,_type_2|not_specified       NC_000019.9:g.50338843C>G        Benign/Likely_benign    single_nucleotide_variant       1       MODERATE        MED25   NM_030973.3     protein_coding  NON_SYNONYMOUS_CODING   MISSENSE        gCc/gGc A576G   CODING  4.66    B9TX30,B5ME50,Q71SY5     N       D,D,D,D,T,D,D,D B,B,B   N       5.71    B,B,B   .       0/0     0/0      55,0    38,0

I wanted to get this output:

chr19   50338843        rs193291405;220933      C       G       PASS    1       4.401e-03       0.602   MedGen:CN043576|MedGen:CN169374 Charcot-Marie-Tooth_disease,_type_2|not_specified       NC_000019.9:g.50338843C>G        Benign/Likely_benign    single_nucleotide_variant       1       MODERATE        MED25   NM_030973.3     protein_coding  NON_SYNONYMOUS_CODING   MISSENSE        gCc/gGc A576G   CODING  4.66    B9TX30,B5ME50,Q71SY5     N       D,D,D,D,T,D,D,D B,B,B   N       5.71    B,B,B   .       0/0     0/0      55,0    38,0    **610197:605589:616449**

I have tried to run the following command:

    awk 'NR==FNR{a[$4]=$5":"$6;next}{print $0,"\t",a[$17]}' FS="\t" med25omim med25test

This is the output that I get:

chr19   50338843        rs193291405;220933      C       G       PASS    1       4.401e-03       0.602   MedGen:CN043576|MedGen:CN169374 Charcot-Marie-Tooth_disease,_type_2|not_specified       NC_000019.9:g.50338843C>G        Benign/Likely_benign    single_nucleotide_variant       1       MODERATE        MED25   NM_030973.3     protein_coding  NON_SYNONYMOUS_CODING   MISSENSE        gCc/gGc A576G   CODING  4.66    B9TX30,B5ME50,Q71SY5     N       D,D,D,D,T,D,D,D B,B,B   N       5.71    B,B,B   .       0/0     0/0      55,0    38,0     **610197:616449**

I am missing this omimID 605589 Can you please help me to overcome this? Thank you very much, Pola

ADD REPLYlink modified 22 months ago by Pierre Lindenbaum120k • written 22 months ago by psmirin0
1
gravatar for Pierre Lindenbaum
22 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum120k wrote:

use biomart http://www.ensembl.org/biomart/martview/ to build a BED file containing

chrom, start,end, mim_gene_accession

remove the lines without "mim_gene_accession", substract '1' to the start column to switch to a 0-based reference, sort on chrom/start, compress with bgzip and index with tabix. Something like:

$ awk '/^Gene/ {next;} {if($4!="") printf("%s\t%d\t%s\t%s\n",$3,int($1)-1,$2,$4);}' biomart.txt | sort -t $'\t' -k1,1 -k2,2n | bgzip > omim.bed.gz
$ tabix -f -p bed omim.bed.gz

then use something like http://lindenb.github.io/jvarkit/VCFBed.html to transfer the information into the VCF:

java -jar dist/vcfbed.jar -B omim.bed.gz -f '${4}' -T OMIM_ACN your.vcf
ADD COMMENTlink written 22 months ago by Pierre Lindenbaum120k

Hi @Pierre Lindenbaum,

I have generated a file through http://www.ensembl.org/biomart (like you suggested)

The file is :

19      49818279        49838816        MED25   610197  605589

19      49818279        49838816        MED25   610197  616449

My file:

chr19   50338843        rs193291405;220933      C       G       PASS    1       4.401e-03       0.602   MedGen:CN043576|MedGen:CN169374 Charcot-Marie-Tooth_disease,_type_2|not_specified       NC_000019.9:g.50338843C>G        Benign/Likely_benign    single_nucleotide_variant       1       MODERATE        MED25   NM_030973.3     protein_coding  NON_SYNONYMOUS_CODING   MISSENSE        gCc/gGc A576G   CODING  4.66    B9TX30,B5ME50,Q71SY5     N       D,D,D,D,T,D,D,D B,B,B   N       5.71    B,B,B   .       0/0     0/0      55,0    38,0

I wanted to get this output:

chr19   50338843        rs193291405;220933      C       G       PASS    1       4.401e-03       0.602   MedGen:CN043576|MedGen:CN169374 Charcot-Marie-Tooth_disease,_type_2|not_specified       NC_000019.9:g.50338843C>G        Benign/Likely_benign    single_nucleotide_variant       1       MODERATE        MED25   NM_030973.3     protein_coding  NON_SYNONYMOUS_CODING   MISSENSE        gCc/gGc A576G   CODING  4.66    B9TX30,B5ME50,Q71SY5     N       D,D,D,D,T,D,D,D B,B,B   N       5.71    B,B,B   .       0/0     0/0      55,0    38,0    **610197:605589:616449**

I have tried to run the following command:

    awk 'NR==FNR{a[$4]=$5":"$6;next}{print $0,"\t",a[$17]}' FS="\t" med25omim med25test

This is the output that I get:

chr19   50338843        rs193291405;220933      C       G       PASS    1       4.401e-03       0.602   MedGen:CN043576|MedGen:CN169374 Charcot-Marie-Tooth_disease,_type_2|not_specified       NC_000019.9:g.50338843C>G        Benign/Likely_benign    single_nucleotide_variant       1       MODERATE        MED25   NM_030973.3     protein_coding  NON_SYNONYMOUS_CODING   MISSENSE        gCc/gGc A576G   CODING  4.66    B9TX30,B5ME50,Q71SY5     N       D,D,D,D,T,D,D,D B,B,B   N       5.71    B,B,B   .       0/0     0/0      55,0    38,0     **610197:616449**

I am missing this omimID 605589 Can you please help me to overcome this? Thank you very much, Pola

ADD REPLYlink written 22 months ago by psmirin0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1799 users visited in the last hour