Using VEP custom input
13 days ago
Sd • 0

I have a DataFrame with the following columns. Is there any way to use this file in VEP to include columns 6 to 12 in my variant annotation outputs?

    chr   start     end       transcript    gene   exp_snv  obs_snv      pLI      o/e  lof.oe_ci_lower  lof.oe_ci.upper         biotype  canonical  mane_select
0  chr1   65419   71585  ENST00000641515   OR4F5   0.53873      0.0  0.33668  0.00000        0.000            1.828  protein_coding       True         True
1  chr1  923923  944574  ENST00000616016  SAMD11  54.47500     75.0  0.00000  1.37680        1.204            1.764  protein_coding       True         True
2  chr1  923923  944574  ENST00000618323  SAMD11  54.13100     76.0  0.00000  1.40400        1.231            1.795  protein_coding      False        False
3  chr1  925150  935793  ENST00000437963  SAMD11  15.47900     15.0  0.00001  0.96906        0.483            1.272  protein_coding      False        False
4  chr1  925731  944574  ENST00000342066  SAMD11  55.24600     75.0  0.00000  1.35760        1.141            1.693  protein_coding      False        False
gnomAD constraint LOEUF VEP pLI • 418 views

Initially, I converted the file to GFF3 format in order to run VEP, but I got warnings and couldn’t figure out what the problem was.

This is the GFF3 file I created to use with ./vep --custom or --gff:

chr1    gnomAD.v4.1_constraint  transcript      65419   71585   .       +       .       ID=ENST00000641515;Name=OR4F5;Parent=OR4F5;exp_snv=0.53873;obs_snv=0.00000;pLI=0.33668;o/e=0.00000;lof.oe_ci_lower=0.0;lof.oe_ci.upper=1.828;biotype=protein_coding;canonical=True;mane_select=True
chr1    gnomAD.v4.1_constraint  transcript      923923  944574  .       +       .       ID=ENST00000616016;Name=SAMD11;Parent=SAMD11;exp_snv=54.47500;obs_snv=75.00000;pLI=0.00000;o/e=1.37680;lof.oe_ci_lower=1.204;lof.oe_ci.upper=1.764;biotype=protein_coding;canonical=True;mane_select=True
chr1    gnomAD.v4.1_constraint  transcript      923923  944574  .       +       .       ID=ENST00000618323;Name=SAMD11;Parent=SAMD11;exp_snv=54.13100;obs_snv=76.00000;pLI=0.00000;o/e=1.40400;lof.oe_ci_lower=1.231;lof.oe_ci.upper=1.795;biotype=protein_coding;canonical=False;mane_select=False
chr1    gnomAD.v4.1_constraint  transcript      925150  935793  .       +       .       ID=ENST00000437963;Name=SAMD11;Parent=SAMD11;exp_snv=15.47900;obs_snv=15.00000;pLI=0.00001;o/e=0.96906;lof.oe_ci_lower=0.483;lof.oe_ci.upper=1.272;biotype=protein_coding;canonical=False;mane_select=False

But I am getting warnings like these:

WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: TBX1, GSC2, TSSK2, HIRA, DGCR2, CLDN5, GP1BB, SEPTIN5, nan, UFD1, TXNRD2, COMT, CDC45, ESS2, C22orf39, ARVCF, MRPL40, CLTCL1, SLC25A1, RTL10, GNB1L
WARNING: 167 : WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: CLTCL1, SLC25A1, MRPL40, RTL10, GNB1L, UFD1, nan, ESS2, C22orf39, ARVCF, COMT, TXNRD2, CDC45, SEPTIN5, HIRA, TSSK2, TBX1, GSC2, GP1BB, CLDN5, DGCR2
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: CLTCL1, SLC25A1, MRPL40, RTL10, GNB1L, UFD1, nan, ESS2, C22orf39, ARVCF, COMT, TXNRD2, CDC45, SEPTIN5, HIRA, TSSK2, TBX1, GSC2, GP1BB, CLDN5, DGCR2
WARNING: 218 : WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: THAP7, SCARF2, MED15, DGCR8, PI4KA, COMT, TXNRD2, CCDC188, ZDHHC8, SERPIND1, KLHL22, AIFM3, RANBP1, RTN4R, LZTR1, CRKL, TANGO2, ZNF74, ARVCF, SNAP29, nan, DGCR6L, TRMT2A, USP41
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: THAP7, SCARF2, MED15, DGCR8, PI4KA, COMT, TXNRD2, CCDC188, ZDHHC8, SERPIND1, KLHL22, AIFM3, RANBP1, RTN4R, LZTR1, CRKL, TANGO2, ZNF74, ARVCF, SNAP29, nan, DGCR6L, TRMT2A, USP41

Do you have any thoughts on this? Which format should I use to avoid these warnings?
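One way to narrow down warnings of this kind is to check whether every Parent value in the GFF3 file also appears as an ID on some other line. The sketch below is only a diagnostic, not part of VEP: the function names are made up for illustration, and the third example row (with Parent=nan) is a hypothetical record standing in for a row whose gene name was missing in the DataFrame.

```python
def parse_attributes(field):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for item in field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def unresolved_parents(gff_lines):
    """Return Parent values that never occur as an ID in the same GFF3 lines."""
    ids, parents = set(), set()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete 9-column GFF3 record
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            ids.add(attrs["ID"])
        if "Parent" in attrs:
            # GFF3 allows several comma-separated parents
            parents.update(attrs["Parent"].split(","))
    return sorted(parents - ids)


def record(start, end, attributes):
    """Build one tab-delimited transcript line like those in the post."""
    return "\t".join(
        ["chr1", "gnomAD.v4.1_constraint", "transcript",
         str(start), str(end), ".", "+", ".", attributes]
    )


example = [
    record(65419, 71585, "ID=ENST00000641515;Name=OR4F5;Parent=OR4F5;pLI=0.33668"),
    record(923923, 944574, "ID=ENST00000616016;Name=SAMD11;Parent=SAMD11;pLI=0.00000"),
    # Hypothetical row: a missing gene name written out as the string "nan"
    record(1000, 2000, "ID=TRANSCRIPT_WITHOUT_GENE;Name=nan;Parent=nan"),
]

print(unresolved_parents(example))
```

For lines shaped like the ones above, every Parent is reported, because the file contains only transcript records and no line whose ID is the gene name. The "nan" entry in the warnings is consistent with missing gene names having been converted to text, although that is an inference from the output rather than something the post confirms.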

13 days ago

VEP can integrate custom annotation from standard format files into your results by using the --custom flag.

See https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html

(But I would just index the file with tabix and use bcftools annotate.)
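A rough sketch of how the table could be prepared for the tabix and bcftools annotate route follows. Several things here are assumptions rather than anything stated in the thread: the gnomAD_ prefix and the sanitised tag names (o/e is not a valid VCF INFO key, so the slash is replaced), the file names in the comments, and the idea that start and end are 1-based inclusive coordinates. Overlapping transcripts, such as the SAMD11 rows, cover the same positions, so some rule for choosing or merging values would still be needed.

```python
import re

# Numeric constraint columns taken from the DataFrame in the question
VALUE_COLUMNS = ["exp_snv", "obs_snv", "pLI", "o/e",
                 "lof.oe_ci_lower", "lof.oe_ci.upper"]


def info_tag(column):
    """Make a VCF-safe INFO key from a column name (hypothetical naming)."""
    return "gnomAD_" + re.sub(r"[^0-9A-Za-z_.]", "_", column)


def build_annotation_table(rows):
    """Return (header_lines, data_lines, columns_spec) for bcftools annotate."""
    header = [
        f'##INFO=<ID={info_tag(c)},Number=1,Type=Float,'
        f'Description="gnomAD constraint column {c}">'
        for c in VALUE_COLUMNS
    ]
    # tabix expects records grouped by chromosome and sorted by position
    ordered = sorted(rows, key=lambda r: (r["chr"], int(r["start"]), int(r["end"])))
    lines = []
    for r in ordered:
        fields = [r["chr"], str(r["start"]), str(r["end"])]
        fields += [str(r[c]) for c in VALUE_COLUMNS]
        lines.append("\t".join(fields))
    columns = "CHROM,FROM,TO," + ",".join(info_tag(c) for c in VALUE_COLUMNS)
    return header, lines, columns


# Two rows copied from the question; values kept as strings to avoid reformatting
rows = [
    {"chr": "chr1", "start": "923923", "end": "944574", "exp_snv": "54.47500",
     "obs_snv": "75.0", "pLI": "0.00000", "o/e": "1.37680",
     "lof.oe_ci_lower": "1.204", "lof.oe_ci.upper": "1.764"},
    {"chr": "chr1", "start": "65419", "end": "71585", "exp_snv": "0.53873",
     "obs_snv": "0.0", "pLI": "0.33668", "o/e": "0.00000",
     "lof.oe_ci_lower": "0.000", "lof.oe_ci.upper": "1.828"},
]

header, lines, columns = build_annotation_table(rows)
print("\n".join(header))
print("\n".join(lines))
print(columns)

# After writing `lines` to annots.tab and `header` to annots.hdr (made-up names):
#   bgzip annots.tab
#   tabix -s1 -b2 -e3 annots.tab.gz
#   bcftools annotate -a annots.tab.gz -h annots.hdr -c <columns> input.vcf.gz
```

With a pandas DataFrame, something like df.to_dict("records") should give rows in the shape this function expects. The values land in the INFO field of the VCF, so they would be carried alongside the VEP annotation rather than inside the CSQ string.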
