Question: VEP output has no gene names
banerjeeshayantan wrote, 2.6 years ago:

I am trying to annotate a variant file (generated using Strelka) from mouse WGS data. This is the command I used:

./vep -i /path/to/somatic.snvs.vcf \
        --cache /data/shayantan/mus_musculus/ \
        --species mus_musculus

The output variant file has no gene names. Why is this happening? Is something wrong with my cache files?

EDIT (@Ram): Sample input VCF:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR
chr1    3003110 .   G   T   .   LowEVS  SOMATIC;QSS=17;TQSS=1;NT=ref;QSS_NT=17;TQSS_NT=1;SGT=GG->GT;DP=35;MQ=60.00;MQ0=0;ReadPosRankSum=1.95;SNVSB=3.58;SomaticEVS=0.80 DP:FDP:SDP:SUBDP:AU:CU:GU:TU    18:0:0:0:0,0:0,0:17,17:1,1  14:0:0:0:0,0:0,0:11,13:3,4
chr1    3035137 .   G   A   .   LowEVS  SOMATIC;QSS=17;TQSS=2;NT=ref;QSS_NT=17;TQSS_NT=2;SGT=GG->AG;DP=70;MQ=40.40;MQ0=10;ReadPosRankSum=0.89;SNVSB=3.23;SomaticEVS=0.10    DP:FDP:SDP:SUBDP:AU:CU:GU:TU    27:0:0:0:3,7:0,0:24,28:0,0  27:0:0:0:4,6:0,0:23,29:0,0
chr1    3035168 .   C   T   .   LowEVS  SOMATIC;QSS=8;TQSS=2;NT=ref;QSS_NT=8;TQSS_NT=2;SGT=CC->CT;DP=51;MQ=47.72;MQ0=3;ReadPosRankSum=1.78;SNVSB=2.68;SomaticEVS=0.08   DP:FDP:SDP:SUBDP:AU:CU:GU:TU    18:0:0:0:0,0:16,19:0,0:2,4  23:0:0:0:0,0:20,25:0,0:3,3
chr1    3035504 .   C   A   .   LowEVS  SOMATIC;QSS=15;TQSS=2;NT=ref;QSS_NT=14;TQSS_NT=2;SGT=CC->AC;DP=59;MQ=51.03;MQ0=2;ReadPosRankSum=-1.19;SNVSB=2.71;SomaticEVS=0.09    DP:FDP:SDP:SUBDP:AU:CU:GU:TU    23:0:0:0:3,5:20,22:0,0:0,0  27:0:0:0:4,4:23,28:0,0:0,0
chr1    3043000 .   G   T   .   LowEVS  SOMATIC;QSS=21;TQSS=1;NT=ref;QSS_NT=21;TQSS_NT=1;SGT=GG->GT;DP=53;MQ=46.60;MQ0=7;ReadPosRankSum=1.70;SNVSB=1.37;SomaticEVS=0.20 DP:FDP:SDP:SUBDP:AU:CU:GU:TU    20:0:0:0:0,0:0,0:18,24:2,3  22:0:0:0:0,0:0,0:18,22:4,4
modified 2.6 years ago by Ram • written 2.6 years ago by banerjeeshayantan

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

You're using vep. A vep tag would help your cause.

modified 2.6 years ago • written 2.6 years ago by Ram

Thanks for editing my code. I will surely keep this in mind for future posts.

written 2.6 years ago by banerjeeshayantan

I think you need to add the option --symbol to the command.

written 2.6 years ago by 1629160
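A minimal sketch of the offline invocation with --symbol added, written as a Python argument list. The helper name build_vep_command, the output file name, and the use of /data/shayantan/ as the cache root are assumptions for illustration. One detail worth checking in the original command: in ensembl-vep, --cache is an on/off switch that takes no value, and the cache location is passed separately with --dir_cache, which expects the directory that contains the mus_musculus/ folder.

```python
import shlex


def build_vep_command(vcf_path, cache_dir, species, assembly, out_path):
    """Assemble an offline VEP command line as an argument list (hypothetical helper)."""
    return [
        "./vep",
        "-i", vcf_path,
        "-o", out_path,
        "--cache",                 # switch only: use the local cache
        "--dir_cache", cache_dir,  # assumed cache root holding <species>/<release>_<assembly>/
        "--offline",               # do not contact the Ensembl database
        "--species", species,
        "--assembly", assembly,    # should match both the cache and the VCF's reference
        "--vcf",                   # write VCF output with a CSQ INFO field
        "--symbol",                # add the SYMBOL (gene name) sub-field
    ]


cmd = build_vep_command(
    "/path/to/somatic.snvs.vcf", "/data/shayantan/", "mus_musculus", "GRCm38", "annotated.vcf"
)
print(shlex.join(cmd))  # shell-quoted command line, ready to paste into a terminal
```

Building the command as a list keeps each option and its value paired, which makes a misplaced value (such as a path after a switch) easier to spot.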

Thanks, but even after including --symbol I am getting no gene names.

written 2.6 years ago by banerjeeshayantan

@banerjeeshayantan: instead of the cache, can you run the command with the --database option on a few selected variants?

written 2.6 years ago by cpad0112
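To try the --database route on only a few variants, the VCF can first be cut down to its header plus the first few records. A small sketch; the function name head_vcf, the record count, and the abbreviated five-column example lines are illustrative.

```python
def head_vcf(lines, n):
    """Keep every header line (starting with '#') and only the first n variant records."""
    header, records = [], []
    for line in lines:
        if line.startswith("#"):
            header.append(line)
        elif len(records) < n:
            records.append(line)
    return header + records


# Three of the Strelka records from the question, with the columns abbreviated.
vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t3003110\t.\tG\tT\n",
    "chr1\t3035137\t.\tG\tA\n",
    "chr1\t3035168\t.\tC\tT\n",
]
subset = head_vcf(vcf, 2)
print("".join(subset), end="")
```

For a real file, an open file handle can be passed in place of the list (for example head_vcf(open(path), 10)) and the result written out before running VEP on it.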

Could you please show us a sample of your input file?

written 2.6 years ago by Emily_Ensembl
[Sample input VCF: the same five records shown in the question above.]
modified 2.6 years ago by Ram • written 2.6 years ago by banerjeeshayantan

Hey! You said you'd keep the editing tip in mind for future posts. Use the code formatting to your advantage, man :-)

written 2.6 years ago by Ram

This is so embarrassing. I was in a hurry and so couldn't format it. I will surely follow the site's guidelines from the next post.

written 2.6 years ago by banerjeeshayantan
Emily_Ensembl (EMBL-EBI) wrote, 2.6 years ago:

Those variants are all intergenic. There is no gene symbol because no genes are hit.

EDIT (@genomax): The actual answer is further down in this comment chain (the reply explaining the mismatch between the mm9 data and the GRCm38 cache).

modified 2.6 years ago by GenoMax • written 2.6 years ago by Emily_Ensembl
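One way to confirm this from the output itself: when VEP writes VCF (--vcf), each record gets a CSQ INFO field whose sub-field order is declared in a ##INFO header line, and with --symbol the SYMBOL sub-field is present but empty for intergenic_variant entries. A sketch that pairs each consequence with its symbol; the helper names and the shortened toy header and record are illustrative, not taken from the poster's output.

```python
def csq_fields(header_lines):
    """Return the CSQ sub-field names declared in a VEP VCF header."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ"):
            fmt = line.split("Format: ", 1)[1]
            return fmt.rstrip('">\n').split("|")
    raise ValueError("no CSQ header line found")


def consequences(info, fields):
    """Yield (Consequence, SYMBOL) for each comma-separated CSQ entry in an INFO column."""
    for item in info.split(";"):
        if item.startswith("CSQ="):
            for entry in item[4:].split(","):
                values = dict(zip(fields, entry.split("|")))
                yield values.get("Consequence", ""), values.get("SYMBOL", "")


# Toy header and INFO column (real CSQ headers list many more sub-fields).
header = [
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene">\n'
]
fields = csq_fields(header)
print(list(consequences("SOMATIC;CSQ=T|intergenic_variant|MODIFIER||", fields)))
```

An empty symbol next to intergenic_variant is expected; an empty symbol next to a genic consequence would point to a different problem.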
chr1    3930912 .   G   A   .   LowEVS  SOMATIC;QSS=2;TQSS=1;NT=ref;QSS_NT=2;TQSS_NT=1;SGT=GG->GG;DP=26;MQ=58.54;MQ0=0;ReadPosRankSum=-1.85;SNVSB=1.53;SomaticEVS=1.26;ANN=A|intergenic_region|MODIFIER|Xkr4-Rp1|Xkr4-Rp1|intergenic_region|Xkr4-Rp1|||n.3930912G>A||||||   DP:FDP:SDP:SUBDP:AU:CU:GU:TU    16:0:0:0:0,0:0,0:16,16:0,0  9:3:0:0:2,5:0,0:4,5:0,0

Here, in spite of being an intergenic variant, it has a gene name. I am confused.

modified 2.6 years ago • written 2.6 years ago by banerjeeshayantan
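The ANN= field on that line is not VEP's CSQ field: its 16 pipe-separated slots follow the ANN annotation layout written by tools such as SnpEff (Allele, Annotation, Annotation_Impact, Gene_Name, ...). Splitting it shows that the "gene name" is the Gene_Name slot of an intergenic_region record; for such records the value appears to label the neighbouring genes (here Xkr4 and Rp1) rather than a gene the variant falls in. A sketch, assuming the standard 16-field ANN order; the INFO string is copied from the line above.

```python
# Field order assumed from the ANN annotation convention (16 slots per entry).
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank",
    "HGVS.c", "HGVS.p", "cDNA.pos/cDNA.length", "CDS.pos/CDS.length",
    "AA.pos/AA.length", "Distance", "ERRORS/WARNINGS/INFO",
]


def parse_ann(info):
    """Split the ANN entry of a VCF INFO column into one dict per annotation."""
    for item in info.split(";"):
        if item.startswith("ANN="):
            return [dict(zip(ANN_FIELDS, entry.split("|"))) for entry in item[4:].split(",")]
    return []


info = (
    "SOMATIC;QSS=2;TQSS=1;NT=ref;QSS_NT=2;TQSS_NT=1;SGT=GG->GG;DP=26;MQ=58.54;MQ0=0;"
    "ReadPosRankSum=-1.85;SNVSB=1.53;SomaticEVS=1.26;"
    "ANN=A|intergenic_region|MODIFIER|Xkr4-Rp1|Xkr4-Rp1|intergenic_region|Xkr4-Rp1|||n.3930912G>A||||||"
)
for rec in parse_ann(info):
    print(rec["Annotation"], rec["Gene_Name"], rec["HGVS.c"])
```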

Are you using an up-to-date version of the VEP? What is your reference genome? That variant is coming up as intronic to a lincRNA for me.

written 2.6 years ago by Emily_Ensembl

My reference genome is mm9. I am using an older reference because the BAM files were aligned and the variants were called against this reference genome. The VEP version is ensembl-vep 93.2.

written 2.6 years ago by banerjeeshayantan
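If it is ever unclear which mouse assembly a VCF was called on, the chr1 length in its ##contig header lines is a quick tell: 197,195,432 bp on mm9 (NCBIm37) versus 195,471,971 bp on mm10 (GRCm38). A sketch, assuming the caller wrote ##contig lines with ID before length (not every tool does); the function name is hypothetical.

```python
import re

# chr1 lengths of the two mouse assemblies discussed in this thread.
CHR1_LENGTH = {197195432: "mm9 / NCBIm37", 195471971: "mm10 / GRCm38"}


def guess_mouse_assembly(header_lines):
    """Guess the assembly from the chr1 (or '1') ##contig length; 'unknown' otherwise."""
    for line in header_lines:
        m = re.match(r"##contig=<ID=(?:chr)?1,.*?length=(\d+)", line)
        if m:
            return CHR1_LENGTH.get(int(m.group(1)), "unknown")
    return "unknown"


print(guess_mouse_assembly(["##contig=<ID=chr1,length=197195432>\n"]))
```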

Are you using the Ensembl release 67 cache files? Or your own custom cache? What was in the input line for that variant?

written 2.6 years ago by Emily_Ensembl

I used the cache file from this page, under the mouse genome, in the column labelled "variation vep". That opened a page from which I downloaded the mus_musculus_vep_93_GRCm38.tar.gz file. Did I do anything wrong?

written 2.6 years ago by banerjeeshayantan

That cache is GRCm38 (mm10). You're using NCBIm37 (mm9). Of course it doesn't work.

written 2.6 years ago by Emily_Ensembl
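This kind of mismatch can be caught before running anything by reading the assembly out of the cache file name and comparing it with the assembly the VCF was called on. A sketch; the UCSC-to-Ensembl mapping covers only the two mouse builds discussed here, and the file-name pattern is inferred from mus_musculus_vep_93_GRCm38.tar.gz.

```python
import re

# Only the two mouse builds discussed in this thread.
UCSC_TO_ENSEMBL = {"mm9": "NCBIM37", "mm10": "GRCm38"}


def cache_assembly(tarball):
    """Split '<species>_vep_<release>_<assembly>.tar.gz' into its parts."""
    m = re.fullmatch(
        r"(?P<species>[a-z_]+?)_vep_(?P<release>\d+)_(?P<assembly>[A-Za-z0-9.]+)\.tar\.gz",
        tarball,
    )
    if not m:
        raise ValueError(f"unrecognised cache file name: {tarball}")
    return m["species"], int(m["release"]), m["assembly"]


def check_cache(tarball, ucsc_name):
    """Return (matches, cache assembly, assembly the data needs), case-insensitively."""
    _, _, assembly = cache_assembly(tarball)
    wanted = UCSC_TO_ENSEMBL[ucsc_name]
    return assembly.lower() == wanted.lower(), assembly, wanted


print(check_cache("mus_musculus_vep_93_GRCm38.tar.gz", "mm9"))
```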

Thanks for pointing it out. If possible, can you please direct me to the appropriate link?

modified 2.6 years ago • written 2.6 years ago by banerjeeshayantan

This is not the easiest problem. You will need to use the NCBIm37 cache from Ensembl 67, which will probably not work with the current VEP, and will work best with VEP 67.

VEP 67

NCBIm37 cache

Your alternative would be to run your VCF files through the Ensembl Assembly Converter to get them onto GRCm38, but be aware that you may lose some data this way.

modified 2.6 years ago • written 2.6 years ago by Emily_Ensembl
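Part of why an old cache is awkward with a current VEP: by default ensembl-vep looks for its cache under <dir_cache>/<species>/<release>_<assembly>, taking the release from its own version number, so a release-67 cache is not picked up unless the version is overridden (ensembl-vep has a --cache_version option for this, although, as noted above, a cache that old will probably still not work). A sketch of that lookup path; the function name is hypothetical, and older caches may be unpacked with a different layout, so check the actual directory after extracting.

```python
import posixpath


def expected_cache_dir(dir_cache, species, vep_version, assembly, cache_version=None):
    """Path searched by default: <dir_cache>/<species>/<release>_<assembly>."""
    # '93.2' -> release 93, unless an explicit cache version is supplied.
    release = cache_version if cache_version is not None else int(vep_version.split(".")[0])
    return posixpath.join(dir_cache, species, f"{release}_{assembly}")


print(expected_cache_dir("/data/shayantan", "mus_musculus", "93.2", "GRCm38"))
```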

Thanks for suggesting the steps. I will try it out. This really helped! Thanks again.

written 2.6 years ago by banerjeeshayantan

Please accept Emily's answer.

written 2.6 years ago by Ram

Hi again. The VEP 67 version link is down. Are you aware of any active links?

written 2.6 years ago by banerjeeshayantan

Not down for me - I can access the link.

written 2.6 years ago by Ram

Hello Emily! I am working with the cattle genome. How can I add gene names to the VEP output?

written 7 days ago by brianaloredana0

Use --symbol with the offline VEP. In the web interface, tick "Gene symbol" (it should be selected by default). With the REST API it should be enabled by default.

written 7 days ago by Emily_Ensembl
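For the REST route, the Ensembl VEP region endpoint takes a region of the form chrom:start-end:strand followed by the alternate allele, and each transcript consequence in the JSON response carries a gene_symbol key when a gene is hit. A sketch that builds the URL and collects symbols; no request is sent, the coordinates are placeholders, and the toy response only mimics the shape of a real one.

```python
def vep_region_url(species, chrom, pos, alt, server="https://rest.ensembl.org"):
    """URL for a single-base substitution on the forward strand (placeholder coordinates)."""
    return f"{server}/vep/{species}/region/{chrom}:{pos}-{pos}:1/{alt}?content-type=application/json"


def gene_symbols(response):
    """Collect gene_symbol values from a VEP REST JSON response (a list of results)."""
    symbols = set()
    for result in response:
        for tc in result.get("transcript_consequences", []):
            if "gene_symbol" in tc:
                symbols.add(tc["gene_symbol"])
    return sorted(symbols)


url = vep_region_url("bos_taurus", "1", 1000000, "A")
print(url)

# Hypothetical response shape: one transcript consequence with a symbol, one without.
toy = [{"transcript_consequences": [{"gene_symbol": "GENE_A"}, {"consequence_terms": ["intron_variant"]}]}]
print(gene_symbols(toy))
```

An intergenic result has no transcript_consequences, so gene_symbols returns an empty list for it, mirroring the offline case discussed above.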