Question: VEP output has no gene names
1
6 months ago by
banerjeeshayantan80 wrote:

I am trying to annotate a variant file (generated using Strelka) from mouse WGS data. This is the command I used:

./vep -i /path/to/somatic.snvs.vcf \
        --cache /data/shayantan/mus_musculus/ \
        --species mus_musculus

The output variant file has no gene names. Why is this happening? Is something wrong with my cache files?

EDIT (@Ram): Sample input VCF:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR
chr1    3003110 .   G   T   .   LowEVS  SOMATIC;QSS=17;TQSS=1;NT=ref;QSS_NT=17;TQSS_NT=1;SGT=GG->GT;DP=35;MQ=60.00;MQ0=0;ReadPosRankSum=1.95;SNVSB=3.58;SomaticEVS=0.80 DP:FDP:SDP:SUBDP:AU:CU:GU:TU    18:0:0:0:0,0:0,0:17,17:1,1  14:0:0:0:0,0:0,0:11,13:3,4
chr1    3035137 .   G   A   .   LowEVS  SOMATIC;QSS=17;TQSS=2;NT=ref;QSS_NT=17;TQSS_NT=2;SGT=GG->AG;DP=70;MQ=40.40;MQ0=10;ReadPosRankSum=0.89;SNVSB=3.23;SomaticEVS=0.10    DP:FDP:SDP:SUBDP:AU:CU:GU:TU    27:0:0:0:3,7:0,0:24,28:0,0  27:0:0:0:4,6:0,0:23,29:0,0
chr1    3035168 .   C   T   .   LowEVS  SOMATIC;QSS=8;TQSS=2;NT=ref;QSS_NT=8;TQSS_NT=2;SGT=CC->CT;DP=51;MQ=47.72;MQ0=3;ReadPosRankSum=1.78;SNVSB=2.68;SomaticEVS=0.08   DP:FDP:SDP:SUBDP:AU:CU:GU:TU    18:0:0:0:0,0:16,19:0,0:2,4  23:0:0:0:0,0:20,25:0,0:3,3
chr1    3035504 .   C   A   .   LowEVS  SOMATIC;QSS=15;TQSS=2;NT=ref;QSS_NT=14;TQSS_NT=2;SGT=CC->AC;DP=59;MQ=51.03;MQ0=2;ReadPosRankSum=-1.19;SNVSB=2.71;SomaticEVS=0.09    DP:FDP:SDP:SUBDP:AU:CU:GU:TU    23:0:0:0:3,5:20,22:0,0:0,0  27:0:0:0:4,4:23,28:0,0:0,0
chr1    3043000 .   G   T   .   LowEVS  SOMATIC;QSS=21;TQSS=1;NT=ref;QSS_NT=21;TQSS_NT=1;SGT=GG->GT;DP=53;MQ=46.60;MQ0=7;ReadPosRankSum=1.70;SNVSB=1.37;SomaticEVS=0.20 DP:FDP:SDP:SUBDP:AU:CU:GU:TU    20:0:0:0:0,0:0,0:18,24:2,3  22:0:0:0:0,0:0,0:18,22:4,4
sequencing vep alignment • 469 views
ADD COMMENTlink modified 6 months ago by RamRS20k • written 6 months ago by banerjeeshayantan80

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

You're using vep. A vep tag would help your cause.

ADD REPLYlink modified 6 months ago • written 6 months ago by RamRS20k

Thanks for editing my code. I will surely keep this in mind for future posts.

ADD REPLYlink written 6 months ago by banerjeeshayantan80

I think you need to add the option --symbol to the command.

ADD REPLYlink written 6 months ago by 1629160

Thanks. But even after including --symbol, I am still getting no gene names.

ADD REPLYlink written 6 months ago by banerjeeshayantan80

Instead of the cache, can you run VEP with the --database option for a few selected variants? @banerjeeshayantan

ADD REPLYlink written 6 months ago by cpad011211k

Please can you show us a sample of your input file.

ADD REPLYlink written 6 months ago by Emily_Ensembl17k
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR
chr1    3003110 .   G   T   .   LowEVS  SOMATIC;QSS=17;TQSS=1;NT=ref;QSS_NT=17;TQSS_NT=1;SGT=GG->GT;DP=35;MQ=60.00;MQ0=0;ReadPosRankSum=1.95;SNVSB=3.58;SomaticEVS=0.80 DP:FDP:SDP:SUBDP:AU:CU:GU:TU    18:0:0:0:0,0:0,0:17,17:1,1  14:0:0:0:0,0:0,0:11,13:3,4
chr1    3035137 .   G   A   .   LowEVS  SOMATIC;QSS=17;TQSS=2;NT=ref;QSS_NT=17;TQSS_NT=2;SGT=GG->AG;DP=70;MQ=40.40;MQ0=10;ReadPosRankSum=0.89;SNVSB=3.23;SomaticEVS=0.10    DP:FDP:SDP:SUBDP:AU:CU:GU:TU    27:0:0:0:3,7:0,0:24,28:0,0  27:0:0:0:4,6:0,0:23,29:0,0
chr1    3035168 .   C   T   .   LowEVS  SOMATIC;QSS=8;TQSS=2;NT=ref;QSS_NT=8;TQSS_NT=2;SGT=CC->CT;DP=51;MQ=47.72;MQ0=3;ReadPosRankSum=1.78;SNVSB=2.68;SomaticEVS=0.08   DP:FDP:SDP:SUBDP:AU:CU:GU:TU    18:0:0:0:0,0:16,19:0,0:2,4  23:0:0:0:0,0:20,25:0,0:3,3
chr1    3035504 .   C   A   .   LowEVS  SOMATIC;QSS=15;TQSS=2;NT=ref;QSS_NT=14;TQSS_NT=2;SGT=CC->AC;DP=59;MQ=51.03;MQ0=2;ReadPosRankSum=-1.19;SNVSB=2.71;SomaticEVS=0.09    DP:FDP:SDP:SUBDP:AU:CU:GU:TU    23:0:0:0:3,5:20,22:0,0:0,0  27:0:0:0:4,4:23,28:0,0:0,0
chr1    3043000 .   G   T   .   LowEVS  SOMATIC;QSS=21;TQSS=1;NT=ref;QSS_NT=21;TQSS_NT=1;SGT=GG->GT;DP=53;MQ=46.60;MQ0=7;ReadPosRankSum=1.70;SNVSB=1.37;SomaticEVS=0.20 DP:FDP:SDP:SUBDP:AU:CU:GU:TU    20:0:0:0:0,0:0,0:18,24:2,3  22:0:0:0:0,0:0,0:18,22:4,4
ADD REPLYlink modified 6 months ago by RamRS20k • written 6 months ago by banerjeeshayantan80

Hey! You said you'd keep the editing tip in mind for future posts. Use the code formatting to your advantage, man :-)

ADD REPLYlink written 6 months ago by RamRS20k

This is so embarrassing. I was in a hurry and so couldn't format it. I will surely follow the site's guidelines from the next post onward.

ADD REPLYlink written 6 months ago by banerjeeshayantan80
5
6 months ago by
Emily_Ensembl17k
EMBL-EBI
Emily_Ensembl17k wrote:

Those variants are all intergenic. There is no gene symbol because no genes are hit.
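One quick way to check this on your own output is to tally the Consequence column. The following is only a minimal sketch: it assumes VEP's default tab-delimited output, where Consequence is the seventh column and header lines start with '#'. The function name and the file name in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Tally the values in the Consequence column (column 7) of a default,
# tab-delimited VEP output file. Header lines start with '#' and are skipped.
tally_consequences() {
    local vep_out="$1"
    grep -v '^#' "$vep_out" | cut -f7 | sort | uniq -c | sort -rn
}

# Hypothetical usage (the file name is an assumption):
# tally_consequences variant_effect_output.txt
```

If every row is counted as intergenic_variant, no transcript was overlapped at those coordinates. The tally does not tell you whether the coordinates were interpreted on the assembly you intended.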

EDIT (@genomax) - Actual answer is further below in this chain at C: VEP output has no gene names

ADD COMMENTlink modified 6 months ago by genomax62k • written 6 months ago by Emily_Ensembl17k
chr1    3930912 .   G   A   .   LowEVS  SOMATIC;QSS=2;TQSS=1;NT=ref;QSS_NT=2;TQSS_NT=1;SGT=GG->GG;DP=26;MQ=58.54;MQ0=0;ReadPosRankSum=-1.85;SNVSB=1.53;SomaticEVS=1.26;ANN=A|intergenic_region|MODIFIER|Xkr4-Rp1|Xkr4-Rp1|intergenic_region|Xkr4-Rp1|||n.3930912G>A||||||   DP:FDP:SDP:SUBDP:AU:CU:GU:TU    16:0:0:0:0,0:0,0:16,16:0,0  9:3:0:0:2,5:0,0:4,5:0,0

Here in spite of being an intergenic variant, it has a gene name. I am confused.

ADD REPLYlink modified 6 months ago • written 6 months ago by banerjeeshayantan80
1

Are you using an up-to-date version of the VEP? What is your reference genome? That variant is coming up as intronic to a lincRNA for me.

ADD REPLYlink written 6 months ago by Emily_Ensembl17k

My reference genome is mm9. I am using an older reference because the BAM files were aligned, and the variants called, against this reference genome. The VEP version is ensembl-vep 93.2.

ADD REPLYlink written 6 months ago by banerjeeshayantan80

Are you using the Ensembl release 67 cache files? Or your own custom cache? What was in the input line for that variant?

ADD REPLYlink written 6 months ago by Emily_Ensembl17k

I used the cache file from this page, in the mouse genome row under the column labelled "variation vep". That opened a page from which I downloaded the mus_musculus_vep_93_GRCm38.tar.gz file. Did I do anything wrong?

ADD REPLYlink written 6 months ago by banerjeeshayantan80
2

That cache is GRCm38 (mm10). You're using NCBIm37 (mm9). Of course it doesn't work.
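A mismatch like this can be spotted before running anything by comparing what the cache directory and the VCF header say. This is a rough sketch under two assumptions: that the unpacked cache follows the usual species/version_assembly layout (matching the tarball name, e.g. mus_musculus/93_GRCm38), and that the variant caller wrote ##reference or ##contig lines into the VCF header. Both function names and the paths in the usage comments are hypothetical.

```shell
#!/usr/bin/env bash
# List the "<version>_<assembly>" directories present for one species
# under a VEP cache root (assumed layout: <root>/<species>/<version>_<assembly>/).
list_cache_assemblies() {
    local cache_root="$1" species="$2" d
    for d in "$cache_root/$species"/*_*/; do
        [ -d "$d" ] && basename "$d"
    done
    return 0
}

# Print up to five ##reference / ##contig header lines from a VCF, if any exist.
vcf_reference_lines() {
    grep -E '^##(reference|contig)=' "$1" | head -n 5
}

# Hypothetical usage (paths are assumptions):
# list_cache_assemblies /path/to/cache_root mus_musculus
# vcf_reference_lines /path/to/somatic.snvs.vcf
```

If the first command prints only 93_GRCm38 while the VCF header points at an mm9 FASTA, the annotation is being done against the wrong coordinates.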

ADD REPLYlink written 6 months ago by Emily_Ensembl17k

Thanks for pointing it out. If possible, can you please direct me to the appropriate link?

ADD REPLYlink modified 6 months ago • written 6 months ago by banerjeeshayantan80
2

This is not the easiest problem. You will need to use the NCBIm37 cache from Ensembl 67, which will probably not work with the current VEP, and will work best with VEP 67.

VEP 67

NCBIm37 cache

Your alternative would be to run your VCF files through the Ensembl Assembly Converter to get them onto GRCm38, but be aware that you may lose some data this way.
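If you go the conversion route, it is worth counting records before and after so you know how much was dropped. A minimal sketch, assuming plain (uncompressed) VCF files; the function names and the file names in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Count data records (non-header lines) in an uncompressed VCF.
count_vcf_records() {
    grep -vc '^#' "$1"
}

# Report how many records were dropped between the original VCF ($1)
# and the converted VCF ($2).
report_lost_records() {
    local before after
    before=$(count_vcf_records "$1")
    after=$(count_vcf_records "$2")
    echo "input=$before converted=$after lost=$((before - after))"
}

# Hypothetical usage (file names are assumptions):
# report_lost_records somatic.snvs.mm9.vcf somatic.snvs.GRCm38.vcf
```

This only compares record counts. It does not check that the surviving variants landed on the expected positions.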

ADD REPLYlink modified 6 months ago • written 6 months ago by Emily_Ensembl17k

Thanks for suggesting the steps. I will try it out. This really helped! Thanks again.

ADD REPLYlink written 6 months ago by banerjeeshayantan80

Please accept Emily's answer.

ADD REPLYlink written 6 months ago by RamRS20k

Hi again. The VEP 67 version link is down. Are you aware of any active links?

ADD REPLYlink written 6 months ago by banerjeeshayantan80

Not down for me - I can access the link.

ADD REPLYlink written 6 months ago by RamRS20k