Question: VEP output has no gene names
Asked 10 months ago by banerjeeshayantan:

I am trying to annotate a variant file (generated using Strelka) from mouse WGS data. This is the command I used:

./vep -i /path/to/somatic.snvs.vcf \
        --cache /data/shayantan/mus_musculus/ \
        --species mus_musculus

The output variant file has no gene names. Why is this happening? Is something wrong with my cache files?

EDIT (@Ram): Sample input VCF:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR
chr1    3003110 .   G   T   .   LowEVS  SOMATIC;QSS=17;TQSS=1;NT=ref;QSS_NT=17;TQSS_NT=1;SGT=GG->GT;DP=35;MQ=60.00;MQ0=0;ReadPosRankSum=1.95;SNVSB=3.58;SomaticEVS=0.80 DP:FDP:SDP:SUBDP:AU:CU:GU:TU    18:0:0:0:0,0:0,0:17,17:1,1  14:0:0:0:0,0:0,0:11,13:3,4
chr1    3035137 .   G   A   .   LowEVS  SOMATIC;QSS=17;TQSS=2;NT=ref;QSS_NT=17;TQSS_NT=2;SGT=GG->AG;DP=70;MQ=40.40;MQ0=10;ReadPosRankSum=0.89;SNVSB=3.23;SomaticEVS=0.10    DP:FDP:SDP:SUBDP:AU:CU:GU:TU    27:0:0:0:3,7:0,0:24,28:0,0  27:0:0:0:4,6:0,0:23,29:0,0
chr1    3035168 .   C   T   .   LowEVS  SOMATIC;QSS=8;TQSS=2;NT=ref;QSS_NT=8;TQSS_NT=2;SGT=CC->CT;DP=51;MQ=47.72;MQ0=3;ReadPosRankSum=1.78;SNVSB=2.68;SomaticEVS=0.08   DP:FDP:SDP:SUBDP:AU:CU:GU:TU    18:0:0:0:0,0:16,19:0,0:2,4  23:0:0:0:0,0:20,25:0,0:3,3
chr1    3035504 .   C   A   .   LowEVS  SOMATIC;QSS=15;TQSS=2;NT=ref;QSS_NT=14;TQSS_NT=2;SGT=CC->AC;DP=59;MQ=51.03;MQ0=2;ReadPosRankSum=-1.19;SNVSB=2.71;SomaticEVS=0.09    DP:FDP:SDP:SUBDP:AU:CU:GU:TU    23:0:0:0:3,5:20,22:0,0:0,0  27:0:0:0:4,4:23,28:0,0:0,0
chr1    3043000 .   G   T   .   LowEVS  SOMATIC;QSS=21;TQSS=1;NT=ref;QSS_NT=21;TQSS_NT=1;SGT=GG->GT;DP=53;MQ=46.60;MQ0=7;ReadPosRankSum=1.70;SNVSB=1.37;SomaticEVS=0.20 DP:FDP:SDP:SUBDP:AU:CU:GU:TU    20:0:0:0:0,0:0,0:18,24:2,3  22:0:0:0:0,0:0,0:18,22:4,4

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

You're using vep. A vep tag would help your cause.

(reply by RamRS, 10 months ago)

Thanks for editing my code. I will surely keep this in mind for future posts.

(reply by banerjeeshayantan, 10 months ago)

I think you need to add the option --symbol to the command.

(reply by 1629160, 10 months ago)
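
A quick way to check whether --symbol took effect is to pull the SYMBOL= key out of VEP's default tab-delimited output, where it is appended to the last (Extra) column as KEY=VALUE pairs separated by ";". This is a minimal sketch assuming that default output format (not --vcf, --tab or --json); the function name symbols_from_vep is hypothetical. A row for a variant that hits no gene simply has no SYMBOL= key, which prints as "-".

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<variant id><TAB><symbol>" for each row of
# default-format VEP output read from stdin; "-" when no SYMBOL= key is present.
symbols_from_vep() {
  awk -F'\t' '
    /^#/ { next }                          # skip "##" meta lines and the "#Uploaded_variation" header
    {
      sym = "-"
      n = split($NF, kv, ";")              # Extra column: KEY=VALUE pairs joined by ";"
      for (i = 1; i <= n; i++)
        if (kv[i] ~ /^SYMBOL=/) sym = substr(kv[i], 8)   # drop the 7-character "SYMBOL=" prefix
      print $1 "\t" sym
    }'
}

# Example with a made-up row; on real data: symbols_from_vep < vep_output.txt
printf 'var1\t1:100\tT\tGENE1\tTX1\tTranscript\tmissense_variant\t-\t-\t-\t-\t-\t-\tIMPACT=MODERATE;SYMBOL=Abc1\n' \
  | symbols_from_vep   # prints: var1<TAB>Abc1
```

If every row comes back as "-", the flag is working and the variants are simply not in genes according to the cache in use.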

Thanks, but even after including --symbol I am getting no gene names.

(reply by banerjeeshayantan, 10 months ago)

@banerjeeshayantan, instead of the cache, can you run the command with the --database option for a few selected variants?

(reply by cpad0112, 10 months ago)

Please can you show us a sample of your input file.

(reply by Emily_Ensembl, 10 months ago)

Hey! You said you'd keep the editing tip in mind for future posts. Use the code formatting to your advantage, man :-)

(reply by RamRS, 10 months ago)

This is so embarrassing. I was in a hurry and couldn't format it. I will surely follow the site's guidelines from my next post on.

(reply by banerjeeshayantan, 10 months ago)
Answer by Emily_Ensembl (EMBL-EBI), 10 months ago:

Those variants are all intergenic. There is no gene symbol because no genes are hit.

EDIT (@genomax): The actual answer is further down in this comment chain.

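To confirm this on your own output, you can count how many rows of VEP's default tab-delimited output have a gene at all. In that format the fourth column is Gene and the seventh is Consequence, and rows that hit no gene carry "-" in the Gene column. A minimal sketch, assuming the default output format; summarise_vep is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally default-format VEP output rows (stdin) by whether the
# Gene column (4th) is filled, and list the Consequence (7th) of the gene-less rows.
summarise_vep() {
  awk -F'\t' '
    /^#/ { next }                              # skip header/meta lines
    $4 == "-" { nogene++; cons[$7]++; next }   # no gene hit: remember the consequence
    { gene++ }
    END {
      printf "with_gene=%d without_gene=%d\n", gene + 0, nogene + 0
      for (c in cons) printf "  no-gene consequence %s: %d\n", c, cons[c]
    }'
}

# Usage (hypothetical file name): summarise_vep < vep_output.txt
```

If without_gene accounts for every row and the consequence listed is intergenic_variant, the empty gene columns reflect the annotation itself rather than a formatting problem.
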
chr1    3930912 .   G   A   .   LowEVS  SOMATIC;QSS=2;TQSS=1;NT=ref;QSS_NT=2;TQSS_NT=1;SGT=GG->GG;DP=26;MQ=58.54;MQ0=0;ReadPosRankSum=-1.85;SNVSB=1.53;SomaticEVS=1.26;ANN=A|intergenic_region|MODIFIER|Xkr4-Rp1|Xkr4-Rp1|intergenic_region|Xkr4-Rp1|||n.3930912G>A||||||   DP:FDP:SDP:SUBDP:AU:CU:GU:TU    16:0:0:0:0,0:0,0:16,16:0,0  9:3:0:0:2,5:0,0:4,5:0,0

Here, despite being an intergenic variant, it has a gene name. I am confused.

(reply by banerjeeshayantan, 10 months ago)
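
The ANN= key in that line does not look like VEP output: VEP writes its annotations to a CSQ key by default, and intergenic_region (rather than VEP's intergenic_variant) is the term used by SnpEff. In SnpEff's ANN format, the Gene_Name field of an intergenic_region typically holds the two flanking genes (here Xkr4 and Rp1), so the name does not mean the variant hits a gene. The sketch below, a hypothetical ann_summary helper, prints the Annotation (2nd) and Gene_Name (4th) sub-fields of the first ANN entry; it splits columns on any whitespace, so it also accepts the space-separated line pasted here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each VCF data line on stdin, print CHROM:POS plus the
# Annotation and Gene_Name sub-fields of the first entry in an ANN= INFO key.
ann_summary() {
  awk '
    /^#/ { next }                                 # skip VCF header lines
    {
      n = split($8, info, ";")                    # column 8 is INFO
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^ANN=/) {
          split(substr(info[i], 5), entries, ",") # drop "ANN=", one entry per allele/feature
          split(entries[1], f, "|")               # ANN sub-fields are "|"-separated
          print $1 ":" $2, f[2], f[4]             # Annotation, Gene_Name
        }
      }
    }'
}

# First eight columns of the line pasted above:
echo 'chr1 3930912 . G A . LowEVS SOMATIC;QSS=2;TQSS=1;NT=ref;QSS_NT=2;TQSS_NT=1;SGT=GG->GG;DP=26;MQ=58.54;MQ0=0;ReadPosRankSum=-1.85;SNVSB=1.53;SomaticEVS=1.26;ANN=A|intergenic_region|MODIFIER|Xkr4-Rp1|Xkr4-Rp1|intergenic_region|Xkr4-Rp1|||n.3930912G>A||||||' \
  | ann_summary   # prints: chr1:3930912 intergenic_region Xkr4-Rp1
```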

Are you using an up-to-date version of the VEP? What is your reference genome? That variant is coming up as intronic to a lincRNA for me.

(reply by Emily_Ensembl, 10 months ago)

My reference genome is mm9. I am using the older reference because the BAM files were aligned, and the variants called, against it. The VEP version is ensembl-vep 93.2.

(reply by banerjeeshayantan, 10 months ago)
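
Because the assembly matters so much here, it can help to check which mouse build a VCF was called against before choosing a cache. A minimal sketch, assuming your VCF header carries ##contig lines: chromosome 1 is 197,195,432 bp in NCBIm37/mm9 and 195,471,971 bp in GRCm38/mm10, so its declared length tells the two apart. The function name guess_mouse_assembly is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a VCF header on stdin and guess the mouse assembly
# from the declared length of chromosome 1 (named either "chr1" or "1").
guess_mouse_assembly() {
  awk '
    /^##contig=<ID=(chr)?1,/ {
      if (match($0, /length=[0-9]+/)) {
        len = substr($0, RSTART + 7, RLENGTH - 7) + 0   # number after "length="
        if (len == 197195432)      print "NCBIm37 (mm9)"
        else if (len == 195471971) print "GRCm38 (mm10)"
        else                       print "unknown (chr1 length " len ")"
        found = 1
      }
      exit                          # only the chr1 line is needed
    }
    /^#CHROM/ { exit }              # header is over; stop before the data lines
    END { if (!found) print "no usable chr1 contig line" }'
}

# Usage: guess_mouse_assembly < /path/to/somatic.snvs.vcf
```

Whatever this reports is the assembly the VEP cache has to match.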

Are you using the Ensembl release 67 cache files? Or your own custom cache? What was in the input line for that variant?

(reply by Emily_Ensembl, 10 months ago)

I used the cache file from this page, under the mouse genome in the column labelled "variation vep". That opened a page from which I downloaded the mus_musculus_vep_93_GRCm38.tar.gz file. Did I do anything wrong?

(reply by banerjeeshayantan, 10 months ago)

That cache is GRCm38 (mm10). You're using NCBIm37 (mm9). Of course it doesn't work.

(reply by Emily_Ensembl, 10 months ago)
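
A mismatch like this can be caught before running VEP, because the cache tarball name encodes the Ensembl release and assembly (mus_musculus_vep_93_GRCm38.tar.gz is release 93 on GRCm38). The sketch below, with hypothetical function names, compares that assembly against the UCSC build the reads were aligned to, mapping only mm9 to NCBIm37 and mm10 to GRCm38. It assumes the <species>_vep_<release>_<assembly>.tar.gz naming and bash 4 or later.

```shell
#!/usr/bin/env bash
# Map a UCSC mouse build to its Ensembl assembly name (only mm9 and mm10 handled).
ensembl_assembly_for() {
  case "$1" in
    mm9)  echo "NCBIm37" ;;
    mm10) echo "GRCm38" ;;
    *)    echo "unknown" ;;
  esac
}

# Usage: check_cache <cache tarball> <UCSC build used for alignment>
# Prints a verdict; returns 0 on a match, 1 on a mismatch, 2 if the name cannot be parsed.
check_cache() {
  local name want have release
  local re='_vep_([0-9]+)_([A-Za-z0-9]+)\.tar\.gz$'
  name=$(basename "$1")
  want=$(ensembl_assembly_for "$2")
  if [[ $name =~ $re ]]; then
    release=${BASH_REMATCH[1]}
    have=${BASH_REMATCH[2]}
  else
    echo "cannot parse release/assembly from $name"
    return 2
  fi
  # Compare case-insensitively (the mm9 assembly is also written NCBIM37).
  if [[ ${have,,} == "${want,,}" ]]; then
    echo "OK: release $release cache is $have, matching $2"
  else
    echo "MISMATCH: release $release cache is $have, but $2 is $want"
    return 1
  fi
}

check_cache mus_musculus_vep_93_GRCm38.tar.gz mm9 \
  || echo "Choose a cache built for the assembly the reads were aligned to."
```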

Thanks for pointing it out. If possible, can you please point me to the appropriate link?

(reply by banerjeeshayantan, 10 months ago)

This is not the easiest problem. You will need to use the NCBIm37 cache from Ensembl 67, which will probably not work with the current VEP, and will work best with VEP 67.

VEP 67

NCBIm37 cache

Your alternative would be to run your VCF files through the Ensembl Assembly Converter to get them onto GRCm38, but be aware that you may lose some data this way.

(reply by Emily_Ensembl, 10 months ago)

Thanks for suggesting the steps. I will try it out. This really helped! Thanks again.

(reply by banerjeeshayantan, 10 months ago)

Please accept Emily's answer.

(reply by RamRS, 10 months ago)

Hi again. The VEP 67 version link is down. Are you aware of any active links?

(reply by banerjeeshayantan, 10 months ago)

Not down for me - I can access the link.

(reply by RamRS, 10 months ago)