Question: Installing Ensembl VEP 86 and vcf2maf, and getting SNP MAFs
18 months ago, eurioste wrote:

My final objective is to get the "Minor Allele Frequencies" (MAF) for all the 1000 Genomes SNPs (in H. sapiens GRCh37, in case you ask). I specifically need data from the low-coverage Phase 1 of the project, as I require unbiased low-coverage data for a machine learning model.

I have the 1000 Genomes VCF and I'm attempting to install both VEP 86 and vcf2maf to obtain the data I need. I want to install VEP 86 (instead of the current version, 89) because vcf2maf requires the archive version of VEP, and I don't know how to make it work with the latest VEP version.

As pointed out in this previous question (www.biostars.org/p/123822/), I'm following the instructions from this link to get vcf2maf installed: vcf2maf

which also points to these VEP installation instructions: VEP

I successfully installed Perl 5.22 in the path required by VEP, as described in the link below. This step is done. perl

I'm currently stuck at the following step of the VEP installation (again, see VEP):

Download and unpack VEP's offline cache for GRCh37, GRCh38, and GRCm38:

> rsync -zvh rsync://ftp.ensembl.org/ensembl/pub/release-86/variation/VEP/homo_sapiens_vep_86_GRCh{37,38}.tar.gz $VEP_DATA 
> rsync -zvh rsync://ftp.ensembl.org/ensembl/pub/release-86/variation/VEP/mus_musculus_vep_86_GRCm38.tar.gz $VEP_DATA 
> cat $VEP_DATA/*_vep_86_GRC{h37,h38,m38}.tar.gz | tar -izxf - -C $VEP_DATA

I know the path given in the instructions is wrong. When I try it, the command runs but hangs forever:

ftp.ensembl.org/ensembl/pub/release-86/variation/VEP/homo_sapiens_vep_86_GRCh37.tar.gz

The currently correct path is below. Notice that I'm only interested in human GRCh37:

ftp.ensembl.org/pub/release-86/variation/VEP/homo_sapiens_vep_86_GRCh37.tar.gz

When I attempt to correct the line I get:

> rsync -zvh rsync://ftp.ensembl.org/pub/release-86/variation/VEP/homo_sapiens_vep_86_GRCh37.tar.gz $VEP_DATA
@ERROR: Unknown module 'pub'
rsync error: error starting client-server protocol (code 5) at main.c(1653) [Receiver=3.1.1]
sergio-bioinfo@sergiobioinfo-Latitude-3540:~/vep$ rsync -zvh rsync://ftp.ensembl.org/pub/release-86/variation/VEP/homo_sapiens_vep_86_GRCh37.tar.gz $VEP_DATA
@ERROR: Unknown module 'pub'
rsync error: error starting client-server protocol (code 5) at main.c(1653) [Receiver=3.1.1]

I don't know how to work around this problem. How can I fix this and follow the instructions correctly to get VEP and vcf2maf to work together?

15 months ago, Cyriac Kandoth (Memorial Sloan Kettering, New York, USA) wrote:

For your final objective, you should not use vcf2maf. The "MAF" in vcf2maf refers to "Mutation Annotation Format", which was something unnecessarily invented and confusingly named for cancer genetics. This possible confusion was already disambiguated in the post that you pointed to.

To reach your final objective, please use my answer to your previous post here - Getting 1000 Genomes phase one MAF values
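
As a rough illustration of the goal itself (not necessarily the approach in the linked answer): the 1000 Genomes VCFs carry an alternate-allele frequency under the `AF` key of the INFO column, so for biallelic sites a minor allele frequency can be derived as min(AF, 1 - AF). The sketch below assumes that layout. The function name `vcf_maf` is hypothetical, and multi-allelic sites are simply skipped.

```shell
# Hypothetical helper: read an uncompressed VCF on stdin and print
# CHROM, POS, ID and the minor allele frequency, tab-separated.
# Assumes INFO (column 8) contains a global "AF=<number>" entry.
vcf_maf() {
    awk -F'\t' '
        /^#/     { next }          # skip header lines
        $5 ~ /,/ { next }          # skip multi-allelic sites
        {
            n = split($8, info, ";")
            for (i = 1; i <= n; i++) {
                # Anchored match, so population keys such as EUR_AF are ignored
                if (info[i] ~ /^AF=/) {
                    af  = substr(info[i], 4) + 0
                    maf = (af > 0.5) ? 1 - af : af
                    printf "%s\t%s\t%s\t%.4f\n", $1, $2, $3, maf
                }
            }
        }
    '
}
```

It could be used as `zcat input.vcf.gz | vcf_maf > maf.tsv`, where both file names are placeholders.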

For the benefit of users that ran into your VEP installation issues:

  1. The path given in the instructions is for the rsync protocol, not for ftp. Read more about this at this link.
  2. The rsync step is supposed to be slow, and it will take a long time. The VEP caches for GRCh37 and GRCh38 are almost 5GB each, and Ensembl's servers can be slow. The advantage of using rsync is that it can resume partial downloads that were aborted by impatient users.
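
To make point 1 concrete, here is a minimal sketch. It assumes, as the two URLs in the question suggest, that the rsync path is the FTP path with the module name `ensembl` inserted after the host. The helper names `ftp_to_rsync` and `fetch_cache` are hypothetical. `--partial` asks rsync to keep an interrupted file so that a rerun can reuse it, and `--progress` just shows that the transfer is still moving.

```shell
# Hypothetical helper: turn an Ensembl FTP path into the matching rsync URL.
# Assumption: the rsync module is called "ensembl" and mirrors the FTP tree.
ftp_to_rsync() {
    # Strip an optional scheme and the host, leaving e.g. pub/release-86/...
    path=${1#ftp://}
    path=${path#ftp.ensembl.org/}
    printf 'rsync://ftp.ensembl.org/ensembl/%s\n' "$path"
}

# Hypothetical wrapper: download one cache file into a target directory.
# Safe to rerun after an interruption because partial files are kept.
fetch_cache() {
    rsync -zvh --partial --progress "$(ftp_to_rsync "$1")" "$2"
}

# Example call (not run here; the download is several GB):
# fetch_cache ftp.ensembl.org/pub/release-86/variation/VEP/homo_sapiens_vep_86_GRCh37.tar.gz "$VEP_DATA"
```

Only the human GRCh37 cache is fetched in the example call, since that is the only assembly the question needs.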

The "MAF" in vcf2maf refers to "Mutation Annotation Format", which was something unnecessarily invented and confusingly named for cancer genetics.

So true! This also conflicts with Multiple Alignment Format

written 15 months ago by poisonAlien

And Minor Allele Frequency!

written 14 months ago by ionox0