Question: Obtain MAF from VCF
niubbo (Italy) wrote 5.3 years ago:

Hello everybody,

I need to obtain, from a VCF file, a tabular file (to open in Excel or LibreOffice Calc) containing the SNP/variant positions and the minor allele frequency (MAF).

Is it possible?

excel tabular vcf maf • 4.7k views
EnsemblWill (United Kingdom) wrote 5.3 years ago:

Use VEP's web interface if you're not tied to the command line:

http://www.ensembl.org/Homo_sapiens/Tools/VEP

or http://grch37.ensembl.org/Homo_sapiens/Tools/VEP if your data is on GRCh37.

See http://www.ensembl.org/info/docs/tools/vep/online/input.html#ident for more details.


I installed it, but... you're right when you say "if you're not tied to the command line"...

The web version is very useful, although I think my PC would run much faster.

Thank you.

EDIT: I'm sorry, there is another problem: the web version doesn't accept files larger than 20 MB, and mine is 96 MB.

So I think I have to install VEP on my computer. I moved to the directory using the "cd" command, and then this is the command line I used:

perl INSTALL.pl

I saw the installation proceed, and the folder is there. I can also find the path Bio/EnsEMBL/Variation/Utils and the file "Sequence.pm" that it can't find (see the error message I posted).

Reply written 5.3 years ago (modified 5.3 years ago) by niubbo

You can compress your VCF file using gzip or zip to reduce the size of your upload.

You may also create a user account in Ensembl (easy, takes only a minute and your email address) to increase your upload limit to 50MB.

If the INSTALL.pl script succeeded then you should be able to run:

perl variant_effect_predictor.pl --help

and see the help message. If you asked the installer to install to a different directory, you should either add this to your PERL5LIB or include it with:

perl -I /my/custom/directory/ variant_effect_predictor.pl --help

Reply written 5.3 years ago by EnsemblWill

I tried to upload a tar.gz file, but it shows me this error:

The input format is invalid: the format is not recognized or there is a formatting issue in the input

 

Reply written 5.3 years ago by niubbo

The interface does not accept tar.gz files, only .gz files:

> ls *.vcf
my_data.vcf
> gzip my_data.vcf
> ls *.gz
my_data.vcf.gz

You should be able to upload this .gz file to the web interface.
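If the gzip command is not at hand, the same single-file compression can be sketched in Python with the standard library. The function name gzip_vcf and the file name my_data.vcf are illustrative assumptions. The point is only that the result is a plain .gz of one file, not a tar archive.

```python
import gzip
import shutil
from pathlib import Path


def gzip_vcf(vcf_path: str) -> Path:
    """Compress one VCF into <name>.vcf.gz (plain gzip, not a tar archive)."""
    src = Path(vcf_path)
    dst = src.with_name(src.name + ".gz")
    # Stream the data so a large VCF is never held in memory all at once.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

For example, gzip_vcf("my_data.vcf") would write my_data.vcf.gz next to the input. Unlike the gzip command above, this sketch leaves the original file in place.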

Reply written 5.2 years ago by EnsemblWill

OK, at the moment it is working... yesterday I had problems with the .gz file too.

I think I now have everything I need. I'll try to understand how to install the VEP tool while my analysis is running on the web tool.

Thanks for the help!

Reply written 5.2 years ago by niubbo

Hello, I need just one more bit of help with VEP...

Is it possible to obtain, from a VCF file, the single genes rather than a list of transcripts?

Reply written 5.2 years ago by niubbo

Yes:

a) using the script, see http://www.ensembl.org/info/docs/tools/vep/script/vep_other.html#pick

b) using the web interface, choose one of the options from the dropdown labelled "Restrict results" under the "Filtering options" section (they correspond to most of the flags documented in a)), see http://www.ensembl.org/info/docs/tools/vep/online/input.html#filter

Reply written 5.2 years ago by EnsemblWill

Perhaps I didn't explain myself properly.

If I have understood correctly, what you suggest lets me find a specific gene in a list of genes. Instead, I need the list itself: I loaded a VCF file and filtered it. As a result, it shows me a list of transcripts for every gene (so I have about 120,000 transcripts), but I only want the list of genes that contain SNPs, so that I can use it in another program (to study pathways, for example).

Reply written 5.2 years ago by niubbo

I think I did understand OK; those options will do what you request.

How about you give it a try and then if it doesn't work out, ask again.
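For a result file that has already been downloaded, the transcript rows can also be collapsed into a gene list afterwards. The sketch below assumes VEP's default tab-delimited text output, where meta lines start with "##", the header line starts with "#Uploaded_variation", and there is a "Gene" column. The function name unique_genes is hypothetical, and the column name should be checked against your own file.

```python
from typing import Iterable, List


def unique_genes(lines: Iterable[str], column: str = "Gene") -> List[str]:
    """Return the distinct values of one column, in first-seen order."""
    idx = None
    seen = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if idx is None:
            # Assumption: the first remaining line is the header,
            # e.g. '#Uploaded_variation<TAB>Location<TAB>...'.
            header = [name.lstrip("#") for name in fields]
            idx = header.index(column)
            continue
        gene = fields[idx]
        if gene and gene != "-":  # '-' is treated as "no gene" here
            seen.setdefault(gene, None)
    return list(seen)
```

Passing an open file handle, for example unique_genes(open("vep_output.txt")), would give one entry per gene however many transcripts each gene has. The file name is a placeholder.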

Reply written 5.2 years ago by EnsemblWill

That's what I did... but I was trying it on the results page instead of the page where you set up a new job.

Thank you

Reply written 5.2 years ago by niubbo

Cyriac Kandoth (Memorial Sloan Kettering, New York, USA) wrote 5.3 years ago:

You can use vcf2maf, though "MAF" here refers to Mutation Annotation Format. It's a tab-delimited format with 34 columns, but vcf2maf runs Ensembl's VEP to add a bunch of additional useful columns, including the following, which contain MAFs (minor allele frequencies):

77. GMAF - minor allele and frequency in 1000 Genomes Phase 1
78. AFR_MAF - minor allele:frequency in 1000 Genomes Phase 1 African population
79. AMR_MAF - minor allele:frequency in 1000 Genomes Phase 1 American population
80. ASN_MAF - minor allele:frequency in 1000 Genomes Phase 1 Asian population
81. EUR_MAF - minor allele:frequency in 1000 Genomes Phase 1 European population
82. AA_MAF - minor allele:frequency in NHLBI-ESP African American population
83. EA_MAF - minor allele:frequency in NHLBI-ESP European American population
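To use these columns numerically in a spreadsheet or script, each value has to be split into allele and frequency. The sketch below assumes the values look like "A:0.0023", as the "minor allele:frequency" descriptions above suggest. The function name parse_maf_field is hypothetical, and fields holding several values at once are not handled.

```python
from typing import Optional, Tuple


def parse_maf_field(value: str) -> Optional[Tuple[str, float]]:
    """Split an 'allele:frequency' string such as 'A:0.0023' into its parts."""
    value = value.strip()
    if not value or value in {".", "-"}:
        return None  # empty or placeholder value
    # Split on the last ':' so the allele text itself is left untouched.
    allele, _, freq = value.rpartition(":")
    if not allele:
        return None  # no ':' separator found
    try:
        return allele, float(freq)
    except ValueError:
        return None  # the part after ':' is not a number
```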

I tried to use your program in the past, but it was missing some VEP dependency or something else... I can't remember. However, I'll try again tomorrow and let you know.

EDIT:

ERROR: Cannot find VEP script variant_effect_predictor.pl in path: ~/vep

OK... I have variant_effect_predictor.pl installed with the VEP tool. Is it possible to copy and paste it into a folder to help vcf2maf find it?

Reply written 5.3 years ago (modified 5.3 years ago) by niubbo

Read the docs. It has options to let you specify where VEP is installed (--vep-path) and where the VEP cache is dumped (--vep-data).

Reply written 5.2 years ago (modified 5.2 years ago) by Cyriac Kandoth

Giovanni M Dall'Olio (London, UK) wrote 5.3 years ago:

Just use the --freq option in vcftools:

vcftools --gzvcf ALL.chr22.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz  --freq --out chr22

 

 

Output:

CHROM   POS     N_ALLELES       N_CHR   {ALLELE:FREQ}
22      16050075        2       5008    A:0.9998        G:0.000199681
22      16050115        2       5008    G:0.99361       A:0.00638978
22      16050213        2       5008    C:0.992412      T:0.00758786
22      16050319        2       5008    C:0.9998        T:0.000199681
22      16050527        2       5008    C:0.9998        A:0.000199681
22      16050568        2       5008    C:0.999601      A:0.000399361
22      16050607        2       5008    G:0.999002      A:0.000998403
22      16050627        2       5008    G:0.999601      T:0.000399361
22      16050646        2       5008    G:0.9998        T:0.000199681
22      16050654        5       5008    A:0.857228      <CN0>:0.00179712        <CN2>:0.0173722 <CN3>:0.119609  <CN4>:0.00399361
22      16050655        2       5008    G:0.9998        A:0.000199681
22      16050678        2       5008    C:0.999601      T:0.000399361
22      16050679        2       5008    G:0.9998        A:0.000199681
22      16050688        2       5008    C:0.9998        T:0.000199681
22      16050732        2       5008    C:0.9998        T:0.000199681
22      16050739        2       5008    TA:0.992412     T:0.00758786
22      16050758        2       5008    T:0.9998        C:0.000199681
22      16050783        2       5008    A:0.992212      G:0.00778754
22      16050840        2       5008    C:0.994808      G:0.00519169
22      16050847        2       5008    T:0.998802      C:0.00119808
22      16050856        2       5008    G:0.9998        T:0.000199681
22      16050874        2       5008    G:0.9998        T:0.000199681
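The .frq output above lists every allele with its frequency, so it needs one more step to become a position-plus-MAF table for a spreadsheet. Here is a minimal sketch of that step, assuming whitespace-separated columns as shown above. The names frq_to_maf_rows and write_maf_table are hypothetical. At multi-allelic sites, "minor allele" is taken to mean the second most frequent allele, which is one common convention rather than the only one.

```python
import csv
from typing import Iterable, Iterator, List


def frq_to_maf_rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield [CHROM, POS, MAJOR, MINOR, MAF] rows from vcftools .frq lines."""
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "CHROM":
            continue  # skip blank lines and the header
        chrom, pos = fields[0], fields[1]
        alleles = []
        for item in fields[4:]:
            # Split on the last ':' so alleles like '<CN0>' stay intact.
            allele, _, freq = item.rpartition(":")
            alleles.append((float(freq), allele, freq))
        # Most frequent allele first; equal frequencies keep file order.
        alleles.sort(key=lambda entry: entry[0], reverse=True)
        major = alleles[0]
        minor = alleles[1] if len(alleles) > 1 else (0.0, ".", "0")
        yield [chrom, pos, major[1], minor[1], minor[2]]


def write_maf_table(frq_path: str, tsv_path: str) -> None:
    """Write a tab-separated table that Excel or LibreOffice Calc can open."""
    with open(frq_path) as fin, open(tsv_path, "w", newline="") as fout:
        writer = csv.writer(fout, delimiter="\t")
        writer.writerow(["CHROM", "POS", "MAJOR", "MINOR", "MAF"])
        writer.writerows(frq_to_maf_rows(fin))
```

With the command above (--out chr22), vcftools writes chr22.frq, so write_maf_table("chr22.frq", "chr22_maf.tsv") would produce the table. The output file name is a placeholder.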

 

niubbo (Italy) wrote 5.3 years ago:

Thank you for the answer, but I had already tried the --freq option, and it returns a text file without the MAF.

These are the first two rows of the file:

CHROM    POS    N_ALLELES    N_CHR    {ALLELE:FREQ}
chrM      72      2             2       T:0    C:1

and it is the same for all the other SNPs:

CHROM    POS    N_ALLELES    N_CHR    {ALLELE:FREQ}
chrM    72    2    2    T:0    C:1
chrM    73    2    2    G:0    A:1
chrM    93    2    2    A:0    G:1
chrM    150    2    2    T:0    C:1
chrM    195    2    2    C:0    T:1
chrM    410    2    2    A:0    T:1
chrM    2354    2    2    C:0    T:1
chrM    2485    2    2    C:0    T:1
chrM    5581    2    2    C:0    T:1
chrM    6493    2    2    C:0    A:1
chrM    7445    2    2    G:0    A:1
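Rows like T:0 / C:1 are what a single-sample VCF would be expected to give. N_CHR is 2, meaning one diploid genotype call per site, so each allele frequency can only be 0, 0.5 or 1. The sketch below shows the counting idea only. It is not vcftools' actual code, the function name allele_frequencies is hypothetical, and the genotype strings follow the VCF GT convention (0 = REF, 1 and above = ALT, "." = missing).

```python
import re
from typing import List, Sequence


def allele_frequencies(genotypes: Sequence[str], n_alleles: int) -> List[float]:
    """Frequency of each allele index over the chromosomes that were called."""
    counts = [0] * n_alleles
    for gt in genotypes:
        # GT alleles are separated by '/' (unphased) or '|' (phased).
        for allele in re.split(r"[/|]", gt):
            if allele != ".":  # '.' marks a missing call
                counts[int(allele)] += 1
    n_chr = sum(counts)  # plays the role of the N_CHR column
    if n_chr == 0:
        return [0.0] * n_alleles
    return [count / n_chr for count in counts]
```

A single homozygous-alternate call, allele_frequencies(["1/1"], 2), gives 0 for the reference allele and 1 for the alternate, which matches the rows above. A population-level MAF therefore has to come from a multi-sample VCF or from an external resource.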

I'm sorry, I was trying to delete my duplicate post and deleted yours too.

You suggested that my SNPs are monoallelic... but how can I find the global frequencies (I mean the frequency of an allele worldwide, according to the 1000 Genomes Project)?

Reply written 5.3 years ago by niubbo

If you want to add the global frequencies from the 1000 Genomes Project, you can use the Variant Effect Predictor tool with the '--gmaf' option, described as: "Add the global minor allele frequency (MAF) from 1000 Genomes Phase 1 data for any existing variant to the output. Not used by default."

Reply written 5.3 years ago (modified 5.3 years ago) by Cristian Pérez

Thank you for the answer, but I'm a beginner and I probably don't fully understand what you mean. This is what I've done:

After installing VEP, I used this command line (I have split it into four parts to make it easier to read):

gianluca@gianluca-kde:~/tempo/3soggetti$ 
sudo perl /home/gianluca/Documenti/Laboratorio/Tools/ensembl-tools-release-78/scripts/variant_effect_predictor/variant_effect_predictor.pl

--input_file NGS-3181_0322_R1.recalibrated.filtered.vcf -i 

--gmaf 

--output_file gmaf3181.vcf

and this is the error it gives me:

Can't locate Bio/EnsEMBL/Variation/Utils/Sequence.pm in @INC (you may need to install the Bio::EnsEMBL::Variation::Utils::Sequence module) (@INC contains: /home/gianluca/Documenti/Laboratorio/Tools/ensembl-tools-release-78/scripts/variant_effect_predictor /etc/perl /usr/local/lib/perl/5.18.2 /usr/local/share/perl/5.18.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.18 /usr/share/perl/5.18 /usr/local/lib/site_perl .) at /home/gianluca/Documenti/Laboratorio/Tools/ensembl-tools-release-78/scripts/variant_effect_predictor/variant_effect_predictor.pl line 49.
BEGIN failed--compilation aborted at /home/gianluca/Documenti/Laboratorio/Tools/ensembl-tools-release-78/scripts/variant_effect_predictor/variant_effect_predictor.pl line 49.

 

What am I doing wrong?

Reply written 5.3 years ago by niubbo

You need to run the INSTALL.pl script in the VEP directory:

http://www.ensembl.org/info/docs/tools/vep/script/vep_download.html#installer

(or run on the web interface, see my main answer)

Reply written 5.3 years ago (modified 5.3 years ago) by EnsemblWill

alexisdereeper wrote 4.2 years ago:

The SNiPlay web application can report a tabular file including MAF information from a VCF file (using VCFtools): http://sniplay.southgreen.fr/cgi-bin/analysis_v3.cgi
