Hello everybody,
I need to obtain, from a VCF file, a tabular file (to view in Excel or LibreOffice Calc) containing SNP/variant positions and minor allele frequencies.
Is it possible?
Use the VEP's web interface if you're not tied to the command line:
http://www.ensembl.org/Homo_sapiens/Tools/VEP
or http://grch37.ensembl.org/Homo_sapiens/Tools/VEP if your data is on GRCh37.
See http://www.ensembl.org/info/docs/tools/vep/online/input.html#ident for more details.
Just use the --freq option in vcftools:
vcftools --gzvcf ALL.chr22.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz --freq --out chr22
Output:
CHROM POS N_ALLELES N_CHR {ALLELE:FREQ}
22 16050075 2 5008 A:0.9998 G:0.000199681
22 16050115 2 5008 G:0.99361 A:0.00638978
22 16050213 2 5008 C:0.992412 T:0.00758786
22 16050319 2 5008 C:0.9998 T:0.000199681
22 16050527 2 5008 C:0.9998 A:0.000199681
22 16050568 2 5008 C:0.999601 A:0.000399361
22 16050607 2 5008 G:0.999002 A:0.000998403
22 16050627 2 5008 G:0.999601 T:0.000399361
22 16050646 2 5008 G:0.9998 T:0.000199681
22 16050654 5 5008 A:0.857228 <CN0>:0.00179712 <CN2>:0.0173722 <CN3>:0.119609 <CN4>:0.00399361
22 16050655 2 5008 G:0.9998 A:0.000199681
22 16050678 2 5008 C:0.999601 T:0.000399361
22 16050679 2 5008 G:0.9998 A:0.000199681
22 16050688 2 5008 C:0.9998 T:0.000199681
22 16050732 2 5008 C:0.9998 T:0.000199681
22 16050739 2 5008 TA:0.992412 T:0.00758786
22 16050758 2 5008 T:0.9998 C:0.000199681
22 16050783 2 5008 A:0.992212 G:0.00778754
22 16050840 2 5008 C:0.994808 G:0.00519169
22 16050847 2 5008 T:0.998802 C:0.00119808
22 16050856 2 5008 G:0.9998 T:0.000199681
22 16050874 2 5008 G:0.9998 T:0.000199681
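To get from this output to a spreadsheet-friendly file, something like the following Python sketch could work. It assumes the whitespace-separated layout shown above (the ALLELE:FREQ pairs start at the fifth column) and it takes "minor allele" to mean the second most frequent allele at each site, which is my assumption and matters for multi-allelic sites. The function names are just illustrative.

```python
import csv


def parse_frq_line(line):
    """Return (chrom, pos, minor_allele, maf) for one data line of a .frq file."""
    fields = line.split()
    chrom, pos = fields[0], int(fields[1])
    # Columns 5 onward are ALLELE:FREQ pairs; split on the last colon only.
    pairs = []
    for item in fields[4:]:
        allele, freq = item.rsplit(":", 1)
        pairs.append((allele, float(freq)))
    # Assumption: the minor allele is the second most frequent allele.
    pairs.sort(key=lambda p: p[1], reverse=True)
    if len(pairs) < 2:
        return chrom, pos, ".", 0.0
    return chrom, pos, pairs[1][0], pairs[1][1]


def frq_to_csv(frq_path, csv_path):
    """Write CHROM, POS, MINOR_ALLELE, MAF rows to a CSV file."""
    with open(frq_path) as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["CHROM", "POS", "MINOR_ALLELE", "MAF"])
        for line in src:
            # Skip blank lines and the header line.
            if not line.strip() or line.startswith("CHROM"):
                continue
            writer.writerow(parse_frq_line(line))
```

With the command above, vcftools should write its result to chr22.frq, so frq_to_csv("chr22.frq", "chr22_maf.csv") would give a file that Excel or LibreOffice Calc can open.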
You can use vcf2maf, though "MAF" here refers to Mutation Annotation Format. It's a tab-delimited format with 34 columns, but vcf2maf runs Ensembl's VEP to add a number of additional useful columns, including the following, which contain MAFs (minor allele frequencies):
77. GMAF - minor allele and frequency in 1000 Genomes Phase 1
78. AFR_MAF - minor allele:frequency in 1000 Genomes Phase 1 African population
79. AMR_MAF - minor allele:frequency in 1000 Genomes Phase 1 American population
80. ASN_MAF - minor allele:frequency in 1000 Genomes Phase 1 Asian population
81. EUR_MAF - minor allele:frequency in 1000 Genomes Phase 1 European population
82. AA_MAF - minor allele:frequency in NHLBI-ESP African American population
83. EA_MAF - minor allele:frequency in NHLBI-ESP European American population
I tried to use your program in the past, but it was missing some VEP dependency or something else... I can't remember. However, I'll try again tomorrow and let you know.
EDIT:
ERROR: Cannot find VEP script variant_effect_predictor.pl in path: ~/vep
OK. I have variant_effect_predictor.pl installed with the VEP tool. Is it possible to copy and paste it into a folder to help vcf2maf find it?
Thank you for the answer, but I had just tried the --freq command and it returns a txt file without MAF.
These are the first 2 rows of the file:
CHROM POS N_ALLELES N_CHR {ALLELE:FREQ}
chrM 72 2 2 T:0 C:1
and it is the same for all the other SNPs:
CHROM POS N_ALLELES N_CHR {ALLELE:FREQ}
chrM 72 2 2 T:0 C:1
chrM 73 2 2 G:0 A:1
chrM 93 2 2 A:0 G:1
chrM 150 2 2 T:0 C:1
chrM 195 2 2 C:0 T:1
chrM 410 2 2 A:0 T:1
chrM 2354 2 2 C:0 T:1
chrM 2485 2 2 C:0 T:1
chrM 5581 2 2 C:0 T:1
chrM 6493 2 2 C:0 A:1
chrM 7445 2 2 G:0 A:1
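A possible reading of this output: N_CHR is 2 on every row, which would be consistent with a VCF that holds a single diploid sample. In that case the frequencies computed from the file itself can only be 0, 0.5 or 1, so they are not population MAFs. The Python sketch below illustrates the arithmetic under that assumption; it is a simplified illustration of counting alleles from GT fields, not vcftools' actual code, and the function name is hypothetical.

```python
from collections import Counter


def allele_freqs(ref, alts, genotypes):
    """Count alleles over GT strings such as '0/1' or '1|1'.

    Returns (n_chr, {allele: frequency}); missing calls ('.') are skipped.
    """
    alleles = [ref] + list(alts)
    counts = Counter()
    for gt in genotypes:
        for idx in gt.replace("|", "/").split("/"):
            if idx != ".":
                counts[alleles[int(idx)]] += 1
    n_chr = sum(counts.values())
    freqs = {a: (counts[a] / n_chr if n_chr else 0.0) for a in alleles}
    return n_chr, freqs


# One sample, homozygous for the alternate allele (like chrM 72 T>C above).
print(allele_freqs("T", ["C"], ["1/1"]))
# Two samples give more informative frequencies.
print(allele_freqs("T", ["C"], ["0/1", "1/1"]))
```

If that reading is right, population frequencies have to come from an external resource such as 1000 Genomes rather than from the VCF's own genotypes.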
If you want to add the global frequencies from 1000 Genomes, you can use the Variant Effect Predictor tool with the '--gmaf' option, described as: "Add the global minor allele frequency (MAF) from 1000 Genomes Phase 1 data for any existing variant to the output. Not used by default."
Thank you for the answer, but I'm a beginner and probably don't fully understand what you mean. This is what I've done:
After installing VEP I used this command line (I split it into four parts to make it easier to read):
gianluca@gianluca-kde:~/tempo/3soggetti$
sudo perl /home/gianluca/Documenti/Laboratorio/Tools/ensembl-tools-release-78/scripts/variant_effect_predictor/variant_effect_predictor.pl
--input_file NGS-3181_0322_R1.recalibrated.filtered.vcf -i
--gmaf
--output_file gmaf3181.vcf
And this is the error it gives me:
Can't locate Bio/EnsEMBL/Variation/Utils/Sequence.pm in @INC (you may need to install the Bio::EnsEMBL::Variation::Utils::Sequence module) (@INC contains: /home/gianluca/Documenti/Laboratorio/Tools/ensembl-tools-release-78/scripts/variant_effect_predictor /etc/perl /usr/local/lib/perl/5.18.2 /usr/local/share/perl/5.18.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.18 /usr/share/perl/5.18 /usr/local/lib/site_perl .) at /home/gianluca/Documenti/Laboratorio/Tools/ensembl-tools-release-78/scripts/variant_effect_predictor/variant_effect_predictor.pl line 49.
BEGIN failed--compilation aborted at /home/gianluca/Documenti/Laboratorio/Tools/ensembl-tools-release-78/scripts/variant_effect_predictor/variant_effect_predictor.pl line 49.
What am I doing wrong?
You need to run the INSTALL.pl script in the VEP directory;
http://www.ensembl.org/info/docs/tools/vep/script/vep_download.html#installer
(or run on the web interface, see my main answer)
The SNiPlay web application can report a tabular file including MAF information from a VCF file (using VCFtools): http://sniplay.southgreen.fr/cgi-bin/analysis_v3.cgi
I installed it but ... you're right when you say you are not tied to the command line ...
The web version is very useful, although I think my PC would run much faster.
Thank you
EDIT: I'm sorry, there is another problem: the web version doesn't accept files larger than 20 MB, and mine is 96 MB.
So I think I have to install VEP on my computer. I moved to the directory using the "cd" command, and then this is the command line I used:
I saw the installation proceed and the folder is there. I can also find the path Bio/EnsEMBL/Variation/Utils and the file "Sequence.pm" that it can't find (see the error message I posted).
You can compress your VCF file using gzip or zip to reduce the size of your upload.
You may also create a user account in Ensembl (easy, takes only a minute and your email address) to increase your upload limit to 50 MB.
If the INSTALL.pl script succeeded then you should be able to run:
and see the help message. If you asked the installer to install to a different directory, you should either add this to your PERL5LIB or include it with:
I tried to load a tar.gz file but it shows me this error:
The interface does not accept tar.gz files, only .gz files:
You should be able to upload this .gz file to the web interface.
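If no gzip command is at hand, a plain .gz file (not a tar archive) can also be produced with Python's standard library. This is only a minimal sketch; the function name and the file name in the usage note are placeholders.

```python
import gzip
import shutil


def gzip_file(src_path, dst_path=None):
    """Compress src_path into a single-file .gz and return the output path."""
    if dst_path is None:
        dst_path = src_path + ".gz"
    # Stream the data so large VCF files are not read into memory at once.
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```

For example, gzip_file("my_variants.vcf") would write my_variants.vcf.gz next to the original and leave the original file in place.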
OK, at the moment it is working... yesterday I had a problem with the .gz file too.
I think I now have everything I need. I'll try to work out how to install VEP while my analysis is running on the web tool.
Thanks for the help!
Hello, I need help with one more thing in VEP...
Is it possible, from a VCF file, to obtain just the gene rather than a list of transcripts?
Yes:
a) using the script, see http://www.ensembl.org/info/docs/tools/vep/script/vep_other.html#pick
b) using the web interface, choose one of the options from the dropdown labelled "Restrict results" under the "Filtering options" section (they correspond to most of the flags documented in a)), see http://www.ensembl.org/info/docs/tools/vep/online/input.html#filter
Perhaps I didn't explain myself properly.
If I have understood correctly, what you suggest lets me find a specific gene in a gene list. Instead, I need the list itself: I have loaded a VCF file and filtered it. As a result, it shows me a list of transcripts for every gene (so I have about 120,000 transcripts), but I only want the list of genes that contain SNPs, so I can use it in another program (to study pathways, for example).
I think I did understand OK; those options will do as you request.
How about you give it a try and then if it doesn't work out, ask again.
That's what I did... but I was trying on the results page instead of the page where you set up a new job.
Thank you
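For anyone who ends up with the per-transcript output anyway, the list of affected genes can also be pulled out afterwards. The Python sketch below assumes VEP's default tab-delimited output, where metadata lines start with "##", the column header line starts with "#Uploaded_variation", there is a "Gene" column, and "-" marks rows with no gene. Files downloaded from the web interface may be laid out differently, so treat the column name as an assumption to check; the function name is hypothetical.

```python
def genes_from_vep(lines, column="Gene"):
    """Return the sorted unique values of one column in VEP tab output."""
    col_index = None
    genes = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # blank line or metadata
        if line.startswith("#"):
            # Column header line, e.g. "#Uploaded_variation<TAB>Location..."
            header = line.lstrip("#").split("\t")
            col_index = header.index(column)
            continue
        if col_index is None:
            raise ValueError("no header line found before the data rows")
        value = line.split("\t")[col_index]
        if value and value != "-":
            genes.add(value)
    return sorted(genes)
```

Called as genes_from_vep(open("vep_output.txt")), with a placeholder file name, this collapses many transcript rows into one entry per gene, which can then be pasted into a pathway tool.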