Hello everybody,
I need to obtain, from a VCF file, a tabular file (to view with Excel or LibreOffice Calc) containing SNP/variant positions and minor allele frequencies.
Is it possible?
Use the VEP's web interface if you're not tied to the command line:
http://www.ensembl.org/Homo_sapiens/Tools/VEP
or http://grch37.ensembl.org/Homo_sapiens/Tools/VEP if your data is on GRCh37.
See http://www.ensembl.org/info/docs/tools/vep/online/input.html#ident for more details.
I installed it, but... you're right about not being tied to the command line...
The web version is very useful, although I think my PC would run much faster.
Thank you.
EDIT: I'm sorry, there is another problem: the web version doesn't accept files larger than 20 MB, and mine is 96 MB,
so I think I have to install VEP on my computer. I moved to the directory with the "cd" command, and then this is the command line I used:
perl INSTALL.pl
I saw the installation proceed, and the folder is there. I can also find the path Bio/EnsEMBL/Variation/Utils and the file "Sequence.pm" that it can't find (see the error message I posted).
You can compress your VCF file using gzip or zip to reduce the size of your upload.
You may also create a user account in Ensembl (easy; it takes only a minute and your email address) to increase your upload limit to 50 MB.
If the INSTALL.pl script succeeded then you should be able to run:
perl variant_effect_predictor.pl --help
and see the help message. If you asked the installer to install to a different directory, you should either add this to your PERL5LIB or include it with:
perl -I /my/custom/directory/ variant_effect_predictor.pl --help
I tried to upload a tar.gz file, but it shows me this error:
The input format is invalid: the format is not recognized or there is a formatting issue in the input
The interface does not accept tar.gz files, only .gz files:
> ls *.vcf
my_data.vcf
> gzip my_data.vcf
> ls *.gz
my_data.vcf.gz
You should be able to upload this .gz file to the web interface.
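If the gzip command is not available (for example on a Windows machine), the same single-file compression can be sketched in Python with the standard library. The function name gzip_file is only illustrative, and my_data.vcf is the example file name from above. Unlike the gzip command, this sketch leaves the original file in place.

```python
import gzip
import shutil


def gzip_file(src, dst=None):
    """Compress one file to .gz (not tar.gz) and return the output path."""
    dst = dst or src + ".gz"
    # Stream the bytes across so large VCF files are not loaded into memory.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Calling gzip_file("my_data.vcf") should write my_data.vcf.gz next to the original file.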
Hello, I need just one more bit of help with VEP...
Is it possible to obtain, from a VCF file, single genes rather than a list of transcripts?
Yes:
a) using the script, see http://www.ensembl.org/info/docs/tools/vep/script/vep_other.html#pick
b) using the web interface, choose one of the options from the dropdown labelled "Restrict results" under the "Filtering options" section (they correspond to most of the flags documented in a)), see http://www.ensembl.org/info/docs/tools/vep/online/input.html#filter
Perhaps I didn't explain myself properly.
If I have understood correctly, what you suggest lets me find a specific gene in a gene list. Instead, I need the list itself: I have loaded a VCF file and filtered it. As a result, it shows me a list of transcripts for every gene (so I have about 120,000 transcripts), but I would like just the list of genes containing SNPs, so that I can use it in another program (to study pathways, for example).
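One way to get that gene list is to collapse the downloaded VEP results yourself. Here is a minimal Python sketch, assuming VEP's default tab-delimited output, where metadata lines start with "##", the header line starts with "#Uploaded_variation" and includes a "Gene" column, and a missing gene is written as "-". The function name unique_genes and the example rows are made up for illustration.

```python
def unique_genes(lines, column="Gene"):
    """Return the sorted set of values in the Gene column of VEP tab output."""
    header = None
    genes = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # blank line or metadata
        if line.startswith("#"):
            # Header line, e.g. "#Uploaded_variation<TAB>Location<TAB>..."
            header = line.lstrip("#").split("\t")
            continue
        if header is None:
            raise ValueError("no header line found before the data rows")
        value = line.split("\t")[header.index(column)]
        if value != "-":  # assumption: "-" means no gene for this row
            genes.add(value)
    return sorted(genes)


# Made-up rows, only to show the shape of the input.
example = [
    "## VEP metadata line",
    "#Uploaded_variation\tLocation\tAllele\tGene\tFeature",
    "var1\t1:100\tA\tGENE_A\tTX_1",
    "var1\t1:100\tA\tGENE_A\tTX_2",
    "var2\t2:200\tT\tGENE_B\tTX_3",
    "var3\t3:300\tG\t-\t-",
]
genes = unique_genes(example)
print("\n".join(genes))
```

With a real file you would pass the open file handle instead of the example list, and the printed list can be pasted into a pathway tool.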
You can use vcf2maf, though "MAF" here refers to Mutation Annotation Format. It's a tab-delimited format with 34 columns, but vcf2maf runs Ensembl's VEP to add a bunch of additional useful columns, including the following, which contain MAFs (minor allele frequencies):
77. GMAF - minor allele and frequency in 1000 Genomes Phase 1
78. AFR_MAF - minor allele:frequency in 1000 Genomes Phase 1 African population
79. AMR_MAF - minor allele:frequency in 1000 Genomes Phase 1 American population
80. ASN_MAF - minor allele:frequency in 1000 Genomes Phase 1 Asian population
81. EUR_MAF - minor allele:frequency in 1000 Genomes Phase 1 European population
82. AA_MAF - minor allele:frequency in NHLBI-ESP African American population
83. EA_MAF - minor allele:frequency in NHLBI-ESP European American population
I tried to use your program in the past, but it was missing some VEP dependency or something else... I can't remember. However, I'll try again tomorrow and let you know.
EDIT:
ERROR: Cannot find VEP script variant_effect_predictor.pl in path: ~/vep
OK... I have variant_effect_predictor.pl installed with the VEP tool. Is it possible to copy and paste it into a folder so that vcf2maf can find it?
Just use the --freq option in vcftools:
vcftools --gzvcf ALL.chr22.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz --freq --out chr22
Output:
CHROM POS N_ALLELES N_CHR {ALLELE:FREQ}
22 16050075 2 5008 A:0.9998 G:0.000199681
22 16050115 2 5008 G:0.99361 A:0.00638978
22 16050213 2 5008 C:0.992412 T:0.00758786
22 16050319 2 5008 C:0.9998 T:0.000199681
22 16050527 2 5008 C:0.9998 A:0.000199681
22 16050568 2 5008 C:0.999601 A:0.000399361
22 16050607 2 5008 G:0.999002 A:0.000998403
22 16050627 2 5008 G:0.999601 T:0.000399361
22 16050646 2 5008 G:0.9998 T:0.000199681
22 16050654 5 5008 A:0.857228 <CN0>:0.00179712 <CN2>:0.0173722 <CN3>:0.119609 <CN4>:0.00399361
22 16050655 2 5008 G:0.9998 A:0.000199681
22 16050678 2 5008 C:0.999601 T:0.000399361
22 16050679 2 5008 G:0.9998 A:0.000199681
22 16050688 2 5008 C:0.9998 T:0.000199681
22 16050732 2 5008 C:0.9998 T:0.000199681
22 16050739 2 5008 TA:0.992412 T:0.00758786
22 16050758 2 5008 T:0.9998 C:0.000199681
22 16050783 2 5008 A:0.992212 G:0.00778754
22 16050840 2 5008 C:0.994808 G:0.00519169
22 16050847 2 5008 T:0.998802 C:0.00119808
22 16050856 2 5008 G:0.9998 T:0.000199681
22 16050874 2 5008 G:0.9998 T:0.000199681
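The .frq file lists every allele with its frequency, so a spreadsheet-friendly table with one minor allele per position still needs a small post-processing step. Here is a rough Python sketch of that step. The names frq_to_rows and write_tsv are illustrative, and "minor allele" is taken here to mean the second most frequent allele at the site, which is an assumption that only matters for multi-allelic sites.

```python
import csv
import io


def frq_to_rows(lines):
    """Yield (chrom, pos, minor_allele, maf) from vcftools --freq output lines."""
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "CHROM":
            continue  # skip blank lines and the header
        chrom, pos = fields[0], int(fields[1])
        # Columns after N_ALLELES and N_CHR are ALLELE:FREQ pairs.
        pairs = []
        for item in fields[4:]:
            allele, freq = item.rsplit(":", 1)
            pairs.append((allele, float(freq)))
        if len(pairs) < 2:
            yield (chrom, pos, ".", 0.0)  # only one allele listed
            continue
        # Assumption: minor allele = second most frequent allele.
        ranked = sorted(pairs, key=lambda p: p[1], reverse=True)
        yield (chrom, pos, ranked[1][0], ranked[1][1])


def write_tsv(rows, handle):
    """Write the rows as a tab-separated table that Excel or Calc can open."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["CHROM", "POS", "MINOR_ALLELE", "MAF"])
    writer.writerows(rows)


# Two rows copied from the output above.
example = """CHROM\tPOS\tN_ALLELES\tN_CHR\t{ALLELE:FREQ}
22\t16050075\t2\t5008\tA:0.9998\tG:0.000199681
22\t16050654\t5\t5008\tA:0.857228\t<CN0>:0.00179712\t<CN2>:0.0173722\t<CN3>:0.119609\t<CN4>:0.00399361
"""
rows = list(frq_to_rows(example.splitlines()))
buf = io.StringIO()
write_tsv(rows, buf)
print(buf.getvalue())
```

With the command above, vcftools writes its table to chr22.frq, so in practice you would open that file and pass the handle to frq_to_rows instead of the example string.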
Thank you for the answer, but I had already tried the --freq option, and it returns a text file without MAFs.
These are the first two rows of the file:
CHROM POS N_ALLELES N_CHR {ALLELE:FREQ}
chrM 72 2 2 T:0 C:1
and it is the same for all the other SNPs:
CHROM POS N_ALLELES N_CHR {ALLELE:FREQ}
chrM 72 2 2 T:0 C:1
chrM 73 2 2 G:0 A:1
chrM 93 2 2 A:0 G:1
chrM 150 2 2 T:0 C:1
chrM 195 2 2 C:0 T:1
chrM 410 2 2 A:0 T:1
chrM 2354 2 2 C:0 T:1
chrM 2485 2 2 C:0 T:1
chrM 5581 2 2 C:0 T:1
chrM 6493 2 2 C:0 A:1
chrM 7445 2 2 G:0 A:1
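N_CHR is 2 at every site here, which is what a VCF with a single diploid sample gives: the frequencies are computed only from the genotypes in the file itself, so with one sample they can only be 0, 0.5 or 1. A small Python sketch of that calculation (this illustrates the arithmetic and is not vcftools' own code; allele_freqs is a made-up name):

```python
from collections import Counter


def allele_freqs(genotypes, alleles):
    """Count alleles over GT strings such as "0/1" or "1|1".

    alleles is the list [REF, ALT1, ...]; returns (n_chr, {allele: freq}).
    """
    counts = Counter()
    for gt in genotypes:
        for idx in gt.replace("|", "/").split("/"):
            if idx != ".":  # "." is a missing call
                counts[alleles[int(idx)]] += 1
    n_chr = sum(counts.values())
    if n_chr == 0:
        return 0, {}
    return n_chr, {a: counts[a] / n_chr for a in alleles}


# One sample, homozygous for the alternate allele, as at chrM 72 (T > C).
single = allele_freqs(["1/1"], ["T", "C"])
# Four samples give frequencies between 0 and 1.
several = allele_freqs(["0/0", "0/1", "0/0", "1/1"], ["T", "C"])
print(single, several)
```

A population-level frequency therefore has to come from an external resource rather than from the sample's own VCF.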
I'm sorry, I tried to delete my double post and deleted yours too.
You are suggesting that my SNPs are monoallelic... but how can I find the global frequencies (I mean the frequency of an allele worldwide, according to the 1000 Genomes Project)?
If you want to add the global frequencies from 1000 Genomes, you can use the Variant Effect Predictor tool with the '--gmaf' option, described as: "Add the global minor allele frequency (MAF) from 1000 Genomes Phase 1 data for any existing variant to the output. Not used by default."
Thank you for the answer, but I'm a beginner and probably don't fully understand what you mean. This is what I've done:
After installing VEP, I used this command line (I divided it into four parts to make it easier to read):
gianluca@gianluca-kde:~/tempo/3soggetti$ sudo perl /home/gianluca/Documenti/Laboratorio/Tools/ensembl-tools-release-78/scripts/variant_effect_predictor/variant_effect_predictor.pl --input_file NGS-3181_0322_R1.recalibrated.filtered.vcf -i --gmaf --output_file gmaf3181.vcf
and this is the error it gives me:
Can't locate Bio/EnsEMBL/Variation/Utils/Sequence.pm in @INC (you may need to install the Bio::EnsEMBL::Variation::Utils::Sequence module) (@INC contains: /home/gianluca/Documenti/Laboratorio/Tools/ensembl-tools-release-78/scripts/variant_effect_predictor /etc/perl /usr/local/lib/perl/5.18.2 /usr/local/share/perl/5.18.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.18 /usr/share/perl/5.18 /usr/local/lib/site_perl .) at /home/gianluca/Documenti/Laboratorio/Tools/ensembl-tools-release-78/scripts/variant_effect_predictor/variant_effect_predictor.pl line 49.
BEGIN failed--compilation aborted at /home/gianluca/Documenti/Laboratorio/Tools/ensembl-tools-release-78/scripts/variant_effect_predictor/variant_effect_predictor.pl line 49.
What am I doing wrong?
You need to run the INSTALL.pl script in the VEP directory:
http://www.ensembl.org/info/docs/tools/vep/script/vep_download.html#installer
(or run on the web interface, see my main answer)
The SNiPlay web application can report a tabular file including MAF information from a VCF file (using VCFtools): http://sniplay.southgreen.fr/cgi-bin/analysis_v3.cgi