Obtain MAF from VCF
5
1
6.9 years ago
ginlucks ▴ 10

Hello everybody,

I need to obtain, from a VCF file, a tabular file (to view with Excel or LibreOffice Calc) containing SNP/variant positions and minor allele frequencies (MAF).

Is it possible?

MAF vcf tabular excel • 6.2k views
2
6.9 years ago
EnsemblWill ▴ 560

Use the VEP's web interface if you're not tied to the command line:

http://www.ensembl.org/Homo_sapiens/Tools/VEP

or http://grch37.ensembl.org/Homo_sapiens/Tools/VEP if your data is on GRCh37.

See http://www.ensembl.org/info/docs/tools/vep/online/input.html#ident for more details.

0

I installed it but ... you're right when you say you are not tied to the command line ...

The web version is very useful, although I think my PC would run much faster.

thank you

EDIT: I'm sorry, there is another problem: the web version doesn't accept files larger than 20 MB, and mine is 96 MB.

So I think I have to install VEP on my computer. I moved to the directory using the "cd" command, and then this is the command line I used:

perl INSTALL.pl

I saw the installation going ahead, and the folder is there. I can also find the path Bio/EnsEMBL/Variation/Utils and the file "Sequence.pm" that it can't find (see the error message I posted).

0

You can compress your VCF file using gzip or zip to reduce the size of your upload.

If the INSTALL.pl script succeeded then you should be able to run:

perl variant_effect_predictor.pl --help

and see the help message. If you asked the installer to install to a different directory, you should either add this to your PERL5LIB or include it with:

perl -I /my/custom/directory/ variant_effect_predictor.pl --help

0

I tried to upload a tar.gz file, but it shows me this error:

The input format is invalid: the format is not recognized or there is a formatting issue in the input

1

The interface does not accept tar.gz files, only .gz files:

> ls *.vcf
my_data.vcf
> gzip my_data.vcf
> ls *.gz
my_data.vcf.gz

You should be able to upload this .gz file to the web interface.
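
If you would rather not use the shell, the same plain .gz file (not a tar archive) can be produced from a script. This is only a minimal sketch using Python's standard library; the function name gzip_vcf is made up for illustration, and unlike the gzip command it keeps the original file.

```python
import gzip
import shutil
from pathlib import Path


def gzip_vcf(vcf_path):
    """Compress vcf_path to '<name>.gz' next to it and return the new path.

    Hypothetical helper: the original file is left in place.
    """
    src = Path(vcf_path)
    dst = src.with_name(src.name + ".gz")
    # Stream the bytes through gzip so large VCFs are not loaded into memory.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Calling gzip_vcf("my_data.vcf") should leave a my_data.vcf.gz beside the input, which is the kind of file the web interface expects.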

0

OK, at the moment it is working... yesterday I had problems with the .gz file too.

I think I now have everything I need. I'll try to understand how to install the VEP tool while my analysis is running on the web tool.

thanks for help!

0

Hello, I need help with just one more thing in VEP...

Is it possible to obtain, from a VCF file, the single gene rather than a list of transcripts?

0

Yes:

a) using the script, see http://www.ensembl.org/info/docs/tools/vep/script/vep_other.html#pick

b) using the web interface, choose one of the options from the dropdown labelled "Restrict results" under the "Filtering options" section (they correspond to most of the flags documented in a)), see http://www.ensembl.org/info/docs/tools/vep/online/input.html#filter

0

Perhaps I didn't explain myself properly:

If I have understood correctly, what you suggest lets me find a specific gene in a list of genes. What I need instead is that list itself: I have loaded a VCF file and filtered it. As a result, it shows me a list of transcripts for every gene (so I have about 120,000 transcripts), but I would like just the list of genes that carry SNPs, so I can use it in another program (to study pathways, for example).
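
For anyone who ends up with the full per-transcript table anyway, the gene list can also be collapsed afterwards. The following is only a sketch, assuming VEP's default tab-delimited output, where "##" lines are metadata, the header row starts with "#Uploaded_variation" and contains a "Gene" column, and "-" marks rows without a gene. The function name and the toy rows are made up for illustration.

```python
def genes_with_variants(lines):
    """Return the sorted unique gene IDs found in VEP-style tab-delimited lines."""
    gene_col = None
    genes = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and metadata
        fields = line.split("\t")
        if line.startswith("#"):
            # Header row, e.g. "#Uploaded_variation<TAB>Location<TAB>Allele<TAB>Gene..."
            gene_col = fields.index("Gene")
            continue
        if gene_col is None:
            raise ValueError("no header line with a 'Gene' column found")
        gene = fields[gene_col]
        if gene != "-":  # "-" means no gene for this row (assumed convention)
            genes.add(gene)
    return sorted(genes)


# Toy rows, values invented purely for illustration.
example = [
    "## toy VEP-style output\n",
    "#Uploaded_variation\tLocation\tAllele\tGene\tFeature\tFeature_type\tConsequence\n",
    "var1\t1:1000\tA\tGENE_A\tTRANSCRIPT_A1\tTranscript\tmissense_variant\n",
    "var1\t1:1000\tA\tGENE_A\tTRANSCRIPT_A2\tTranscript\tintron_variant\n",
    "var2\t2:2000\tT\tGENE_B\tTRANSCRIPT_B1\tTranscript\tsynonymous_variant\n",
    "var3\t3:3000\tG\t-\t-\t-\tintergenic_variant\n",
]
print("\n".join(genes_with_variants(example)))
```

With a real results file you would pass the open file handle instead of the toy list; each gene then appears once no matter how many transcripts it has.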

0

I think I did understand OK; those options will do what you request.

How about you give it a try and then if it doesn't work out, ask again.

0

That's what I did... but I was trying it on the results page instead of the page where you set up a new job.

Thank you

1
6.9 years ago

You can use vcf2maf, though "MAF" here refers to Mutation Annotation Format. It's a tab-delimited format with 34 columns, but vcf2maf runs Ensembl's VEP to add a bunch of additional useful columns, including the following, which contain MAFs (minor allele frequencies):

77. GMAF - minor allele and frequency in 1000 Genomes Phase 1
78. AFR_MAF - minor allele:frequency in 1000 Genomes Phase 1 African population
79. AMR_MAF - minor allele:frequency in 1000 Genomes Phase 1 American population
80. ASN_MAF - minor allele:frequency in 1000 Genomes Phase 1 Asian population
81. EUR_MAF - minor allele:frequency in 1000 Genomes Phase 1 European population
82. AA_MAF - minor allele:frequency in NHLBI-ESP African American population
83. EA_MAF - minor allele:frequency in NHLBI-ESP European American population
0

I tried to use your program in the past, but it was missing some VEP dependency or something else... I can't remember. However, I'll try again tomorrow and let you know.

EDIT:

ERROR: Cannot find VEP script variant_effect_predictor.pl in path: ~/vep

OK... I have variant_effect_predictor.pl installed with the VEP tool. Is it possible to copy and paste it into a folder to help vcf2maf find it?

0

Read the docs. It has options to let you specify where VEP is installed (--vep-path), and where the VEP cache is dumped (--vep-data).

0
6.9 years ago

Just use the --freq option in vcftools:

vcftools --gzvcf ALL.chr22.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz  --freq --out chr22

Output:

CHROM   POS     N_ALLELES       N_CHR   {ALLELE:FREQ}
22      16050075        2       5008    A:0.9998        G:0.000199681
22      16050115        2       5008    G:0.99361       A:0.00638978
22      16050213        2       5008    C:0.992412      T:0.00758786
22      16050319        2       5008    C:0.9998        T:0.000199681
22      16050527        2       5008    C:0.9998        A:0.000199681
22      16050568        2       5008    C:0.999601      A:0.000399361
22      16050607        2       5008    G:0.999002      A:0.000998403
22      16050627        2       5008    G:0.999601      T:0.000399361
22      16050646        2       5008    G:0.9998        T:0.000199681
22      16050654        5       5008    A:0.857228      <CN0>:0.00179712        <CN2>:0.0173722 <CN3>:0.119609  <CN4>:0.00399361
22      16050655        2       5008    G:0.9998        A:0.000199681
22      16050678        2       5008    C:0.999601      T:0.000399361
22      16050679        2       5008    G:0.9998        A:0.000199681
22      16050688        2       5008    C:0.9998        T:0.000199681
22      16050732        2       5008    C:0.9998        T:0.000199681
22      16050739        2       5008    TA:0.992412     T:0.00758786
22      16050758        2       5008    T:0.9998        C:0.000199681
22      16050783        2       5008    A:0.992212      G:0.00778754
22      16050840        2       5008    C:0.994808      G:0.00519169
22      16050847        2       5008    T:0.998802      C:0.00119808
22      16050856        2       5008    G:0.9998        T:0.000199681
22      16050874        2       5008    G:0.9998        T:0.000199681
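
To turn that output into the spreadsheet-friendly table asked for in the question, something like the following could work. It is only a sketch: the function names are made up, and for multi-allelic sites it assumes that "minor allele" means the second most frequent allele, which is one common convention but not the only one.

```python
import csv
import sys


def parse_frq(lines):
    """Yield (chrom, pos, minor_allele, maf) from vcftools --freq output lines."""
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "CHROM":
            continue  # skip blank lines and the header
        chrom, pos = fields[0], int(fields[1])
        pairs = []
        for item in fields[4:]:
            # Split on the last ':' so alleles such as <CN0> stay intact.
            allele, freq = item.rsplit(":", 1)
            pairs.append((allele, float(freq)))
        if len(pairs) < 2:
            continue  # no second allele to report
        # Assumption: minor allele = second most frequent allele.
        pairs.sort(key=lambda p: p[1], reverse=True)
        minor_allele, maf = pairs[1]
        yield chrom, pos, minor_allele, maf


def write_maf_table(rows, handle):
    """Write the rows as a tab-separated table that Excel or Calc can open."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["CHROM", "POS", "MINOR_ALLELE", "MAF"])
    writer.writerows(rows)


# Two rows copied from the output above.
sample = """CHROM   POS     N_ALLELES       N_CHR   {ALLELE:FREQ}
22      16050075        2       5008    A:0.9998        G:0.000199681
22      16050654        5       5008    A:0.857228      <CN0>:0.00179712        <CN2>:0.0173722 <CN3>:0.119609  <CN4>:0.00399361
""".splitlines()

write_maf_table(parse_frq(sample), sys.stdout)
```

For a real run you would open the chr22.frq file written by vcftools and pass the file handle to parse_frq instead of the sample lines.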

0
6.9 years ago
ginlucks ▴ 10

Thank you for the answer, but I had just tried the --freq option, and it returns a text file without MAF.

These are the first 2 rows of the file:

CHROM    POS    N_ALLELES    N_CHR    {ALLELE:FREQ}
chrM      72      2             2       T:0    C:1

and it is the same for all the other SNPs:

CHROM    POS    N_ALLELES    N_CHR    {ALLELE:FREQ}
chrM    72    2    2    T:0    C:1
chrM    73    2    2    G:0    A:1
chrM    93    2    2    A:0    G:1
chrM    150    2    2    T:0    C:1
chrM    195    2    2    C:0    T:1
chrM    410    2    2    A:0    T:1
chrM    2354    2    2    C:0    T:1
chrM    2485    2    2    C:0    T:1
chrM    5581    2    2    C:0    T:1
chrM    6493    2    2    C:0    A:1
chrM    7445    2    2    G:0    A:1
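
A note on why the values are all 0 and 1: N_CHR is 2 here, which suggests the file holds a single diploid sample, so the frequencies are computed from that one genotype only and not from any reference population. The sketch below illustrates the idea by counting alleles from GT strings; the function name is made up, and it assumes plain GT values such as "0/1" or "1|1" with no extra FORMAT fields.

```python
import re
from collections import Counter


def allele_frequencies(ref, alts, genotypes):
    """Return (n_chr, {allele: frequency}) counted from a list of GT strings."""
    alleles = [ref] + list(alts)
    counts = Counter()
    for gt in genotypes:
        for idx in re.split(r"[/|]", gt):
            if idx != ".":  # "." is a missing allele call
                counts[int(idx)] += 1
    n_chr = sum(counts.values())
    freqs = {
        allele: (counts[i] / n_chr if n_chr else 0.0)
        for i, allele in enumerate(alleles)
    }
    return n_chr, freqs


# One homozygous-alternate sample, like the chrM rows above.
print(allele_frequencies("T", ["C"], ["1/1"]))
```

With a single 1/1 sample there are only two chromosomes to count, so the alternate allele necessarily comes out at frequency 1 and the reference at 0.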
0

I'm sorry, I tried to remove my double post and deleted yours too.

You suggested that my SNPs are monoallelic... but how can I find the global frequencies (I mean the frequency of an allele worldwide, with reference to the 1000 Genomes Project)?

1

If you want to add the global frequencies from the 1000 Genomes Project, you can use the Variant Effect Predictor tool with the option '--gmaf', described as: "Add the global minor allele frequency (MAF) from 1000 Genomes Phase 1 data for any existing variant to the output. Not used by default."

0

Thank you for the answer, but I'm a beginner and I probably don't understand exactly what you mean. This is what I've done:

After installing the VEP tool, I used this command line (I split it into four parts to make it easier to read):

gianluca@gianluca-kde:~/tempo/3soggetti$
sudo perl /home/gianluca/Documenti/Laboratorio/Tools/ensembl-tools-release-78/scripts/variant_effect_predictor/variant_effect_predictor.pl

--input_file NGS-3181_0322_R1.recalibrated.filtered.vcf -i

--gmaf

--output_file gmaf3181.vcf



and this is the error it gives me:

Can't locate Bio/EnsEMBL/Variation/Utils/Sequence.pm in @INC (you may need to install the Bio::EnsEMBL::Variation::Utils::Sequence module) (@INC contains: /home/gianluca/Documenti/Laboratorio/Tools/ensembl-tools-release-78/scripts/variant_effect_predictor /etc/perl /usr/local/lib/perl/5.18.2 /usr/local/share/perl/5.18.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.18 /usr/share/perl/5.18 /usr/local/lib/site_perl .) at /home/gianluca/Documenti/Laboratorio/Tools/ensembl-tools-release-78/scripts/variant_effect_predictor/variant_effect_predictor.pl line 49.
BEGIN failed--compilation aborted at /home/gianluca/Documenti/Laboratorio/Tools/ensembl-tools-release-78/scripts/variant_effect_predictor/variant_effect_predictor.pl line 49.

What am I doing wrong?

0

You need to run the INSTALL.pl script in the VEP directory;

(or run on the web interface, see my main answer)

0
5.9 years ago

The SNiPlay web application can report a tabular file including MAF information from a VCF file (using VCFtools): http://sniplay.southgreen.fr/cgi-bin/analysis_v3.cgi