Question: [Error] getting fasta file from 1000genomes
Asked 5.4 years ago by Marie (Japan):

Hi,

I'd like to get a fasta file of haplotypes from 1000 genomes.

I ran the commands below, taken from this 1000 Genomes FAQ page:

http://www.1000genomes.org/faq/are-there-any-fasta-files-containing-1000-genomes-variants-or-haplotypes

tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr17.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz 17:1471000-1472000 | perl vcf-subset -c HG00098 | bgzip -c > HG00098.vcf.gz
tabix -p vcf HG00098.vcf.gz
cat ref.fa | vcf-consensus HG00098.vcf.gz > HG00098.fa

but got these error messages:

[tabix] the index file is older than the vcf file. Please use '-f' to overwrite or reindex.
Can't open perl script "vcf-subset": No such file or directory

And when I tried to install vcf-tools, I got:
E: Unable to locate package vcf-tools

How can I run these commands successfully?

My machine runs 64-bit Ubuntu on Windows 7.

 

 


Can you give some more information about how you tried to install vcf-tools?

written 5.4 years ago by Matt Shirley

Matt, 

Thank you.  I tried to run this:

marie@ubuntu:~/Downloads$ sudo apt-get install vcf-tools
Reading package lists... Done
Building dependency tree       
Reading state information... Done
E: Unable to locate package vcf-tools

written 5.4 years ago by Marie

vcf-tools is not in the distribution's software repository, so you have to install it manually. See here.
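
Installing manually most likely means unpacking the VCFtools archive and making its Perl scripts visible both to the shell and to Perl. A minimal sketch, assuming the scripts sit in the `perl/` subdirectory of the unpacked archive; the `VCFTOOLS_PERL` variable and the `check_tool` helper are names made up for this illustration, not part of VCFtools:

```shell
#!/bin/sh
# Assumed location of the unpacked VCFtools Perl scripts; adjust to your system.
VCFTOOLS_PERL="${VCFTOOLS_PERL:-$HOME/Downloads/vcftools_0.1.8a/perl}"

# Let the shell find vcf-subset and let Perl find Vcf.pm.
PATH="$VCFTOOLS_PERL:$PATH"
PERL5LIB="$VCFTOOLS_PERL${PERL5LIB:+:$PERL5LIB}"
export PATH PERL5LIB

# check_tool NAME: report whether NAME can be found on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}
```

Running `check_tool vcf-subset`, `check_tool tabix` and `check_tool bgzip` afterwards shows which pieces of the pipeline are still missing. Note that `perl vcf-subset` only works when the script is in the current directory; once the directory is on `PATH` and the script is executable, calling `vcf-subset` directly should be enough.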

modified 5 weeks ago by RamRS • written 5.4 years ago by Phil S.

Thank you, Phil. I got a bit further, but now I get another error.

marie@ubuntu:~/Downloads/vcftools_0.1.8a/perl$ tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr17.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz 17:1471000-1472000 | perl vcf-subset -c HG00098 | bgzip -c > HG00098.vcf.gz
[tabix] the index file is older than the vcf file. Please use '-f' to overwrite or reindex.
Broken VCF header, no column names?
 at /usr/share/perl5/Vcf.pm line 171
    Vcf::throw('Vcf4_1=HASH(0x160ee48)', 'Broken VCF header, no column names?') called at /usr/share/perl5/Vcf.pm line 845
    VcfReader::_read_column_names('Vcf4_1=HASH(0x160ee48)') called at /usr/share/perl5/Vcf.pm line 589
    VcfReader::parse_header('Vcf4_1=HASH(0x160ee48)') called at vcf-subset line 119
    main::vcf_subset('HASH(0x16051c8)') called at vcf-subset line 12
modified 5 weeks ago by RamRS • written 5.4 years ago by Marie

I think you can safely use the -f flag that tabix is warning you about. Currently it looks like tabix is not returning any data. You can check this by running only the part of the command before the first pipe character.
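
One way to run that check is to look at the tabix output on its own for the `#CHROM` column-name line; its absence is the likely cause of the "Broken VCF header, no column names?" error. A minimal sketch, where `has_vcf_columns` is a hypothetical helper and not part of tabix or VCFtools:

```shell
#!/bin/sh
# has_vcf_columns: read a VCF stream on stdin and succeed only if it
# contains the "#CHROM" column-name line that VCF parsers expect.
has_vcf_columns() {
    if grep -q '^#CHROM'; then
        echo "column-name line present"
    else
        echo "no #CHROM line: the stream is empty or not a VCF" >&2
        return 1
    fi
}
```

To use it, pipe the tabix command from the question (everything before the first `|`) into `has_vcf_columns` instead of into `vcf-subset`. If it reports a missing `#CHROM` line, the problem is in the tabix step rather than in VCFtools.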

modified 5 weeks ago by RamRS • written 5.4 years ago by Matt Shirley

Thank you Matt, I added -f like this and the program ran:

tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr17.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz 17:1471000-1472000 -f

(Initially I tried the command below, and it did not work.)

tabix -h -f ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr17.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz 17:1471000-1472000

but at the last line, I got many errors:

modified 5 weeks ago by RamRS • written 5.3 years ago by Marie