Question: Help on merging VCF files using VCFtools
Jianfengmao wrote (8.7 years ago):

Dear BioStarers,

I am learning VCFtools by running its commands on the VCF files in the examples folder of the VCFtools installation directory.

Please help me fix the three problems below, and give me some tips or pointers on merging VCF files.

Thanks in advance.

(1) When I tried to merge the three example VCF files, it failed.

Command:
merge-vcf merge-test-a.vcf merge-test-b.vcf merge-test-c.vcf > merg.vcf

results:
[main] fail to load the index file.
The command "tabix -l merge-test-a.vcf" exited with an error. Is the file tabix indexed?

 at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 167
       Vcf::throw('Vcf4_0=HASH(0x10082df18)', 'The command "tabix -l merge-test-a.vcf" exited with an error....') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 1687
       VcfReader::get_chromosomes('Vcf4_0=HASH(0x10082df18)') called at /Users/jianfengmao/programe_files/VCFtools/bin/merge-vcf line 139
       main::init_cols('HASH(0x10082a3d0)', 'Vcf4_0=HASH(0x10082e110)') called at /Users/jianfengmao/programe_files/VCFtools/bin/merge-vcf line 219
       main::merge_vcf_files('HASH(0x10082a3d0)') called at /Users/jianfengmao/programe_files/VCFtools/bin/merge-vcf line 12

(2) Then I compressed and indexed the VCF files, but I still could not merge them.

bgzip merge-test-a.vcf
bgzip merge-test-b.vcf
bgzip merge-test-c.vcf

tabix -p vcf merge-test-a.vcf.gz
tabix -p vcf merge-test-b.vcf.gz
tabix -p vcf merge-test-c.vcf.gz

###########################################################################
Merge command:
merge-vcf merge-test-a.vcf.gz merge-test-b.vcf.gz merge-test-c.vcf.gz | bgzip -c > merg.vcf.gz

results:
zcat: merge-test-a.vcf.gz.Z: No such file or directory
Error reading VCF file.

 at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 167
       Vcf::throw('Vcf=HASH(0x1008f32a8)', 'Error reading VCF file.\x{a}') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 280
       Vcf::next_line('Vcf=HASH(0x1008f32a8)') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 219
       Vcf::_open('Vcf=HASH(0x1008f32a8)') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 161
       Vcf::new('Vcf', 'file', 'merge-test-a.vcf.gz') called at /Users/jianfengmao/programe_files/VCFtools/bin/merge-vcf line 125
       main::init_cols('HASH(0x10082a3d0)', 'Vcf4_0=HASH(0x10082e110)') called at /Users/jianfengmao/programe_files/VCFtools/bin/merge-vcf line 219
       main::merge_vcf_files('HASH(0x10082a3d0)') called at /Users/jianfengmao/programe_files/VCFtools/bin/merge-vcf line 12
###########################################################################
Merge command:
merge-vcf merge-test-a.vcf.gz merge-test-b.vcf.gz merge-test-c.vcf.gz > merg.vcf.gz

results:
zcat: merge-test-a.vcf.gz.Z: No such file or directory
Error reading VCF file.

 at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 167
       Vcf::throw('Vcf=HASH(0x1008f32a8)', 'Error reading VCF file.\x{a}') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 280
       Vcf::next_line('Vcf=HASH(0x1008f32a8)') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 219
       Vcf::_open('Vcf=HASH(0x1008f32a8)') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 161
       Vcf::new('Vcf', 'file', 'merge-test-a.vcf.gz') called at /Users/jianfengmao/programe_files/VCFtools/bin/merge-vcf line 125
       main::init_cols('HASH(0x10082a3d0)', 'Vcf4_0=HASH(0x10082e110)') called at /Users/jianfengmao/programe_files/VCFtools/bin/merge-vcf line 219
       main::merge_vcf_files('HASH(0x10082a3d0)') called at /Users/jianfengmao/programe_files/VCFtools/bin/merge-vcf line 12

(3) vcf-stats and vcf-validator work on all three uncompressed VCF files (merge-test-a.vcf, merge-test-b.vcf, merge-test-c.vcf), but not on the compressed ones.

Command:

vcf-validator merge-test-a.vcf.gz

Results:

zcat: merge-test-c.vcf.gz.Z: No such file or directory
Error reading VCF file.

 at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 167
       Vcf::throw('Vcf=HASH(0x10082a0d0)', 'Error reading VCF file.\x{a}') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 280
       Vcf::next_line('Vcf=HASH(0x10082a0d0)') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 219
       Vcf::_open('Vcf=HASH(0x10082a0d0)') called at /Users/jianfengmao/programe_files/VCFtools/lib/Vcf.pm line 161
       Vcf::new('Vcf', 'file', 'merge-test-c.vcf.gz') called at /Users/jianfengmao/programe_files/VCFtools/bin/vcf-validator line 53
       main::do_validation('HASH(0x100804ed0)') called at /Users/jianfengmao/programe_files/VCFtools/bin/vcf-validator line 14
Tags: vcf, merge, vcftools

I had a similar issue.

Check that the .tbi and .gz files for your VCF files are in the same directory.

The vcf-merge Perl script reads the .tbi index files, so they must sit next to the .gz files.

(reply by aaron, 5.2 years ago)
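Following this advice, a quick pre-flight check can catch a missing index before vcf-merge fails. The Python sketch below (standard library only) is a hypothetical helper, missing_indexes, not part of VCFtools; it assumes tabix's default naming, where merge-test-a.vcf.gz gets its index as merge-test-a.vcf.gz.tbi in the same directory.

```python
import os


def missing_indexes(vcf_paths):
    """Return (path, reason) pairs for inputs that are not ready for merging.

    Hypothetical helper: assumes tabix's default index naming, i.e.
    merge-test-a.vcf.gz -> merge-test-a.vcf.gz.tbi in the same directory.
    """
    problems = []
    for path in vcf_paths:
        if not path.endswith(".gz"):
            problems.append((path, "not a .gz file; run bgzip first"))
        elif not os.path.isfile(path):
            problems.append((path, "file not found"))
        elif not os.path.isfile(path + ".tbi"):
            problems.append((path, "no .tbi index next to it; run tabix -p vcf"))
    return problems
```

For example, missing_indexes(["merge-test-a.vcf.gz", "merge-test-b.vcf.gz"]) returns an empty list when both files and their .tbi indexes are present in the same directory.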
Brad Chapman (Boston, MA) wrote (8.7 years ago):

For part 1, you want to bgzip and tabix index the files as you did in part 2; merge-vcf works on indexed VCF files:

% merge-vcf
About: Merge the bgzipped and tabix indexed VCF files.
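A related pitfall: merge-vcf and tabix need BGZF-compressed input, which is what bgzip writes. A file compressed with plain gzip has the same .gz extension but cannot be tabix-indexed. As a minimal sketch, assuming the BGZF layout described in the SAM/BAM format specification (a gzip header with the FEXTRA flag set and an extra subfield with ID "BC"), the hypothetical Python helper is_bgzf below looks at a file's first bytes to tell the cases apart.

```python
import struct

GZIP_MAGIC = b"\x1f\x8b"
FEXTRA = 0x04  # gzip FLG bit: an "extra field" follows the 10-byte header


def is_bgzf(path):
    """Return True if the file starts with a BGZF block, as written by bgzip.

    Sketch based on the BGZF description in the SAM/BAM specification:
    a gzip member whose extra field contains a subfield with ID "BC".
    Plain gzip output has no such subfield, and an uncompressed VCF
    does not even start with the gzip magic bytes.
    """
    with open(path, "rb") as fh:
        header = fh.read(12)
        if len(header) < 12 or header[:2] != GZIP_MAGIC:
            return False  # not gzip at all, e.g. a plain .vcf
        if not header[3] & FEXTRA:
            return False  # gzip without an extra field, so not BGZF
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = fh.read(xlen)
    # Each extra subfield is SI1, SI2 (1 byte each), SLEN (2 bytes), data.
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2 = extra[pos], extra[pos + 1]
        (slen,) = struct.unpack("<H", extra[pos + 2:pos + 4])
        if (si1, si2) == (66, 67):  # ASCII "B", "C"
            return True
        pos += 4 + slen
    return False
```

For example, is_bgzf("merge-test-a.vcf.gz") should return True after bgzip, whereas the same file compressed with plain gzip would give False.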

Parts 2 and 3 work for me with the example files and the latest release (0.1.3.2):

% bgzip merge-test-a.vcf
% bgzip merge-test-b.vcf
% tabix -p vcf merge-test-a.vcf.gz
% tabix -p vcf merge-test-b.vcf.gz
% merge-vcf merge-test-a.vcf.gz merge-test-b.vcf.gz
Using column name 'A' for merge-test-a.vcf.gz:A
Using column name 'B' for merge-test-b.vcf.gz:B
##fileformat=VCFv4.0
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
[...]
% vcf-stats merge-test-a.vcf.gz
Rows with a call  .. 9
Genotypes total   .. 9
[...]
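The same compress, index and merge sequence can be scripted. The Python sketch below is a hypothetical wrapper, not part of VCFtools; it assumes bgzip, tabix and merge-vcf are on your PATH (later VCFtools releases call the merge script vcf-merge, so adjust the name accordingly). With dry_run=True it only returns the commands it would run, which is a cheap way to review the plan first.

```python
import shlex
import subprocess


def compress_index_merge(vcf_files, merged_out, dry_run=False):
    """bgzip and tabix-index each plain VCF, then merge into a bgzipped file.

    Hypothetical helper; assumes bgzip, tabix and merge-vcf are on PATH.
    Returns the plan as shell-style command strings.
    """
    steps = []  # argument lists for the per-file bgzip/tabix steps
    gz_files = []
    for vcf in vcf_files:
        gz = vcf + ".gz"
        steps.append(["bgzip", vcf])              # replaces vcf with vcf.gz
        steps.append(["tabix", "-p", "vcf", gz])  # creates vcf.gz.tbi
        gz_files.append(gz)
    merge_cmd = ["merge-vcf", *gz_files]

    # Human-readable plan, including the final pipeline.
    plan = [shlex.join(cmd) for cmd in steps]
    plan.append(f"{shlex.join(merge_cmd)} | bgzip -c > {shlex.quote(merged_out)}")
    if dry_run:
        return plan

    for cmd in steps:
        subprocess.run(cmd, check=True)
    # Equivalent of the shell pipeline, without invoking a shell.
    with open(merged_out, "wb") as out:
        merge = subprocess.Popen(merge_cmd, stdout=subprocess.PIPE)
        bgzip = subprocess.Popen(["bgzip", "-c"], stdin=merge.stdout, stdout=out)
        merge.stdout.close()  # so merge-vcf sees a broken pipe if bgzip dies
        bgzip_rc, merge_rc = bgzip.wait(), merge.wait()
    if merge_rc or bgzip_rc:
        raise RuntimeError(f"pipeline failed: merge-vcf={merge_rc}, bgzip={bgzip_rc}")
    return plan
```

Running it without dry_run mirrors the command from the question, merge-vcf ... | bgzip -c > merg.vcf.gz, but each step stops with an error as soon as a tool fails.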

Which version are you using? Perhaps upgrading to the latest release will fix your problems.


We'll probably need additional information to help further. What is the output of the following commands: ls -lh merge-test*, zcat -V, and zcat merge-test-a.vcf.gz? My guess is that you may want to edit line 204 of Vcf.pm and replace zcat with gunzip -c. Please post the output as an edit to your initial question, and format it as code (highlight it and press the button with the 1s and 0s on it). Thanks.

(reply by Brad Chapman, 8.7 years ago)
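To see what a compressed VCF actually contains without relying on zcat at all (the equivalent of the zcat merge-test-a.vcf.gz check requested above), Python's standard gzip module can read it directly. Because BGZF output is a series of ordinary gzip members, gzip.open reads bgzipped files too. The helper name peek_vcf is hypothetical; this is a diagnostic sketch, not a fix for Vcf.pm.

```python
import gzip
import itertools


def peek_vcf(path, n=5):
    """Return the first n lines of a VCF, plain or gzip/bgzip-compressed.

    Compression is detected from the gzip magic bytes rather than the file
    name, so it depends neither on zcat nor on a .gz or .Z suffix.
    """
    with open(path, "rb") as fh:
        compressed = fh.read(2) == b"\x1f\x8b"
    opener = gzip.open if compressed else open
    with opener(path, "rt", encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in itertools.islice(fh, n)]
```

If peek_vcf("merge-test-a.vcf.gz") prints the expected ##fileformat line, the compressed file itself is fine and the problem lies in how it is being decompressed.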

I am using the latest version, VCFtools_0.1.3.2, and I have checked for updates with the commands listed on the VCFtools website.

I am not very experienced with Unix and do not know much Perl. I have tried many things to get VCFtools working, but failed every time. I suspect the real cause is a misconfiguration of VCFtools, either in "Vcf.pm" or somewhere else.

Whenever I merge the VCF files, I get the same errors from Vcf.pm. Could you please help me further in finding out what is going wrong with my Vcf.pm?

(reply by Jianfengmao, 8.7 years ago)

Jianfengmao wrote (8.7 years ago):

I got help with this problem from Dr. Petr Danecek, the author of VCFtools. The problem occurs only on Mac OS X.

Dr. Petr Danecek said:

The problem you are observing is caused by a peculiar behaviour of zcat on Mac OS X, which adds .Z to file names. This has been fixed in the latest revision (r403) by calling "gunzip -c" instead.

Many thanks to him.

Jianfengmao wrote (8.7 years ago):

It works after I create ".vcf.gz.Z" files by copying the original ".vcf.gz" files, but I still do not know why.

Could you please help me understand this? I am using an up-to-date Mac OS.

% bgzip merge-test-a.vcf
% bgzip merge-test-b.vcf
% tabix -p vcf merge-test-a.vcf.gz
% tabix -p vcf merge-test-b.vcf.gz
% cp merge-test-a.vcf.gz merge-test-a.vcf.gz.Z
% cp merge-test-b.vcf.gz merge-test-b.vcf.gz.Z
% merge-vcf merge-test-a.vcf.gz merge-test-b.vcf.gz
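Judging from the error messages in the question (zcat: merge-test-a.vcf.gz.Z: No such file or directory) and Petr Danecek's explanation in this thread, the zcat on that Mac appended .Z to each file name it was given, so it only found the data once a file with the extra .Z suffix existed. If you must stay on a VCFtools revision older than r403, a symlink does the same job as the copy without duplicating the data. The helper below is a hypothetical Python sketch of that workaround; the real fix is upgrading to a revision where Vcf.pm calls gunzip -c.

```python
import os


def add_dot_z_links(gz_files):
    """Create <file>.Z symlinks next to each <file> (pre-r403 workaround only).

    Hypothetical helper: equivalent to `cp x.vcf.gz x.vcf.gz.Z`, but uses a
    relative symlink instead of a second copy of the data. Existing .Z
    entries are left untouched. Returns the list of links created.
    """
    created = []
    for path in gz_files:
        link = path + ".Z"
        if os.path.lexists(link):
            continue
        # Relative target, resolved from the link's own directory.
        os.symlink(os.path.basename(path), link)
        created.append(link)
    return created
```

Once VCFtools is upgraded, the .Z links (or copies) can simply be deleted.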

This is an addition to your initial question, and should be posted as an edit to that question instead of an answer. This helps keep things organized for future users.

(reply by Brad Chapman, 8.7 years ago)

user56 (United States) wrote (7.4 years ago):

I had a similar problem and had to use Windows :-(.

If you are working with smallish VCF files, you can use R to work with the data (e.g., to split it).

To load the file use:

file <- 'e:/d/genome/t300.txt'
v <- read.table(file, sep = '\t', header = TRUE, fileEncoding = "utf-16")
str(v)

The UTF-16 encoding was particularly hard to troubleshoot. Eventually Notepad++ helped me detect this encoding problem.

It correctly ignores the header lines and detects column headers as well.
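Since the UTF-16 encoding was the hard part to track down, it can help to check a file's byte-order mark (BOM) before loading it. The hypothetical Python helper below returns a codec name when it finds a BOM and None otherwise; it assumes the UTF-16 file starts with a BOM (as files written by Windows tools often do), and it cannot detect BOM-less UTF-16.

```python
import codecs

# Longest BOMs first: the UTF-32-LE BOM starts with the UTF-16-LE one.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(path):
    """Return a Python codec name for the file's byte-order mark, or None.

    None means there is no BOM: the file may be plain ASCII/UTF-8 (typical
    for VCFs) or BOM-less UTF-16, which this sketch cannot tell apart.
    """
    with open(path, "rb") as fh:
        start = fh.read(4)
    for bom, codec in _BOMS:
        if start.startswith(bom):
            return codec
    return None
```

A result of "utf-16" corresponds to the fileEncoding = "utf-16" argument in the read.table call above.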

To make a VCF file smaller under Windows, you can use PowerShell (Microsoft's alternative shell) with these commands (e.g., to keep the first 3000 lines):

$a = Get-Content C:\largeVCF.txt -TotalCount 3000
$a > largeVCF-subset.txt
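One caveat: in Windows PowerShell (version 5.x and earlier), the > redirection writes UTF-16 text by default, so the subset file can end up in exactly the encoding that caused trouble. A cross-platform alternative is the minimal Python sketch below (hypothetical helper subset_vcf): it keeps all ## meta-information lines and the #CHROM header line, copies only the first n data records so the subset remains a well-formed VCF, and writes UTF-8. It also reads bgzipped input directly.

```python
import gzip


def subset_vcf(src, dst, n_records=3000):
    """Copy all header lines plus the first n_records data lines of a VCF.

    Reads plain or gzip/bgzip-compressed input (detected from the magic
    bytes) and writes uncompressed UTF-8 text with Unix line endings.
    Returns the number of data records written.
    """
    with open(src, "rb") as fh:
        compressed = fh.read(2) == b"\x1f\x8b"
    opener = gzip.open if compressed else open
    written = 0
    with opener(src, "rt", encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8", newline="\n") as fout:
        for line in fin:
            if line.startswith("#"):
                fout.write(line)  # ## meta lines and the #CHROM line
                continue
            if written == n_records:
                break  # stop reading; the rest of the file is not needed
            fout.write(line)
            written += 1
    return written
```

For example, subset_vcf("largeVCF.vcf.gz", "largeVCF-subset.vcf") keeps the full header and the first 3000 records.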