Question: about merging VCF files
3.1 years ago by
Bogdan740
Palo Alto, CA, USA
Bogdan740 wrote:

Dear all,

We have a large number of VCF files, and I am attempting to merge all of them using VCFtools, in the following way:

for f in *.vcf; do
    bgzip -c "$f" > "$f.gz"
    tabix -p vcf "$f.gz"
done

and:

vcf-merge *vcf.gz

However, during the vcf-merge step, I am getting this error:

Use of uninitialized value in hash element at /usr/local/share/perl/5.18.2/Vcf.pm line 1720, <__ANONIO__> line 1158.
Use of uninitialized value in hash element at /usr/local/share/perl/5.18.2/Vcf.pm line 1720, <__ANONIO__> line 1158.

Any advice on why we get this error? Many thanks!

-- bogdan

snp vcf • 2.1k views
ADD COMMENTlink modified 3.1 years ago • written 3.1 years ago by Bogdan740

You could try vcflib's merger.

ADD REPLYlink written 3.1 years ago by Zev.Kronenberg11k

Thank you, Ram, for your suggestion. A simple question, though: since echo only prints the command, how do I execute the java command that echo prints?

echo -n "java -jar GenomeAnalysisTK.jar -T CombineVariants -R reference.fasta -o combined_output.vcf -genotypeMergeOptions UNIQUIFY" && for vcf_file in $(ls folderWith100VcfFiles/*.vcf); do echo -n " --variant ${vcf_file}"; done

ADD REPLYlink written 3.1 years ago by Bogdan740

Please do not add your comment as an answer. Move this to a reply to this comment: C: about merging VCF files

To do that, copy the contents of your comment above, click on "Add Reply" on my comment and paste what you copied. Then, hit Add comment. Once you do that, I will answer your question.

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by RamRS21k
3.1 years ago by
United States
Zev.Kronenberg11k wrote:

GATK also has one that I've used recently for thousands of VCFs.

ADD COMMENTlink written 3.1 years ago by Zev.Kronenberg11k

Thanks, Zev. A question, though, about merging with GATK tools: is there any way to specify tens of VCF files without having to input them one by one with the "-L" option? Thanks ;)

ADD REPLYlink written 3.1 years ago by Bogdan740

CombineVariants does not need the -L option for that, you just need a --variant before each VCF file name. The question is - does CombineVariants work with .vcf.gz? IIRC, it should.
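As a rough sketch of "a --variant before each VCF file name", the helper below appends one --variant pair per file in a directory to whatever command you give it and then runs that command. The function name with_variant_args is made up for illustration, and the GATK flags in the usage comment are the ones quoted elsewhere in this thread for GATK 3's CombineVariants; check them against your GATK version.

```shell
#!/bin/sh
# with_variant_args DIR COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with "--variant FILE" appended for every
# *.vcf file found directly inside DIR.
with_variant_args() {
    vcf_dir=$1
    shift
    for vcf_file in "$vcf_dir"/*.vcf; do
        [ -e "$vcf_file" ] || continue      # the glob matched nothing
        set -- "$@" --variant "$vcf_file"   # grow the argument list safely
    done
    "$@"
}

# Usage (dry run: the leading echo only prints the command):
# with_variant_args folderWith100VcfFiles echo java -jar GenomeAnalysisTK.jar \
#     -T CombineVariants -R reference.fasta -o combined_output.vcf \
#     -genotypeMergeOptions UNIQUIFY
```

Drop the echo from the usage line to actually run the merge. Because the file names are passed as separate arguments rather than pasted into a string, names containing spaces should survive intact.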

ADD REPLYlink written 3.1 years ago by RamRS21k

Thanks, Ram. I am looking for a way to combine more than 100 VCF files (from a folder) automatically, from a script, without having to write the name of each file after --variant. Is there any way to do that? Many thanks ;)

ADD REPLYlink written 3.1 years ago by Bogdan740

Use a list:

find path -type f -name "*.vcf" > input.list

and then use GATK with --variant input.list
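A minimal sketch of the list-file approach, wrapped in a function so the directory and list name are explicit. The name make_vcf_list is hypothetical, and the sort step is an addition of mine to make the order reproducible. To my knowledge GATK 3 recognizes a list by its .list extension, so keep that suffix; the flags in the usage comment are the ones quoted in this thread.

```shell
#!/bin/sh
# make_vcf_list DIR LIST_FILE
# Hypothetical helper: writes the path of every *.vcf file under DIR
# (searched recursively by find) to LIST_FILE, one path per line.
make_vcf_list() {
    find "$1" -type f -name "*.vcf" | sort > "$2"
}

# Usage:
# make_vcf_list folderWith100VcfFiles input.list
# java -jar GenomeAnalysisTK.jar -T CombineVariants -R reference.fasta \
#     --variant input.list -o combined_output.vcf -genotypeMergeOptions UNIQUIFY
```

It is worth opening input.list once before running GATK to confirm it contains exactly the files you expect.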

ADD REPLYlink written 3.1 years ago by Pierre Lindenbaum119k

There will be someone that recommends make here, but I'm not good at that, so I'd say generate the GATK command with a bunch of echos. Something like:

# Prints the assembled command on a single line; it does not run anything
echo -n "java -jar GenomeAnalysisTK.jar -T CombineVariants -R reference.fasta"
echo -n " -o combined_output.vcf -genotypeMergeOptions UNIQUIFY"
for vcf_file in folderWith100VcfFiles/*.vcf; do
    echo -n " --variant ${vcf_file}"
done
echo

This will echo the static part first and then echo, for each VCF file in the directory folderWith100VcfFiles, a properly formatted --variant argument.

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by RamRS21k

That looks great! Thanks a lot, I will use it as soon as I arrive in the lab. BTW, could you recommend any good book on shell scripting/programming? Thanks ;)

ADD REPLYlink written 3.1 years ago by Bogdan740

Sorry, I learnt shell programming through trial and error (and a LOT of Google) - I'm not aware of any book, although I bet there are a whole lot of useful books.

Tackle man pages one at a time and you will get there :)

ADD REPLYlink written 3.1 years ago by RamRS21k

Thanks a lot, Ram! BTW, I thought I would ask: are you guys doing a lot of somatic mutation variant calling? If so, what algorithms/software do you use?

ADD REPLYlink written 3.1 years ago by Bogdan740

Please open a new question.

ADD REPLYlink written 3.1 years ago by RamRS21k

Thanks, Ram, for keeping the conversation organized. To reiterate my previous question: how do I execute the java command that echo only prints? :)

echo -n "java -jar GenomeAnalysisTK.jar -T CombineVariants -R reference.fasta -o combined_output.vcf -genotypeMergeOptions UNIQUIFY" && for vcf_file in $(ls folderWith100VcfFiles/*.vcf); do echo -n " --variant ${vcf_file}"; done

ADD REPLYlink written 3.1 years ago by Bogdan740

You can either copy-paste the echo'd command to the command prompt and then hit return to run it, or redirect the output of the echos to a shell script file, chmod it and run the file. Better, use Pierre's solution - a list file is so much easier to handle.
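The "redirect to a shell script file, chmod it and run the file" route could look roughly like the sketch below. The function name write_merge_script and the script name run_merge.sh are made up for illustration; the java command is the one from this thread. Note that file names are pasted into the script unquoted, so this sketch assumes paths without spaces.

```shell
#!/bin/sh
# write_merge_script DIR SCRIPT
# Hypothetical helper: writes the full CombineVariants command, with one
# --variant argument per *.vcf file in DIR, into SCRIPT and makes it executable.
write_merge_script() {
    vcf_dir=$1
    script=$2
    {
        printf '#!/bin/sh\n'
        printf 'java -jar GenomeAnalysisTK.jar -T CombineVariants -R reference.fasta'
        printf ' -o combined_output.vcf -genotypeMergeOptions UNIQUIFY'
        for vcf_file in "$vcf_dir"/*.vcf; do
            [ -e "$vcf_file" ] || continue   # the glob matched nothing
            printf ' --variant %s' "$vcf_file"
        done
        printf '\n'
    } > "$script"
    chmod +x "$script"
}

# Usage:
# write_merge_script folderWith100VcfFiles run_merge.sh
# ./run_merge.sh
```

Keeping the generated script around also gives you a record of exactly which files went into the merge.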

ADD REPLYlink written 3.1 years ago by RamRS21k

Thank you, Ram and Pierre!

ADD REPLYlink written 3.1 years ago by Bogdan740