Merging VCF Chunks After GATK UnifiedGenotyper?
10.6 years ago

What is the recommended way to merge the resulting VCF files when running GATK UnifiedGenotyper version 1.6 on chunks of, say, 10 Mb?

I've got files like this:

chr10.0001.vcf
chr10.0002.vcf
chr10.0003.vcf
chr10.0004.vcf
chr10.0005.vcf
chr10.0006.vcf
chr10.0007.vcf
chr10.0008.vcf
chr10.0009.vcf
chr10.0010.vcf
chr10.0011.vcf
chr10.0012.vcf
chr10.0013.vcf
chr10.0014.vcf

where the first file covers the first 10 Mb, the second covers the following 10 Mb, and so on. I want to end up with a single chr10.vcf file that includes all of the files above.

Is just doing find -name "chr10.*" | sort | xargs cat on the files enough?
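Plain cat is not quite enough: every chunk carries its own header (the ## meta-information lines and the #CHROM column line), so the concatenated file would contain header lines in the middle of the data, which is not valid VCF. Below is a minimal sketch of a header-aware concatenation. It assumes all chunks come from the same UnifiedGenotyper run (identical headers and sample columns) and that the zero-padded suffixes sort into genomic order. The function name merge_chunk_lines is hypothetical.

```python
import glob


def merge_chunk_lines(chunks):
    """Yield the header of the first chunk, then the data lines of every chunk.

    `chunks` is a sequence of iterables of lines (e.g. open file handles),
    already in genomic order and all sharing the same header (an assumption).
    """
    for i, lines in enumerate(chunks):
        for line in lines:
            # Lines starting with '#' are header lines ('##' meta-information
            # lines and the '#CHROM' column line); keep them only once.
            if i == 0 or not line.startswith("#"):
                yield line


def main():
    # Zero-padded suffixes (0001, 0002, ...) make lexicographic order equal
    # to genomic order; this pattern also skips the merged chr10.vcf itself.
    paths = sorted(glob.glob("chr10.[0-9][0-9][0-9][0-9].vcf"))
    handles = [open(p) for p in paths]
    try:
        with open("chr10.vcf", "w") as out:
            out.writelines(merge_chunk_lines(handles))
    finally:
        for h in handles:
            h.close()


if __name__ == "__main__":
    main()
```

The explicit four-digit glob matters on a rerun: find -name "chr10.*" would also match a previously merged chr10.vcf and concatenate it back in.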

10.6 years ago

If you are working with GATK, you may find the CombineVariants walker very useful. The first example in its documentation shows how to merge any set of VCF files by passing each one through the --variant option:

java -Xmx2g -jar GenomeAnalysisTK.jar \
-R ref.fasta \
-T CombineVariants \
--variant input1.vcf \
--variant input2.vcf \
-o output.vcf \
-genotypeMergeOptions UNIQUIFY
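With fourteen chunks, typing --variant by hand gets tedious. A small sketch that builds the same command line from the chunk files follows. It reuses only the flags shown above and assumes GenomeAnalysisTK.jar and ref.fasta sit in the working directory. The helper combine_variants_cmd is hypothetical. Note that UNIQUIFY renames every sample shared across inputs per input file. That suits inputs with different sample sets, but for genomic chunks of the same samples another GenotypeMergeType (for example UNSORTED) is probably what you want, so check the CombineVariants documentation for your GATK version.

```python
import glob


def combine_variants_cmd(chunks, output, reference="ref.fasta",
                         jar="GenomeAnalysisTK.jar", merge_option="UNIQUIFY"):
    """Build a CombineVariants argument list with one --variant per chunk."""
    cmd = ["java", "-Xmx2g", "-jar", jar,
           "-R", reference,
           "-T", "CombineVariants"]
    for path in chunks:
        cmd += ["--variant", path]
    cmd += ["-o", output, "-genotypeMergeOptions", merge_option]
    return cmd


if __name__ == "__main__":
    chunks = sorted(glob.glob("chr10.[0-9][0-9][0-9][0-9].vcf"))
    # Print the command for inspection rather than running it.
    print(" ".join(combine_variants_cmd(chunks, "chr10.vcf")))
```

To execute it instead of printing it, pass the list to subprocess.run(cmd, check=True); this requires Java and the GATK jar.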

GATK also offers CatVariants, a tool that concatenates VCF files in a smarter way: it is faster than CombineVariants but safer than a plain cat. See http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_tools_CatVariants.html
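The checks that make a concatenation "safer than cat" can be illustrated with a hypothetical check_chunks function. This is not CatVariants' actual implementation, only a sketch of the idea. It verifies that every chunk has the same #CHROM line (the same samples in the same column order) and that positions never go backwards within a chromosome, which would reveal chunks passed in the wrong order.

```python
def check_chunks(chunks):
    """Return a list of problems found across chunks (empty if none).

    `chunks` is a sequence of iterables of VCF lines in the intended order.
    """
    problems = []
    expected_header = None
    last_chrom, last_pos = None, -1
    for i, lines in enumerate(chunks):
        for line in lines:
            if line.startswith("##") or not line.strip():
                continue  # meta-information or blank line: nothing to check
            if line.startswith("#CHROM"):
                # The column line lists the samples; it must match chunk 0.
                if expected_header is None:
                    expected_header = line
                elif line != expected_header:
                    problems.append(f"chunk {i}: sample columns differ from chunk 0")
                continue
            chrom, pos = line.split("\t", 2)[:2]
            pos = int(pos)
            # Within one chromosome, positions must not decrease.
            if chrom == last_chrom and pos < last_pos:
                problems.append(f"chunk {i}: {chrom}:{pos} comes after {chrom}:{last_pos}")
            last_chrom, last_pos = chrom, pos
    return problems
```

The order of different chromosomes is deliberately not checked, since that requires the reference sequence dictionary.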
