Merging/Concatenating VCF Files
9.8 years ago
bioinfo ▴ 810

I have a VCF file of SNPs and another VCF file of indels. During the variant-calling step with GATK, I called them separately instead of using -glm BOTH (calling SNPs and indels together). Now, to get the consensus sequence of my mapped genome, I want to put them together in the same VCF file. Should I merge these two VCF files or concatenate them to get a proper variant VCF file, so that both SNPs and indels are considered for my consensus sequence?

9.8 years ago
dfornika ★ 1.0k

You should merge them using the vcf-merge utility that is part of the vcftools package.

Concatenation would be appropriate if you had separate files for each chromosome and simply wanted to join them end-to-end into a single file. In your case, the SNPs and indels need to be interwoven by position, i.e. merged.
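To make the difference concrete, here is a toy Python sketch (not how vcf-merge is implemented) that treats each file as a list of tab-separated data lines already sorted by position. Concatenation just appends one list to the other, while merging interleaves them by (chromosome, position) using heapq.merge. The function names and the hard-coded contig order are assumptions for illustration only.

```python
import heapq

def record_key(line, contig_order):
    """Sort key for a tab-separated VCF data line: (contig rank, POS)."""
    chrom, pos = line.split("\t", 2)[:2]
    return (contig_order[chrom], int(pos))

def concatenate(*record_lists):
    """Join inputs end-to-end; only sorted if each input covers later contigs."""
    joined = []
    for records in record_lists:
        joined.extend(records)
    return joined

def merge_sorted(contig_order, *record_lists):
    """Interleave inputs that are each already position-sorted."""
    return list(heapq.merge(*record_lists,
                            key=lambda line: record_key(line, contig_order)))

if __name__ == "__main__":
    # Toy data; the contig order is hard-coded here (assumed), in practice it
    # could be taken from the ##contig header lines.
    order = {"chr1": 0, "chr2": 1}
    snps = ["chr1\t100\t.\tA\tG", "chr2\t50\t.\tC\tT"]
    indels = ["chr1\t150\t.\tAT\tA", "chr2\t10\t.\tG\tGA"]
    for label, records in (("concatenated", concatenate(snps, indels)),
                           ("merged", merge_sorted(order, snps, indels))):
        print(label, [r.split("\t")[1] for r in records])
```

The concatenated output lists positions 100, 50, 150, 10 (out of order), while the merged output lists 100, 150, 10, 50, i.e. sorted within each chromosome.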

If you haven't used vcftools before, you can find it here:

http://vcftools.sourceforge.net/

Specifically, you can read about vcf-merge here:

http://vcftools.sourceforge.net/docs.html#merge

You will need to compress the VCF files with bgzip and index them with tabix before you can run vcf-merge on them.
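A minimal Python sketch of that preparation step, assuming bgzip and tabix (from htslib) are on your PATH and that each input VCF is already sorted, which tabix needs in order to build an index. The helper names are hypothetical; by default it only returns the commands (equivalent to bgzip -c file.vcf > file.vcf.gz followed by tabix -p vcf file.vcf.gz) so you can inspect them before running.

```python
import subprocess
from pathlib import Path

def bgzip_and_index_commands(vcf_path):
    """Return (argv, stdout_target) pairs for compressing and indexing one VCF."""
    vcf = Path(vcf_path)
    gz = vcf.with_name(vcf.name + ".gz")
    return [
        (["bgzip", "-c", str(vcf)], gz),           # bgzip -c writes BGZF data to stdout
        (["tabix", "-p", "vcf", str(gz)], None),   # tabix writes a .tbi index beside the .gz
    ]

def prepare(vcf_path, dry_run=True):
    """Run (or, with dry_run=True, just return) the bgzip + tabix commands."""
    cmds = bgzip_and_index_commands(vcf_path)
    if dry_run:
        return cmds
    for argv, target in cmds:
        if target is None:
            subprocess.run(argv, check=True)
        else:
            # Redirect bgzip's stdout into the .vcf.gz file.
            with open(target, "wb") as fh:
                subprocess.run(argv, check=True, stdout=fh)
    return cmds
```

For example, prepare("snps.vcf", dry_run=False) and prepare("indels.vcf", dry_run=False) would produce snps.vcf.gz and indels.vcf.gz plus their .tbi indexes.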


Thanks for your reply. I was a bit confused about merging versus concatenation; now it's clear.


What should I do to merge VCF files column-wise?

The format is as follows:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CAST_EiJ

I want to add an extra column so that my format becomes:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CAST_EiJ C57BL6J

where all columns except the last are shared between the two files.
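For illustration only, a minimal Python sketch of such a column-wise join, assuming both inputs are single-sample, tab-separated VCF lines whose first nine columns (CHROM to FORMAT) are identical for shared sites. The helper add_sample_column is hypothetical; sites missing from the second file get the missing value ".", and sites present only in the second file are dropped. For real data, a dedicated tool such as bcftools merge is the safer choice.

```python
def add_sample_column(base_lines, other_lines, new_sample):
    """Append the sample column of other_lines to base_lines (hypothetical helper).

    Assumes tab-separated, single-sample VCF lines whose first nine columns
    (CHROM to FORMAT) are identical in both inputs for shared sites.
    """
    # Index the second file's sample column by its nine fixed columns.
    extra = {}
    for line in other_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        extra[tuple(fields[:9])] = fields[9]

    merged = []
    for line in base_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            merged.append(line)  # meta-information lines pass through unchanged
        elif line.startswith("#CHROM"):
            merged.append(line + "\t" + new_sample)  # extend the column header
        else:
            fields = line.split("\t")
            # Sites absent from the second file get the missing value '.'.
            merged.append("\t".join(fields + [extra.get(tuple(fields[:9]), ".")]))
    return merged
```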

Thanks.


Please post this as a new question.

6 weeks ago
finster ▴ 60

You want to merge; you might also want to take a look at bcftools merge, which is usually pretty quick. I am not sure whether this is a requirement, but I always make sure the files are sorted, bgzip-compressed and indexed. Ideally, you also do not want any overlapping sample IDs.
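A rough Python sketch of two of those pre-merge checks, assuming the VCFs are available as lists of text lines: it reads sample IDs from the #CHROM header (columns after FORMAT), reports IDs shared between two files, and tests whether data lines are grouped by chromosome and ascending in position. It does not check contig order against the header. The function names are hypothetical. Note that SNP and indel files called from the same sample will naturally share that sample's ID.

```python
def sample_ids(vcf_lines):
    """Return the sample names from the #CHROM header line."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")

def overlapping_samples(lines_a, lines_b):
    """Sample IDs present in both files, sorted alphabetically."""
    return sorted(set(sample_ids(lines_a)) & set(sample_ids(lines_b)))

def is_position_sorted(vcf_lines):
    """True if data lines are grouped by CHROM and ascending in POS within each CHROM."""
    seen = set()
    prev_chrom, prev_pos = None, -1
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        chrom, pos = line.split("\t", 2)[:2]
        pos = int(pos)
        if chrom != prev_chrom:
            if chrom in seen:
                return False  # a chromosome block reappears later: not sorted
            seen.add(chrom)
            prev_chrom, prev_pos = chrom, pos
        elif pos < prev_pos:
            return False
        else:
            prev_pos = pos
    return True
```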

Depending on how large your files are, you might want to take the first few rows from each one and use them to test the various options, to make sure you are happy with the result before moving on to a time-consuming full merge.
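One way to build such test subsets is sketched below in Python, assuming the inputs are plain or gzip/bgzip-compressed VCFs. The hypothetical head_vcf helper keeps every header line plus the first n data lines and writes plain text, so the subset would still need bgzip and tabix before a test merge.

```python
import gzip

def head_vcf(src_path, dst_path, n_records=1000):
    """Write the header plus the first n_records data lines of a VCF to dst_path.

    Reads plain or gzip/bgzip-compressed input (.gz) and writes plain text.
    Returns the number of data lines written.
    """
    opener = gzip.open if str(src_path).endswith(".gz") else open
    kept = 0
    with opener(src_path, "rt") as src, open(dst_path, "w") as dst:
        for line in src:
            if line.startswith("#"):
                dst.write(line)  # all meta and #CHROM header lines are kept
            elif kept < n_records:
                dst.write(line)
                kept += 1
            else:
                break  # headers precede data, so nothing else is needed
    return kept
```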
