Question: How to merge two VCF files that contain the same variant in different representations
Apprentice wrote (4.1 years ago):

Hi.

Thank you for your continued help. I have an additional problem.

I would like to merge two VCF files (a.vcf, b.vcf) into one VCF file (c.vcf) using GATK CombineVariants. a.vcf and b.vcf contain the same variant, but the two records are not treated as the same variant. Specifically, a.vcf and b.vcf look like this:

$ cat a.vcf

CHROM   POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  SampleA SampleB
chr1    897460  v5_202  A   <*:DEL> .   PASS    .   GT:AD:DP    0/0:20,0:20 0/1:14,14:28

$ cat b.vcf

CHROM   POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  SampleC SampleD
chr1    897459  v6_202  CA  C   2068.83 PASS    .   GT:AD:DP    0/0:43,0:43 0/1:40,6:46

As you can see, both files contain a record for the same variant, but the coordinates differ. I want to merge the two files into one and collapse these two records into a single variant record using GATK CombineVariants.

How should I merge the files?

snp sequence genome • 3.0k views
modified 4.1 years ago by Manuel Landesfeind1.2k • written 4.1 years ago by Apprentice40

How can the same variant have different coordinates in different samples? From the IDs (v5_202, v6_202), it looks like the files were processed using different versions of "something". So you can't really merge them, or the variant will be represented twice in your VCF file as separate variants.

modified 4.1 years ago • written 4.1 years ago by geek_y11k

Thank you for your comment.

Each VCF file was called separately, using samples from different capture kits (V5, V6).

written 4.1 years ago by Apprentice40

Even so, the same variant can't have different coordinates. Essentially, you can't merge these two records unless they have the same coordinates.

written 4.1 years ago by always_learning1.1k

In that case, keep them as separate variants or manually correct one of the coordinates. But I do not know whether that will have any downstream effects.

written 4.1 years ago by geek_y11k
Manuel Landesfeind (Göttingen, Germany) wrote (4.1 years ago):

What you are looking for is called variant normalization, or parsimonious variant representation. And there is no need for manual work ;-)

When merging variants, I employ bcftools norm first. In fact, I found the following pipeline to work best:

bcftools norm --multiallelics '-any' a.vcf | bcftools norm -f '/path/to/genome.fa' > a.normed.vcf
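
To make the idea of normalization concrete, here is a minimal Python sketch of the trim-and-left-align procedure usually described for biallelic records. It is an illustration only, not the bcftools implementation. The function name `normalize`, the toy reference string, and its flanking bases are assumptions made up for this example; only the C at 897459 and the A at 897460 are taken from the REF column of b.vcf.

```python
def normalize(pos, ref, alt, genome, genome_start=1):
    """Left-align and trim one biallelic variant.

    pos is 1-based; genome is a reference string whose first base sits at
    coordinate genome_start (both hypothetical helpers for this sketch).
    """
    changed = True
    while changed:
        changed = False
        # Drop a shared rightmost base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if not ref or not alt:
            pos -= 1
            if pos < genome_start:
                raise ValueError("ran off the start of the reference")
            base = genome[pos - genome_start]
            ref, alt = base + ref, base + alt
            changed = True
    # Trim shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference: 897459=C and 897460=A come from b.vcf; the flanks are invented.
toy = "GGCATT"
toy_start = 897457

print(normalize(897459, "CA", "C", toy, toy_start))    # (897459, 'CA', 'C')
print(normalize(897460, "A", "", toy, toy_start))      # (897459, 'CA', 'C')
print(normalize(897458, "GCA", "GC", toy, toy_start))  # (897459, 'CA', 'C')
# Deleting one A from a run of As shifts to the leftmost position.
print(normalize(4, "AA", "A", "GCAAAT"))               # (2, 'CA', 'C')
```

A record such as CA>C at 897459 comes back unchanged: its alleles end in different bases and ALT is a single base, so it is already in normalized form. A symbolic allele such as `<*:DEL>` is not a literal sequence, so this procedure cannot be applied to it until the record is rewritten with explicit bases.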

After doing this for both files, I merge them using bcftools merge.
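
Why the representation matters for the merge step can be sketched in a few lines of Python. Conceptually, merging joins records on (CHROM, POS, REF, ALT), so two records only end up on one line if those four fields agree. The function `merge_records` and the dictionary layout are hypothetical simplifications for illustration; a real merge also handles multiallelic sites, INFO, and FORMAT fields.

```python
def merge_records(records_a, samples_a, records_b, samples_b):
    """Join two {(chrom, pos, ref, alt): [genotypes]} maps on the variant key.

    Samples without a record at a key get the missing genotype "./.".
    """
    merged = {}
    for key in sorted(set(records_a) | set(records_b)):
        gts_a = records_a.get(key, ["./."] * len(samples_a))
        gts_b = records_b.get(key, ["./."] * len(samples_b))
        merged[key] = gts_a + gts_b
    return samples_a + samples_b, merged


samples_a = ["SampleA", "SampleB"]
samples_b = ["SampleC", "SampleD"]
b = {("chr1", 897459, "CA", "C"): ["0/0", "0/1"]}

# As posted: the keys differ, so the variant appears on two lines.
a_raw = {("chr1", 897460, "A", "<*:DEL>"): ["0/0", "0/1"]}
_, two_lines = merge_records(a_raw, samples_a, b, samples_b)
print(len(two_lines))  # 2

# With a.vcf rewritten to the same representation as b.vcf: one line.
a_fixed = {("chr1", 897459, "CA", "C"): ["0/0", "0/1"]}
_, one_line = merge_records(a_fixed, samples_a, b, samples_b)
print(one_line)
```

Note that bcftools merge expects bgzip-compressed and indexed input files, so the normalized VCFs need to be compressed and indexed before merging.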

If your workflow is GATK-based, the appropriate tool chain might be VariantsToAllelicPrimitives, LeftAlignAndTrimVariants, and finally CombineVariants.

modified 4.1 years ago • written 4.1 years ago by Manuel Landesfeind1.2k

Thank you for your great comment.

I applied the command you wrote to a.vcf and b.vcf, but neither file was changed. Why is that? It seems that the REF and ALT alleles in b.vcf were not left-trimmed. How can I solve the problem?

written 4.1 years ago by Apprentice40

First, please do not blindly apply commands someone posts somewhere! Instead, read the manuals and documentation of the commands, try to understand what they are doing, and then use them appropriately! If you had done this, and had a decent understanding of the Linux command line, you would have realized that a.vcf is not supposed to change, but that.... (this is left as an exercise - please read the link above)

Yes, the problem is the left alignment and trimming of the variants. In fact, your a.vcf is somewhat wrong because ALT must contain some nucleic acid letters. The representation of the variants used in b.vcf is the correct one.

You can solve the problem by reading my answer, learning the usage of the mentioned tools, and applying the tools to your files - again this is left as an exercise ;-)

modified 4.1 years ago • written 4.1 years ago by Manuel Landesfeind1.2k

Thank you for your advice! I'll study the VCF file format.

written 4.0 years ago by Apprentice40