Merging two VCF files
2.9 years ago
Zahra ▴ 110

Hi all,

I have two VCF files (one with indels and one with SNVs). When I tried to merge them with GATK, I received this error:

“A sequence dictionary must be available”

So I ran UpdateVCFSequenceDictionary with the hg19 sequence dictionary, and then I hit a new error:

“Key SNP found in VariantContext field INFO at chr1:548426 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF headers by default”

Now I do not know how to solve this problem.

Would you mind helping me, please?

Thanks a lot

2.9 years ago
sbstevenlee ▴ 480

Check out the fuc Python package I wrote.

Using the command-line interface (CLI):

$ fuc vcf_merge 1.vcf 2.vcf 3.vcf > merged.vcf

Using the application programming interface (API):

Assume we have the following data:

>>> from fuc import pyvcf
>>> data1 = {
...     'CHROM': ['chr1', 'chr1'],
...     'POS': [100, 101],
...     'ID': ['.', '.'],
...     'REF': ['G', 'T'],
...     'ALT': ['A', 'C'],
...     'QUAL': ['.', '.'],
...     'FILTER': ['.', '.'],
...     'INFO': ['.', '.'],
...     'FORMAT': ['GT:DP', 'GT:DP'],
...     'Steven': ['0/0:32', '0/1:29'],
...     'Sara': ['0/1:24', '1/1:30'],
... }
>>> data2 = {
...     'CHROM': ['chr1', 'chr1', 'chr2'],
...     'POS': [100, 101, 200],
...     'ID': ['.', '.', '.'],
...     'REF': ['G', 'T', 'A'],
...     'ALT': ['A', 'C', 'T'],
...     'QUAL': ['.', '.', '.'],
...     'FILTER': ['.', '.', '.'],
...     'INFO': ['.', '.', '.'],
...     'FORMAT': ['GT:DP', 'GT:DP', 'GT:DP'],
...     'Dona': ['./.:.', '0/0:24', '0/0:26'],
...     'Michel': ['0/1:24', '0/1:31', '0/1:26'],
... }
>>> vf1 = pyvcf.VcfFrame.from_dict([], data1)
>>> vf2 = pyvcf.VcfFrame.from_dict([], data2)
>>> vf1.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT  Steven    Sara
0  chr1  100  .   G   A    .      .    .  GT:DP  0/0:32  0/1:24
1  chr1  101  .   T   C    .      .    .  GT:DP  0/1:29  1/1:30
>>> vf2.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT    Dona  Michel
0  chr1  100  .   G   A    .      .    .  GT:DP   ./.:.  0/1:24
1  chr1  101  .   T   C    .      .    .  GT:DP  0/0:24  0/1:31
2  chr2  200  .   A   T    .      .    .  GT:DP  0/0:26  0/1:26

We can merge the two VcfFrames with how='inner' (default):

>>> pyvcf.merge([vf1, vf2]).df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT Steven Sara Dona Michel
0  chr1  100  .   G   A    .      .    .     GT    0/0  0/1  ./.    0/1
1  chr1  101  .   T   C    .      .    .     GT    0/1  1/1  0/0    0/1

We can also merge with how='outer':

>>> pyvcf.merge([vf1, vf2], how='outer').df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT Steven Sara Dona Michel
0  chr1  100  .   G   A    .      .    .     GT    0/0  0/1  ./.    0/1
1  chr1  101  .   T   C    .      .    .     GT    0/1  1/1  0/0    0/1
2  chr2  200  .   A   T    .      .    .     GT    ./.  ./.  0/0    0/1
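
To make the difference between how='inner' and how='outer' concrete, here is a minimal plain-Python sketch of the idea. It is not how fuc implements pyvcf.merge; the function names merge_genotypes and sample_names and the dictionary layout are hypothetical, chosen only for illustration. It keys each record on (CHROM, POS, REF, ALT), keeps only the GT values from the example above, and fills "./." for samples that have no call at a site.

```python
# Illustrative only: NOT fuc's implementation of pyvcf.merge.
# Each table maps a site (CHROM, POS, REF, ALT) to {sample: genotype}.

def sample_names(table):
    """Collect sample names in first-seen order."""
    names = []
    for calls in table.values():
        for name in calls:
            if name not in names:
                names.append(name)
    return names


def merge_genotypes(a, b, how="inner"):
    """Merge two genotype tables on their site keys."""
    if how == "inner":
        # Keep only sites present in both tables.
        sites = [site for site in a if site in b]
    elif how == "outer":
        # Keep every site seen in either table (a's sites first, unsorted).
        sites = list(a) + [site for site in b if site not in a]
    else:
        raise ValueError("how must be 'inner' or 'outer'")

    names_a, names_b = sample_names(a), sample_names(b)
    merged = {}
    for site in sites:
        row = {}
        for table, names in ((a, names_a), (b, names_b)):
            calls = table.get(site, {})
            for name in names:
                # A sample with no record at this site gets a missing genotype.
                row[name] = calls.get(name, "./.")
        merged[site] = row
    return merged


vf1 = {
    ("chr1", 100, "G", "A"): {"Steven": "0/0", "Sara": "0/1"},
    ("chr1", 101, "T", "C"): {"Steven": "0/1", "Sara": "1/1"},
}
vf2 = {
    ("chr1", 100, "G", "A"): {"Dona": "./.", "Michel": "0/1"},
    ("chr1", 101, "T", "C"): {"Dona": "0/0", "Michel": "0/1"},
    ("chr2", 200, "A", "T"): {"Dona": "0/0", "Michel": "0/1"},
}

inner = merge_genotypes(vf1, vf2, how="inner")
outer = merge_genotypes(vf1, vf2, how="outer")
```

Here inner holds the two chr1 sites shared by both inputs, while outer also holds chr2:200, where Steven and Sara are filled with "./.".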

Since both VcfFrames have the DP subfield, we can use format='GT:DP':

>>> pyvcf.merge([vf1, vf2], how='outer', format='GT:DP').df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT  Steven    Sara    Dona  Michel
0  chr1  100  .   G   A    .      .    .  GT:DP  0/0:32  0/1:24   ./.:.  0/1:24
1  chr1  101  .   T   C    .      .    .  GT:DP  0/1:29  1/1:30  0/0:24  0/1:31
2  chr2  200  .   A   T    .      .    .  GT:DP   ./.:.   ./.:.  0/0:26  0/1:26
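
As for the second error in the question: it says the SNP key is used in the INFO column but has no ##INFO definition in the header. One possible workaround, sketched below in plain Python under stated assumptions, is to add the missing definition just before the #CHROM line. The helper name add_info_header is hypothetical, the toy record is made up apart from the chr1:548426 position, and Number=0 with Type=Flag is only a guess; check how SNP actually appears in your records (bare SNP versus SNP=value) and adjust accordingly.

```python
def add_info_header(lines, key, number, type_, description):
    """Return VCF lines with an ##INFO definition for `key` added if missing.

    `lines` are VCF lines without trailing newlines (e.g. from str.splitlines()).
    """
    tag = f"##INFO=<ID={key},"
    if any(line.startswith(tag) for line in lines):
        return list(lines)  # already defined, nothing to do

    new_line = (
        f'##INFO=<ID={key},Number={number},Type={type_},'
        f'Description="{description}">'
    )
    out = []
    inserted = False
    for line in lines:
        # Meta-information lines must come before the #CHROM column header.
        if line.startswith("#CHROM") and not inserted:
            out.append(new_line)
            inserted = True
        out.append(line)
    if not inserted:
        raise ValueError("no #CHROM header line found")
    return out


# Toy input: REF/ALT here are invented for illustration.
vcf_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t548426\t.\tA\tG\t.\t.\tSNP\n"
)

# ASSUMPTION: SNP is a valueless flag, hence Number=0 and Type=Flag.
fixed = add_info_header(
    vcf_text.splitlines(), "SNP", 0, "Flag", "Assumed flag marking SNP records"
)
```

Writing "\n".join(fixed) + "\n" back to a file gives a VCF whose header declares the SNP key. Any other undeclared INFO keys would need the same treatment.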