bcftools merge
2.9 years ago
evafinegan • 0

Hello,

I am trying to merge multiple vcf files. I have seven vcf files with 50-60 samples each. I tried:

bcftools merge file_1.vcf.gz file_2.vcf.gz file_3.vcf.gz file_4.vcf.gz file_5.vcf.gz file_6.vcf.gz file_7.vcf.gz > merge.vcf

But the GT calls in the output file merge.vcf do not match those in the individual VCF files. Is there anything wrong with the command I am using, or is there a better way to merge multiple VCF files? Thank you!

SNP • 2.0k views

What's the output of:

gunzip -c  file_1.vcf.gz file_2.vcf.gz file_3.vcf.gz file_4.vcf.gz file_5.vcf.gz file_6.vcf.gz file_7.vcf.gz | grep "#CHROM"  
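One plausible reason to look at the #CHROM lines is to see whether any sample name appears in more than one file, since bcftools merge refuses duplicate sample names unless --force-samples is given. The following is a minimal Python sketch of that check, not something taken from the thread. The helper names `read_samples` and `duplicated_samples` are hypothetical, and it assumes the inputs are gzip- or bgzip-compressed, tab-delimited VCF files.

```python
import gzip


def read_samples(path):
    """Return the sample names from the #CHROM header line of a gzipped VCF."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first nine columns are fixed (CHROM ... FORMAT);
                # everything after them is a sample name.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                # Reached the records without seeing a #CHROM line.
                break
    raise ValueError(f"no #CHROM header line found in {path}")


def duplicated_samples(paths):
    """Map each sample name found in more than one file to those files."""
    seen = {}
    for path in paths:
        for sample in read_samples(path):
            seen.setdefault(sample, []).append(path)
    return {name: files for name, files in seen.items() if len(files) > 1}


# Example call (file names as in the question):
# duplicated_samples(["file_1.vcf.gz", "file_2.vcf.gz", "file_3.vcf.gz"])
```

An empty dictionary means no sample name is shared between the files, so duplicate names can be ruled out as the cause.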

Here is the output:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  YG932859    YG932858    YG932857    YG932851    YG932852    YG932850    YG932849    YG932848    YG932847    YG932844    YG932817    YG932834    YG932815    YG932813    YG932829    YG932840    YG932804    YG932810    YG932809    YG932816    YG932797    YG932808    YG932846    YG932805    YG932820    YG932814    YG932825    YG932800    YG932818    YG932837    YG932819    YG932823    YG932856    YG932801    YG932812    YG932798    YG932807    YG932841    YG932802    YG932811    YG932854    YG932803    YG932799    YG932831    YG932855    YG932806    YG932827    YG932821    YG932835    YG932824    YG932826    YG932853    YG932845    YG932828    YG932830    YG932832    YG932833    YG932836    YG932838    YG932839    YG932822    YG932842    YG932843
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  YG932909    YG932908    YG932907    YG932882    YG932890    YG932900    YG932881    YG932880    YG932869    YG932884    YG932879    YG932875    YG932878    YG932872    YG932899    YG932874    YG932873    YG932862    YG932887    YG932892    YG932861    YG932866    YG932897    YG932876    YG932894    YG932864    YG932863    YG932898    YG932903    YG932883    YG932888    YG932904    YG932860    YG932867    YG932886    YG932902    YG932868    YG932877    YG932871    YG932885    YG932896    YG932895    YG932865    YG932889    YG932893    YG932891    YG932870    YG932901    YG932905    YG932906
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  YG932949    YG932948    YG932945    YG932944    YG932941    YG932940    YG932939    YG932946    YG932938    YG932933    YG932920    YG932937    YG932919    YG932935    YG932918    YG932921    YG932925    YG932917    YG932914    YG932923    YG932912    YG932947    YG932943    YG932927    YG932934    YG932913    YG932932    YG932942    YG932936    YG932915    YG932911    YG932928    YG932922    YG932910    YG932924    YG932929    YG932926    YG932930    YG932916    YG932931
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  YG932999    YG932998    YG932997    YG932970    YG932982    YG932983    YG932967    YG932976    YG932965    YG932964    YG932963    YG932973    YG932979    YG932962    YG932988    YG932969    YG932996    YG932972    YG932956    YG932959    YG932981    YG932953    YG932971    YG932950    YG932968    YG932987    YG932992    YG932952    YG932951    YG932966    YG932957    YG932954    YG932985    YG932955    YG932960    YG932958    YG932989    YG932974    YG932975    YG932977    YG932978    YG932961    YG932984    YG932986    YG932993    YG932990    YG932991    YG932994    YG932980    YG932995
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  YG933069    YG933068    YG933067    YG933066    YG933065    YG933064    YG933062    YG933058    YG933057    YG933056    YG933053    YG933052    YG933051    YG933049    YG933048    YG933047    YG933016    YG933011    YG933022    YG933013    YG933055    YG933002    YG933019    YG933012    YG933007    YG933014    YG933001    YG933020    YG933034    YG933000    YG933015    YG933017    YG933045    YG933030    YG933054    YG933008    YG933028    YG933018    YG933004    YG933023    YG933060    YG933005    YG933031    YG933006    YG933037    YG933010    YG933063    YG933024    YG933025    YG933026    YG933044    YG933027    YG933021    YG933032    YG933035    YG933036    YG933009    YG933038    YG933039    YG933003    YG933040    YG933059    YG933041    YG933050    YG933042    YG933061    YG933033    YG933043    YG933029    YG933046
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  YG933139    YG933138    YG933134    YG933133    YG933130    YG933128    YG933129    YG933127    YG933126    YG933125    YG933122    YG933132    YG933119    YG933118    YG933117    YG933124    YG933093    YG933094    YG933088    YG933092    YG933090    YG933136    YG933123    YG933073    YG933121    YG933089    YG933087    YG933100    YG933135    YG933085    YG933096    YG933137    YG933099    YG933083    YG933131    YG933116    YG933091    YG933084    YG933075    YG933077    YG933101    YG933120    YG933102    YG933086    YG933072    YG933071    YG933070    YG933113    YG933074    YG933079    YG933076    YG933097    YG933106    YG933078    YG933082    YG933114    YG933080    YG933095    YG933103    YG933104    YG933105    YG933107    YG933081    YG933108    YG933098    YG933109    YG933110    YG933111    YG933112    YG933115
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  YG933201    YG933200    YG933196    YG933195    YG933194    YG933199    YG933192    YG933191    YG933190    YG933161    YG933160    YG933187    YG933182    YG933159    YG933198    YG933145    YG933183    YG933189    YG933158    YG933157    YG933188    YG933156    YG933142    YG933153    YG933175    YG933151    YG933179    YG933147    YG933164    YG933155    YG933168    YG933140    YG933144    YG933146    YG933162    YG933143    YG933149    YG933170    YG933152    YG933167    YG933148    YG933150    YG933178    YG933163    YG933165    YG933166    YG933169    YG933173    YG933171    YG933172    YG933181    YG933174    YG933197    YG933177    YG933176    YG933141    YG933186    YG933193    YG933154    YG933180    YG933184    YG933185

Dear @Pierre, I am still having this issue while merging multiple vcf files. Could you please share any suggestions to fix it? Thank you!

2.9 years ago
sbstevenlee ▴ 480

CLI solution

Check out the vcf_merge command I wrote:

$ fuc vcf_merge -h
usage: fuc vcf_merge [-h] [--how TEXT] [--format TEXT] [--sort] [--collapse]
                     vcf_files [vcf_files ...]

This command will merge multiple VCF files (both zipped and unzipped). It
essentially wraps the 'pyvcf.merge' method from the fuc API.

By default, only the GT subfield of the FORMAT field will be included in the
merged VCF. Use '--format' to include additional FORMAT subfields such as AD
and DP.

usage examples:
  $ fuc vcf_merge 1.vcf 2.vcf 3.vcf > merged.vcf

positional arguments:
  vcf_files      VCF files

optional arguments:
  -h, --help     show this help message and exit
  --how TEXT     type of merge as defined in `pandas.DataFrame.merge`
                 (default: 'inner')
  --format TEXT  FORMAT subfields to be retained (e.g. 'GT:AD:DP') (default:
                 'GT')
  --sort         use this flag to turn off sorting of records (default: True)
  --collapse     use this flag to collapse duplicate records (default: False)

API solution

If you are familiar with Python and plan to perform additional analyses on the merged VCF (e.g. filtering), you can also use the pyvcf.merge method I wrote.

Assume we have the following data:

>>> from fuc import pyvcf
>>> data1 = {
...     'CHROM': ['chr1', 'chr1'],
...     'POS': [100, 101],
...     'ID': ['.', '.'],
...     'REF': ['G', 'T'],
...     'ALT': ['A', 'C'],
...     'QUAL': ['.', '.'],
...     'FILTER': ['.', '.'],
...     'INFO': ['.', '.'],
...     'FORMAT': ['GT:DP', 'GT:DP'],
...     'Steven': ['0/0:32', '0/1:29'],
...     'Sara': ['0/1:24', '1/1:30'],
... }
>>> data2 = {
...     'CHROM': ['chr1', 'chr1', 'chr2'],
...     'POS': [100, 101, 200],
...     'ID': ['.', '.', '.'],
...     'REF': ['G', 'T', 'A'],
...     'ALT': ['A', 'C', 'T'],
...     'QUAL': ['.', '.', '.'],
...     'FILTER': ['.', '.', '.'],
...     'INFO': ['.', '.', '.'],
...     'FORMAT': ['GT:DP', 'GT:DP', 'GT:DP'],
...     'Dona': ['./.:.', '0/0:24', '0/0:26'],
...     'Michel': ['0/1:24', '0/1:31', '0/1:26'],
... }
>>> vf1 = pyvcf.VcfFrame.from_dict([], data1)
>>> vf2 = pyvcf.VcfFrame.from_dict([], data2)
>>> vf1.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT  Steven    Sara
0  chr1  100  .   G   A    .      .    .  GT:DP  0/0:32  0/1:24
1  chr1  101  .   T   C    .      .    .  GT:DP  0/1:29  1/1:30
>>> vf2.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT    Dona  Michel
0  chr1  100  .   G   A    .      .    .  GT:DP   ./.:.  0/1:24
1  chr1  101  .   T   C    .      .    .  GT:DP  0/0:24  0/1:31
2  chr2  200  .   A   T    .      .    .  GT:DP  0/0:26  0/1:26

We can merge the two VcfFrames with how='inner' (default):

>>> pyvcf.merge([vf1, vf2]).df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT Steven Sara Dona Michel
0  chr1  100  .   G   A    .      .    .     GT    0/0  0/1  ./.    0/1
1  chr1  101  .   T   C    .      .    .     GT    0/1  1/1  0/0    0/1

We can also merge with how='outer':

>>> pyvcf.merge([vf1, vf2], how='outer').df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT Steven Sara Dona Michel
0  chr1  100  .   G   A    .      .    .     GT    0/0  0/1  ./.    0/1
1  chr1  101  .   T   C    .      .    .     GT    0/1  1/1  0/0    0/1
2  chr2  200  .   A   T    .      .    .     GT    ./.  ./.  0/0    0/1
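
To make the difference between how='inner' and how='outer' concrete, here is a minimal pure-Python sketch of the same idea. It is not fuc's implementation (the help text above says fuc relies on `pandas.DataFrame.merge`). The function `merge_genotypes`, the dictionary layout, and the choice of (CHROM, POS, REF, ALT) as the site key are assumptions made for illustration only.

```python
MISSING = "./."  # genotype used for samples whose file lacks the site


def merge_genotypes(tables, how="inner"):
    """Merge genotype tables keyed by (CHROM, POS, REF, ALT).

    Each table is a dict of {site_key: {sample_name: genotype}}.
    'inner' keeps sites present in every table; 'outer' keeps sites
    present in any table and fills absent genotypes with './.'.
    """
    if how not in ("inner", "outer"):
        raise ValueError("how must be 'inner' or 'outer'")
    if not tables:
        return {}
    key_sets = [set(table) for table in tables]
    if how == "inner":
        keys = set.intersection(*key_sets)
    else:
        keys = set.union(*key_sets)
    merged = {}
    for key in sorted(keys):  # plain tuple ordering, not natural chromosome order
        row = {}
        for table in tables:
            # Sample names of this table, read from its first record.
            samples = list(next(iter(table.values()))) if table else []
            record = table.get(key, {})
            for sample in samples:
                row[sample] = record.get(sample, MISSING)
        merged[key] = row
    return merged


# The GT values from the two VcfFrames above.
vf1 = {
    ("chr1", 100, "G", "A"): {"Steven": "0/0", "Sara": "0/1"},
    ("chr1", 101, "T", "C"): {"Steven": "0/1", "Sara": "1/1"},
}
vf2 = {
    ("chr1", 100, "G", "A"): {"Dona": "./.", "Michel": "0/1"},
    ("chr1", 101, "T", "C"): {"Dona": "0/0", "Michel": "0/1"},
    ("chr2", 200, "A", "T"): {"Dona": "0/0", "Michel": "0/1"},
}

inner = merge_genotypes([vf1, vf2], how="inner")
outer = merge_genotypes([vf1, vf2], how="outer")
```

With this data, `inner` holds only the two chr1 sites shared by both inputs, while `outer` also holds the chr2 site, with Steven and Sara set to './.' there.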

Since both VcfFrames have the DP subfield, we can use format='GT:DP':

>>> pyvcf.merge([vf1, vf2], how='outer', format='GT:DP').df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT  Steven    Sara    Dona  Michel
0  chr1  100  .   G   A    .      .    .  GT:DP  0/0:32  0/1:24   ./.:.  0/1:24
1  chr1  101  .   T   C    .      .    .  GT:DP  0/1:29  1/1:30  0/0:24  0/1:31
2  chr2  200  .   A   T    .      .    .  GT:DP   ./.:.   ./.:.  0/0:26  0/1:26