Question: How to merge vcf files with different variants but same samples?
humeira.tayyab wrote 9 months ago:

I have VCF files with exactly the same meta region and the same column names in the fixed and gt (genotype) regions, but different variants. I want to merge them into a single VCF file with the same meta region and the combined fixed and gt regions, like this:

file1.vcf

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="Passed all filters">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Read Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  S1  S2  S3
1   10  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 0/0 0/1
1   11  .   C   A   .   .   DP=3;CALLER=Samtools    GT  .   .   1/1
1   12  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/0 0/0 0/0
1   13  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 1/1 1/1

file2.vcf

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="Passed all filters">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Read Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  S1  S2  S3
1   14  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 0/0 0/1
1   15  .   C   A   .   .   DP=3;CALLER=Samtools    GT  .   .   1/1
1   16  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/0 0/0 0/0
1   17  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 1/1 1/1

merged.vcf

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="Passed all filters">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Read Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  S1  S2  S3
1   10  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 0/0 0/1
1   11  .   C   A   .   .   DP=3;CALLER=Samtools    GT  .   .   1/1
1   12  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/0 0/0 0/0
1   13  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 1/1 1/1
1   14  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 0/0 0/1
1   15  .   C   A   .   .   DP=3;CALLER=Samtools    GT  .   .   1/1
1   16  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/0 0/0 0/0
1   17  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 1/1 1/1

What have you tried? Have you checked vcftools/bcftools? Also, please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

written 9 months ago by RamRS
Kevin Blighe (Republic of Ireland) wrote 9 months ago:

Just use bcftools concat. You should additionally get into the habit of normalising your VCF files prior to performing downstream analyses on them. This can be done with bcftools norm -m-any (I have not done that for the purposes of this answer):

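# compress with bgzip (bcftools expects bgzipped input)
bgzip file1.vcf
bgzip file2.vcf

# index; -p takes the preset name (vcf), followed by the file
tabix -p vcf file1.vcf.gz
tabix -p vcf file2.vcf.gz

bcftools concat file1.vcf.gz file2.vcf.gz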

##fileformat=VCFv4.1 
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Read Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=1>
##bcftools_concatVersion=1.2+htslib-1.2.1
##bcftools_concatCommand=concat file1.vcf.gz file2.vcf.gz
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  S1  S2  S3
1   10  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 0/0 0/1
1   11  .   C   A   .   .   DP=3;CALLER=Samtools    GT  .   .   1/1
1   12  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/0 0/0 0/0
1   13  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 1/1 1/1
1   14  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 0/0 0/1
1   15  .   C   A   .   .   DP=3;CALLER=Samtools    GT  .   .   1/1
1   16  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/0 0/0 0/0
1   17  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 1/1 1/1
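
As a rough sketch of how the normalisation step mentioned above could be folded into the same workflow: the function below runs bcftools norm -m-any on each input, indexes the results with tabix, and concatenates them. The function name merge_with_bcftools, the temporary file names, and the use of concat -a (--allow-overlaps, which needs indexed inputs) are assumptions made for illustration and are not part of the original answer. It also assumes the inputs are already bgzipped and that bcftools and tabix are on the PATH.

```shell
# Sketch only: normalise two bgzipped VCFs, then concatenate them.
# Assumes bcftools and tabix are installed; the function name is hypothetical.
merge_with_bcftools() {
    in1=$1; in2=$2; out=$3
    tmpdir=$(mktemp -d) || return 1
    # Split multiallelic sites into one record per ALT allele.
    bcftools norm -m-any -Oz -o "$tmpdir/norm1.vcf.gz" "$in1" &&
    bcftools norm -m-any -Oz -o "$tmpdir/norm2.vcf.gz" "$in2" &&
    # concat -a needs an index for every input file.
    tabix -p vcf "$tmpdir/norm1.vcf.gz" &&
    tabix -p vcf "$tmpdir/norm2.vcf.gz" &&
    # -a (--allow-overlaps) tolerates inputs whose coordinate ranges overlap.
    bcftools concat -a -Oz -o "$out" "$tmpdir/norm1.vcf.gz" "$tmpdir/norm2.vcf.gz"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}

# Example call (file names as in the answer above):
# merge_with_bcftools file1.vcf.gz file2.vcf.gz merged.vcf.gz
```

For the toy files in the question, where every record has a single ALT allele, the normalisation step should leave the records unchanged; it matters for real data with multiallelic sites.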
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote 9 months ago:

(changed)

Use Picard GatherVcfs: https://broadinstitute.github.io/picard/command-line-overview.html

cpad0112 (India) wrote 9 months ago:

Since both VCFs contain the same samples and identical headers:

$ cat test1.vcf <(awk '!/^#/ {print}' test2.vcf)

##fileformat=VCFv4.1 
##FILTER=<ID=PASS,Description="Passed all filters">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Read Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  S1  S2  S3
1   10  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 0/0 0/1
1   11  .   C   A   .   .   DP=3;CALLER=Samtools    GT  .   .   1/1
1   12  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/0 0/0 0/0
1   13  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 1/1 1/1
1   14  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 0/0 0/1
1   15  .   C   A   .   .   DP=3;CALLER=Samtools    GT  .   .   1/1
1   16  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/0 0/0 0/0
1   17  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 1/1 1/1
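
The plain cat approach above is only safe when the headers really are identical, so it may be worth checking that first. A minimal sketch, assuming uncompressed VCFs; the function name same_vcf_header is made up for this example:

```shell
# Sketch only: succeed (exit 0) when two uncompressed VCFs have identical
# header lines, i.e. the ## meta lines plus the #CHROM column line.
# The function name is hypothetical.
same_vcf_header() {
    # grep '^#' keeps only lines that start with '#'.
    h1=$(grep '^#' "$1") || return 2
    h2=$(grep '^#' "$2") || return 2
    [ "$h1" = "$h2" ]
}

# Example use (file names as in the answer above):
# if same_vcf_header test1.vcf test2.vcf; then
#     { cat test1.vcf; grep -v '^#' test2.vcf; } > merged.vcf
# else
#     echo "headers differ; use a VCF-aware tool instead" >&2
# fi
```

The comparison is exact, so even a stray trailing space in one header makes the check fail, which is the conservative outcome here.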

cough sorting cough

written 9 months ago by RamRS

Records are already coordinate sorted.

modified 9 months ago • written 9 months ago by cpad0112

That's not all the data, surely. Better safe than sorry.

written 9 months ago by RamRS
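
If there is any doubt about record order, the concatenated body can be sorted before it is written out. A minimal sketch with standard Unix tools, assuming uncompressed, tab-delimited VCFs that share one header. The function name concat_vcf_sorted is hypothetical, and CHROM is sorted as a plain string, which may not match the contig order a downstream tool expects:

```shell
# Sketch only: concatenate uncompressed VCFs that share one header and
# sort the records. The function name is hypothetical.
# Usage: concat_vcf_sorted first.vcf [more.vcf ...] > merged.vcf
concat_vcf_sorted() {
    # Take the header (every line starting with '#') from the first file only.
    grep '^#' "$1"
    # Take the records from all files, then sort by CHROM (column 1, plain
    # string order) and POS (column 2, numeric). Columns are tab-separated.
    cat "$@" | grep -v '^#' | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
}

# Example call (file names as in the answer above):
# concat_vcf_sorted test1.vcf test2.vcf > merged.vcf
```

For anything beyond a quick merge of files known to share a header, a VCF-aware tool such as bcftools is probably the safer choice.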