Merging a large number of VCF files
4.4 years ago
seta ★ 1.9k

Dear all,

I have previously merged VCF files, one per chromosome (22 files in total), containing about 1000 samples. I also have a large number of single-sample VCF files, each covering all chromosomes, that need to be merged into the existing per-chromosome files. Could you please suggest the most appropriate way to merge the single-sample VCF files into the previously merged VCF files, chromosome by chromosome? These VCF files come from whole-genome sequencing and are very large, so please also advise how I can speed up the task and finish it in the shortest possible time.

Thanks in advance

merging VCF whole genome • 2.2k views
4.4 years ago

Hello,

I think bcftools merge could help you: http://samtools.github.io/bcftools/bcftools.html#merge

By default, bcftools merge marks a site that is absent from one of the input VCFs as missing (./.) in the GT field of the samples from that file. If you would rather have those genotypes set to the reference (0/0), add --missing-to-ref.
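
As a rough illustration of the above, here is a minimal Python sketch that assembles one bcftools merge command per chromosome. The file names, the "chr1" contig name, and the helper name build_merge_cmd are placeholders I made up, not anything from your data. It assumes every input is bgzip-compressed and indexed, and it uses --regions so that only the requested chromosome is read from the whole-genome single-sample files.

```python
import shlex


def build_merge_cmd(chrom, cohort_vcf, sample_vcfs, out_vcf,
                    missing_to_ref=False, threads=1):
    """Return a `bcftools merge` command (as an argument list) for one chromosome.

    Hypothetical helper: all paths are supplied by the caller and are assumed
    to be bgzip-compressed and indexed VCFs.
    """
    cmd = [
        "bcftools", "merge",
        "--regions", chrom,         # restrict every input to this chromosome
        "--threads", str(threads),  # extra threads for output compression
        "-O", "z",                  # write compressed VCF
        "-o", out_vcf,
    ]
    if missing_to_ref:
        # Treat genotypes at sites absent from an input as 0/0 instead of ./.
        cmd.append("--missing-to-ref")
    cmd.append(cohort_vcf)          # the previously merged per-chromosome file
    cmd.extend(sample_vcfs)         # the single-sample whole-genome files
    return cmd


if __name__ == "__main__":
    cmd = build_merge_cmd(
        "chr1",
        "cohort.chr1.vcf.gz",                  # placeholder file names
        ["sampleA.vcf.gz", "sampleB.vcf.gz"],
        "chr1.merged.vcf.gz",
        missing_to_ref=True,
        threads=4,
    )
    print(shlex.join(cmd))  # inspect the command, or run it via subprocess.run(cmd, check=True)
```

The contig name passed to --regions has to match the naming in your VCF headers ("chr1" versus "1"). Whether --missing-to-ref is appropriate depends on how the single-sample files were called, so treat that flag as optional.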

Good luck

4.4 years ago

Use a workflow manager (Snakemake, Nextflow):

for each chromosome C
  merge vcfs 1  to 100 into $C.1.vcf.gz
  merge vcfs 101  to 200 into $C.2.vcf.gz
  merge vcfs 201  to 300 into $C.3.vcf.gz
  ...
  merge vcfs 901  to 1000 into $C.10.vcf.gz

  ####     
  merge vcfs $C.1.vcf.gz to $C.10.vcf.gz into $C.merged.vcf.gz
done

concatenate chr1.merged.vcf.gz to chrY.merged.vcf.gz into final.vcf.gz (same samples, different chromosomes, so bcftools concat rather than merge)
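
The outline above could be sketched in Python roughly as follows. This only plans the bcftools commands; in practice a workflow manager would run the batches in parallel. The function names, the batch size of 100, and the output naming are illustrative assumptions. It also assumes each batch holds at least two indexed, bgzip-compressed VCFs, and that the per-chromosome results are joined with bcftools concat because they share the same samples.

```python
def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def plan_chromosome(chrom, vcfs, batch_size=100):
    """Plan a two-level merge for one chromosome.

    Returns (merged_path, commands); each command is an argument list.
    Hypothetical helper: paths and naming scheme are placeholders.
    """
    commands = []
    batch_outputs = []
    for i, batch in enumerate(chunks(vcfs, batch_size), start=1):
        out = f"{chrom}.{i}.vcf.gz"
        # First level: merge one batch, reading only this chromosome.
        commands.append(["bcftools", "merge", "--regions", chrom,
                         "-O", "z", "-o", out, *batch])
        # Intermediate files must be indexed before they can be merged again.
        commands.append(["bcftools", "index", "--tbi", out])
        batch_outputs.append(out)

    merged = f"{chrom}.merged.vcf.gz"
    # Second level: merge the batch outputs into one file per chromosome.
    commands.append(["bcftools", "merge", "-O", "z", "-o", merged, *batch_outputs])
    commands.append(["bcftools", "index", "--tbi", merged])
    return merged, commands


def plan_concat(merged_files, final="final.vcf.gz"):
    """Join per-chromosome files (same samples) in the given order."""
    return ["bcftools", "concat", "-O", "z", "-o", final, *merged_files]


if __name__ == "__main__":
    vcfs = [f"sample{i}.vcf.gz" for i in range(1, 1001)]  # placeholder names
    merged_files = []
    for chrom in ["chr1", "chr2"]:  # extend to all chromosomes in practice
        merged, commands = plan_chromosome(chrom, vcfs)
        merged_files.append(merged)
        for cmd in commands:
            print(" ".join(cmd[:8]), "...")  # truncated preview of each command
    print(" ".join(plan_concat(merged_files)))
```

Batching keeps the number of simultaneously open files per job manageable and lets the first-level merges run independently, which is where most of the speed-up comes from.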