Question: Criteria for merging multiple VCF files
Asked 11 months ago by seta (1.4k):

Dear all,

I merged many single-sample VCF files into one multi-sample VCF file (human) using bcftools (bcftools merge -m none). Before merging, I also split multiallelic sites with bcftools. However, many sites have no genotype call for some samples; only ./. is reported. Could you please let me know whether I should apply any filtering step to each single-sample VCF file before merging?
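
To make the source of the ./. entries concrete, here is a minimal toy sketch in Python, not bcftools itself. It assumes each single-sample VCF lists only that sample's variant sites (no reference blocks, as in a plain non-gVCF file); the sample names, sites, and the `merge_sites` helper are all hypothetical. Any site seen in one sample but absent from another has no information for the second sample, so the merged record can only report a missing genotype there.

```python
# Toy model: each single-sample VCF is reduced to
# {(chrom, pos, ref, alt): genotype}. Only variant sites are recorded,
# as in a plain (non-gVCF) VCF. All data here is made up for illustration.
sample_calls = {
    "S1": {("chr1", 100, "A", "G"): "0/1", ("chr1", 250, "C", "T"): "1/1"},
    "S2": {("chr1", 100, "A", "G"): "1/1"},
    "S3": {("chr1", 400, "G", "A"): "0/1"},
}


def merge_sites(calls_by_sample):
    """Take the union of all sites; samples without a record get './.'."""
    samples = sorted(calls_by_sample)
    all_sites = sorted(set().union(*(c.keys() for c in calls_by_sample.values())))
    merged = {}
    for site in all_sites:
        merged[site] = [calls_by_sample[s].get(site, "./.") for s in samples]
    return samples, merged


samples, merged = merge_sites(sample_calls)
print("CHROM", "POS", "REF", "ALT", *samples, sep="\t")
for site, genotypes in merged.items():
    print(*site, *genotypes, sep="\t")
```

In this toy input, two of the three sites end up with ./. in two samples each, even though nothing was filtered: the merge simply cannot tell a homozygous-reference genotype from an uncalled one.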


Answer written 11 months ago by Brice Sarver (3.5k), United States:

If you're combining unfiltered VCFs, you should expect many false-positive variant calls that are unique to a single sample, which show up as missing genotypes in the other samples. You can avoid or reduce this in several ways:

  1. See GATK's HaplotypeCaller documentation on how to generate and merge gVCFs; individuals that have reference calls will then correctly get 0/0 where applicable.
  2. Filter each VCF before merging so that only confidently called variants remain, then merge them.
  3. Additionally, filter the multi-sample VCF on a fixed number or percentage of individuals with missing data.
  4. Replace missing genotypes with reference calls in your final VCF (potentially problematic, because a missing call is not evidence of a reference genotype).
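
As a rough illustration of option 3, the following Python sketch computes the fraction of samples with a missing genotype at each site of a merged VCF and keeps only sites at or below a threshold. It works on plain tab-separated VCF text; the function names, the 0.2 default, and the example records are assumptions made for this sketch, not part of any tool.

```python
def missing_fraction(vcf_line):
    """Fraction of samples whose GT is missing in one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # FORMAT is the 9th column
    sample_fields = fields[9:]
    missing = 0
    for sample in sample_fields:
        gt = sample.split(":")[gt_index]
        # Count the genotype as missing if any allele is '.'
        if "." in gt.replace("|", "/").split("/"):
            missing += 1
    return missing / len(sample_fields)


def filter_by_call_rate(lines, max_missing=0.2):
    """Yield header lines and sites with at most max_missing missing GTs."""
    for line in lines:
        if line.startswith("#") or missing_fraction(line) <= max_missing:
            yield line


# Made-up example: five samples, two sites.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4\tS5"
site_a = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\t0/1\t./.\t0/1"
site_b = "chr1\t250\t.\tC\tT\t50\tPASS\t.\tGT\t1/1\t./.\t./.\t./.\t0/1"

kept = list(filter_by_call_rate([header, site_a, site_b]))
for line in kept:
    print(line)
```

Here `site_a` has one missing genotype out of five and is kept, while `site_b` has three out of five and is dropped. The threshold is a judgment call that depends on cohort size and on how the downstream analysis treats missing data.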

Thanks for your comments. I actually have VCF files, not gVCFs, in which the minimum quality score (QUAL, the 6th column of the VCF file) is 30. I applied basic filtering to each single-sample VCF file to keep all variants (SNPs and indels) with a minimum DP of 10 and a minimum GQ of 20. Next, I plan to merge the filtered single-sample VCF files again and then keep only variants that are called in at least 80% of individuals in the merged (multi-sample) VCF file. Could you please share your thoughts? Is this filtering enough, or is additional filtering needed?
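
For reference, the per-sample thresholds described in this reply (QUAL of at least 30, DP of at least 10, GQ of at least 20) can be sketched in Python roughly as follows. This is an illustrative sketch on plain VCF text lines, assuming a single-sample file whose FORMAT column contains DP and GQ; the function name and the example records are hypothetical, and records lacking either value are treated as failing.

```python
def passes_filters(vcf_line, min_qual=30.0, min_dp=10, min_gq=20):
    """Return True if a single-sample VCF record meets all three thresholds."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual = fields[5]  # QUAL is the 6th column (index 5)
    if qual == "." or float(qual) < min_qual:
        return False
    # Pair FORMAT keys with the sample's values, e.g. {"GT": "0/1", "DP": "15"}
    fmt = dict(zip(fields[8].split(":"), fields[9].split(":")))
    try:
        return float(fmt["DP"]) >= min_dp and float(fmt["GQ"]) >= min_gq
    except (KeyError, ValueError):
        # DP or GQ absent or '.', so the record cannot be verified
        return False


# Made-up records: only the first one satisfies every threshold.
records = [
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP:GQ\t0/1:15:99",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP:GQ\t0/1:8:99",   # DP too low
    "chr1\t300\t.\tG\tA\t20\tPASS\t.\tGT:DP:GQ\t1/1:30:60",  # QUAL too low
    "chr1\t400\t.\tT\tC\t45\tPASS\t.\tGT:DP:GQ\t0/1:12:.",   # GQ missing
]
results = [passes_filters(r) for r in records]
print(results)
```

Note that filtering each file separately removes low-confidence records entirely, so after merging those sites still appear as ./. for the affected samples; the 80% call-rate step on the merged file is what then decides whether the site is kept.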


Reply written 11 months ago by seta (1.4k)