bcftools merge is resulting in a lot of missing data, how do I fix this?
21 months ago
devenvyas ▴ 720

I have BCF files from imputation with GLIMPSE (https://odelaneau.github.io/GLIMPSE/). I converted the per-individual imputation files (22 per individual) into bgzip-compressed VCF files (vcf.gz). The vcf.gz files have complete genotype data, as expected after imputation.

I am trying to merge them so that all individuals are in the same files (going from n × 22 files down to 22 files). When I do this, a lot of data go missing, and the merged files no longer have complete data.

I am not sure what is going on. Each individual was imputed for the exact same sites, so I am very confused. Does anyone know how to fix this problem?

vcf bcf • 1.1k views
21 months ago

I reckon you have different variant sites in your files. Say individual A has SNPs at positions 1, 2, 3; after imputation you'll still have SNPs at positions 1, 2, 3. Individual B has SNPs at positions 4, 5, 6; after imputation it's still 4, 5, 6. Once you merge them into one file, individual A will have missing genotypes at positions 4, 5, 6, and individual B will have missing genotypes at positions 1, 2, 3. Compare the positions in your merged output files with those in your input files to see whether that's the case. If that's what's happening with your data, there are two ways to fix this:

1) Rerun the SNP calling so that it includes all sites, invariant as well as variable. In GATK that's -all-sites or -allSites (depending on the GATK version); in bcftools call that means removing the -v flag (most tutorials have lines like bcftools call -mv -Ob -o calls.bcf, where -v means 'only report variable sites').

2) If you're sure that these sites are really reference and not truly missing (which may be impossible to know: they could be 0/0, i.e. reference, or ./., i.e. properly missing, maybe deleted, maybe low coverage), you can rerun bcftools merge with the -0 flag. In this case missing genotypes are set to reference (0/0).
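As a rough sketch of option 2, the merge could be wrapped like this. The function name and all file names are made up for illustration; the only bcftools options used are -0 (--missing-to-ref), -Oz and -o. It assumes every input is bgzip-compressed and indexed, which bcftools merge requires.

```shell
# Hypothetical helper: merge per-individual files for one chromosome and
# write genotypes at sites absent from an input as 0/0 instead of ./.
# Usage: merge_missing_to_ref MERGED.vcf.gz IN1.vcf.gz IN2.vcf.gz [...]
# Assumes each input is bgzipped and indexed (e.g. with `bcftools index`).
merge_missing_to_ref() {
    merged_out="$1"
    shift
    # -0  : assume genotypes at missing sites are 0/0 (--missing-to-ref)
    # -Oz : write bgzip-compressed VCF
    # -o  : output file name
    bcftools merge -0 -Oz -o "$merged_out" "$@"
}

# Example call (placeholder file names):
#   merge_missing_to_ref merged.chr22.vcf.gz indA.chr22.vcf.gz indB.chr22.vcf.gz
```

Keep in mind that this only hides the symptom: if the missing genotypes come from mismatched site lists, setting them to 0/0 may well be wrong.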

Edit: oh sorry, just saw the 'imputed for the exact same sites'. Are you sure the input files all have the same positions?
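One way to check this, as a minimal sketch: dump the site list of each input with bcftools query and compare the lists with comm. The helper names and file names below are placeholders. REF and ALT are included as well as position, on the assumption that records at the same position with different alleles could also fail to line up in the merge.

```shell
# Hypothetical helper: list CHROM, POS, REF, ALT for every record in a
# VCF/BCF file (requires bcftools). Sorted in the C locale so that comm
# sees consistently ordered input.
list_sites() {
    bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\n' "$1" | LC_ALL=C sort -u
}

# Hypothetical helper: print the sites found in only one of two site lists.
# Lines with no leading tab are only in the first list; lines starting
# with a tab are only in the second list. No output means identical lists.
diff_sites() {
    LC_ALL=C comm -3 "$1" "$2"
}

# Example usage (placeholder file names):
#   list_sites indA.chr22.vcf.gz > indA.sites
#   list_sites indB.chr22.vcf.gz > indB.sites
#   diff_sites indA.sites indB.sites
```

If diff_sites prints nothing for every pair of inputs, the site lists match and the missing data must be coming from somewhere else.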

