Merge VCF files with overlapping positions and create a consensus genotype
2.1 years ago

I have two separate VCF files, each containing a single individual (the same individual ID in both files), that I would like to merge into a single VCF file. The SNPs in the two files overlap at most positions. Using the concatenate functions of vcftools or bcftools, I can combine the two files into one and either keep the duplicate positions or exclude them (e.g. by keeping only the first SNP), but I cannot create a consensus genotype. For example, if VCF file 1 is homozygous for one allele and file 2 is homozygous for the other allele at the same SNP, I would like the merged VCF to be heterozygous at that SNP, containing both alleles. Is this possible with one of the available tools?

consensus snp merge vcf • 1.7k views
2.1 years ago

but I cannot create a consensus, for example if vcf file 1 is homozygous for one allele, and file 2 for the other allele at the same SNP, then I would like the merged vcf to be heterozygous for that SNP

Run bcftools norm after bcftools concat, using one of its duplicate-removal options:

    -D, --remove-duplicates         Remove duplicate lines of the same type.
    -d, --rm-dup TYPE               Remove duplicate snps|indels|both|all|exact
(...)

Thank you. I tried this, but it removes duplicates rather than merging them using the allelic information from both files. From what I can see, it keeps only the first instance of the SNP (e.g. genotype 0/0 from VCF file 1 is retained) and drops the other instance (e.g. genotype 0/1 at the same SNP from VCF file 2 disappears), whereas the consensus of the two genotypes would be 0/1. So this command does not build a consensus genotype.
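
To make the rule I am after concrete, here is a minimal Python sketch of the consensus, not an existing tool. It assumes both files are single-sample, share the same REF at overlapping positions, and list each position at most once; the function names (`parse`, `consensus_line`, `merge_records`) are made up for illustration. The consensus genotype is the union of the alleles called in the two files, and sites where that union has more than two alleles are set to missing because they cannot be written as a diploid genotype.

```python
import re

def parse(line):
    """Split one VCF data line; return (key, ref, called alleles, fields)."""
    f = line.rstrip("\n").split("\t")
    ref, alt = f[3], f[4]
    gt = f[9].split(":")[f[8].split(":").index("GT")]
    alleles = [ref] + ([] if alt == "." else alt.split(","))
    # Translate allele indices (0, 1, ...) into bases; skip missing calls.
    called = {alleles[int(i)] for i in re.split(r"[/|]", gt) if i != "."}
    return (f[0], int(f[1])), ref, called, f

def consensus_line(line1, line2):
    """Merge two records at the same site into one consensus record."""
    key1, ref1, seen1, f1 = parse(line1)
    key2, ref2, seen2, _ = parse(line2)
    if key1 != key2 or ref1 != ref2:
        raise ValueError("records do not describe the same site/REF")
    seen = seen1 | seen2                  # union of alleles from both files
    alts = sorted(seen - {ref1})
    out = f1[:]                           # ID/QUAL/FILTER/INFO taken from file 1
    out[4] = ",".join(alts) if alts else "."
    out[8] = "GT"                         # other FORMAT fields are dropped here
    if not seen or len(seen) > 2:
        out[9] = "./."                    # no call, or not representable as diploid
    else:
        alleles = [ref1] + alts
        idx = sorted(alleles.index(a) for a in seen)
        if len(idx) == 1:
            idx = idx * 2                 # a single allele means homozygous
        out[9] = "/".join(map(str, idx))
    return "\t".join(out)

def merge_records(lines1, lines2):
    """Merge two lists of VCF data lines (header lines already removed)."""
    by_key2 = {parse(l)[0]: l for l in lines2}
    merged, keys1 = [], set()
    for l in lines1:
        key = parse(l)[0]
        keys1.add(key)
        if key in by_key2:
            merged.append(consensus_line(l, by_key2[key]))
        else:
            merged.append(l.rstrip("\n"))
    # Records found only in file 2 are kept unchanged.
    merged += [l.rstrip("\n") for l in lines2 if parse(l)[0] not in keys1]
    # Plain (chromosome name, position) sort; contig order is an assumption.
    merged.sort(key=lambda l: parse(l)[0])
    return merged

file1 = ["chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0"]
file2 = ["chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t1/1"]
print(merge_records(file1, file2)[0].split("\t")[9])  # prints 0/1
```

With real files, the inputs would be the non-header lines (those not starting with `#`) of each VCF, and the header of one file would be written back in front of the merged records. Per-sample fields such as DP or GQ are dropped for merged sites in this sketch, since it is not obvious how they should be combined.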
