Question: How to merge two haploid samples (vcf, or g.vcf) into a pseudo-diploid?
18 months ago by
Hi, I'm currently working with some bumblebee genomes. I have data from 10 drones (haploids), which I want to pair up into fake diploids.

I have already tried GATK CombineGVCFs and bcftools merge, but I could only get a typical merged VCF with more individuals.

The GT field in the vcf currently contains a single number.

GT:AD:DP:GQ:PL  1:0,11:11:99:433,0

I want to take the data from two drones and make them appear as a single diploid, so that the GT field becomes something like 1/1 or 0/1.
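To make the target encoding concrete, here is a minimal sketch of how two haploid GT values would pair into one unphased diploid GT. The function name pair_gt and the choice to report ./. when either call is missing are assumptions for illustration. It only shows the notation; it does not recompute AD, GQ, or PL, so it is not a substitute for re-calling genotypes.

```shell
# Hypothetical helper: pair two haploid GT values into an unphased diploid GT.
# $1 and $2 are haploid allele indices (0, 1, ...) or "." for a missing call.
pair_gt() {
  a=$1
  b=$2
  # Assumption: if either drone has no call, report the pair as missing.
  if [ "$a" = "." ] || [ "$b" = "." ]; then
    echo "./."
    return
  fi
  # Unphased genotypes are conventionally written with the lower index first.
  if [ "$a" -le "$b" ]; then
    echo "$a/$b"
  else
    echo "$b/$a"
  fi
}
```

For example, pair_gt 1 1 gives 1/1 and pair_gt 1 0 gives 0/1.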

Would appreciate any guidance or pointers.


[Solved!] Just in case anyone else stumbles onto a similar situation in the future, this is what I did.

Step 1) Merge the BAM files in pairs using samtools. It looks like:

samtools merge (merged_output_file).bam (input_file#1).bam (input_file#2).bam

Step 2) To turn them into g.vcf files, I needed an index file for each BAM.

samtools index -b (merged_sample).bam

** -b makes .bai files

**The steps above produce a multi-sample BAM; if you check the header, it will have 2 RG tags. I needed to edit the BAM so that it looked like a single-sample BAM file, because I needed to run -ERC GVCF in HaplotypeCaller later (it only works with a single sample). This leads to step 3.

Step 3) Replace the RG tags and add a single new one using Picard:

picard AddOrReplaceReadGroups I=(merged_sample).bam O=(outputfile).bam RGID=(new ID) RGLB=(new LB) RGPL=(new PL) RGPU=(new PU) RGSM=(new SM)

**not sure if I should've made the index after this step, will update if I hit a problem.
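One way to confirm that the read-group replacement worked is to count the distinct SM values in the BAM header. This is a minimal sketch; the function name count_rg_samples is an assumption, and it reads header text from standard input so it can be fed by samtools view -H.

```shell
# Hypothetical helper: count distinct SM (sample) values across @RG header lines.
# Reads SAM header text on stdin and prints the count.
count_rg_samples() {
  awk -F'\t' '
    $1 == "@RG" {
      # Collect every SM:<name> field found on a read-group line.
      for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) seen[$i] = 1
    }
    END {
      n = 0
      for (s in seen) n++
      print n
    }'
}

# Usage (file name is a placeholder):
#   samtools view -H (outputfile).bam | count_rg_samples
```

A merged BAM from step 1 should report 2, and the Picard output from step 3 should report 1. GATK expects an index next to the BAM it reads, so the file written in step 3 most likely needs its own samtools index run before step 4.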

Step 4) Make the g.vcf using GATK HaplotypeCaller with the default ploidy of 2.
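The post does not show the command for step 4, so the following is only a sketch of what a GATK4 call might look like. The function names run and call_gvcf, the DRY_RUN switch, and all file names are assumptions. The ploidy is written out explicitly even though 2 is the default.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper for step 4.
# $1: reference FASTA, $2: single-sample BAM from step 3, $3: output g.vcf
call_gvcf() {
  run gatk HaplotypeCaller \
    -R "$1" \
    -I "$2" \
    -O "$3" \
    -ERC GVCF \
    --sample-ploidy 2
}

# Usage (file names are placeholders):
#   DRY_RUN=1 call_gvcf reference.fasta pair1.rg.bam pair1.g.vcf.gz
#   call_gvcf reference.fasta pair1.rg.bam pair1.g.vcf.gz
```

Running it once with DRY_RUN=1 prints the full command, which is handy for checking paths before starting a long job on each of the drone pairs.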

I'll give a try at merging the BAMs and making new vcfs. Thank you!

18 months ago by
If you have alignments, you can merge the 2 BAMs and call genotypes as if the merged BAM were diploid. Otherwise, I do not think there is a straightforward way to do this; you would basically need to recalculate the entire VCF.

Solved, it worked! Thank you!

