Question: How to merge two haploid samples (vcf, or g.vcf) into a pseudo-diploid?
Pistachio wrote (18 months ago):

Hi, I'm currently working with bumblebee genomes. My data contain 10 drones (haploid males), and I want to pair them up into fake (pseudo-)diploids.

I have already tried GATK CombineGVCFs and bcftools merge, but both only give me a typical merged VCF with more individuals (one column per drone).

The GT field in the VCF currently contains a single allele, e.g.:

GT:AD:DP:GQ:PL  1:0,11:11:99:433,0

I want to take the data from two drones and make them appear as a single diploid, so that the GT field becomes something like 1/1 or 0/1.
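To make the target concrete, here is a minimal bash sketch (the function name `pseudo_diploid_gt` is hypothetical) of how two haploid GT values would map onto one unphased diploid GT. It only handles GT: AD, DP, GQ and PL are not recomputed, and treating a missing call in either drone as `./.` is a simplifying assumption.

```bash
#!/usr/bin/env bash
# Illustration only: combine two haploid GT values (e.g. "0", "1", ".")
# into one unphased diploid GT string. AD/DP/GQ/PL are NOT recomputed.
pseudo_diploid_gt() {
  local a="$1" b="$2"
  # Simplification: a missing call in either drone makes the whole GT missing.
  if [[ "$a" == "." || "$b" == "." ]]; then
    echo "./."
    return
  fi
  # Unphased genotypes are conventionally written smaller allele index first.
  if (( a <= b )); then
    echo "$a/$b"
  else
    echo "$b/$a"
  fi
}

pseudo_diploid_gt 1 1   # prints 1/1
pseudo_diploid_gt 0 1   # prints 0/1
pseudo_diploid_gt 1 0   # prints 0/1
```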

Would appreciate any guidance or pointers.


[Solved!] In case anyone else stumbles onto a similar situation in the future, this is what I did.

Step 1) I merged the BAM files in pairs using samtools merge:

samtools merge (merged_output_file).bam (input_file#1).bam (input_file#2).bam
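With 10 drones this is five merges. A minimal bash sketch of step 1 as a loop, assuming hypothetical file names drone01.bam through drone10.bam paired in order; `run` and `DRY_RUN` are helpers made up for this example (not part of samtools), so with `DRY_RUN=1` the commands are only printed.

```bash
#!/usr/bin/env bash
# Echo each command; execute it only when DRY_RUN is unset or empty.
run() {
  echo "+ $*"
  if [[ -z "${DRY_RUN:-}" ]]; then
    "$@"
  fi
}

# Hypothetical pairing: drone01+drone02 -> pseudo1, drone03+drone04 -> pseudo2, ...
# Replace these prefixes with your own BAM names.
pairs=(
  "drone01 drone02"
  "drone03 drone04"
  "drone05 drone06"
  "drone07 drone08"
  "drone09 drone10"
)

merge_pairs() {
  local i=1 pair a b
  for pair in "${pairs[@]}"; do
    read -r a b <<< "$pair"
    # Step 1: merge the two haploid BAMs into one pseudo-diploid BAM.
    run samtools merge "pseudo${i}.bam" "${a}.bam" "${b}.bam"
    i=$((i + 1))
  done
}
```

Source the file, then run `DRY_RUN=1 merge_pairs` to check the pairing before running `merge_pairs` for real.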

Step 2) To make g.vcf files from them later, I needed an index file for each merged BAM:

samtools index -b (merged_sample).bam

** -b writes a .bai index (BAI is also samtools' default index format)

**The steps above produce a multi-sample BAM: if you check the header (samtools view -H), it has two read groups (@RG lines), one per drone. I needed the BAM to look like a single-sample file, because HaplotypeCaller's -ERC GVCF mode, which I run in step 4, only works with a single sample. That leads to step 3.
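One way to do that check, as a sketch: `samtools view -H` prints the BAM header, and the two hypothetical helpers below count its @RG lines and the distinct SM (sample) names among them. GATK assigns reads to samples by the SM field of their read group, so for GVCF mode the sample count is the number that has to be 1.

```bash
#!/usr/bin/env bash
# Count @RG header lines in SAM header text read from stdin.
count_read_groups() {
  # grep -c prints 0 but exits 1 when nothing matches; keep the 0.
  grep -c '^@RG' || true
}

# Count distinct SM:<sample> values across @RG lines read from stdin.
count_samples() {
  awk -F'\t' '$1 == "@RG" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SM:/ && !seen[$i]++) n++
  }
  END { print n + 0 }'
}
```

Usage, with a hypothetical file name: `samtools view -H pseudo1.bam | count_samples`.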

Step 3) Replace the existing read groups with a single new one using Picard AddOrReplaceReadGroups:

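picard AddOrReplaceReadGroups I=(merged_sample).bam O=(output_file).bam RGID=(new ID) RGLB=(new LB) RGPL=(new PL) RGPU=(new PU) RGSM=(new SM)

** RGPL (sequencing platform, e.g. ILLUMINA) is also a required argument. Every read is moved into the single new read group, so the BAM now carries one sample name (RGSM).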

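**I'm not sure whether I should have made the index after this step instead. Note that the .bai from step 2 belongs to the BAM from step 1, not to this new file, so indexing the relabeled BAM as well is the safe choice. Will update if I hit a problem.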
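Step 3 plus re-indexing can be sketched as one hypothetical bash function. The values used for RGLB, RGPL and RGPU (`<sample>_lib`, `ILLUMINA`, `<sample>_unit`) are placeholder assumptions; RGSM is the field that defines the sample. As before, `run`/`DRY_RUN` are made-up helpers that only print commands when DRY_RUN is set.

```bash
#!/usr/bin/env bash
# Echo each command; execute it only when DRY_RUN is unset or empty.
run() {
  echo "+ $*"
  if [[ -z "${DRY_RUN:-}" ]]; then
    "$@"
  fi
}

# Put every read of a merged BAM into one read group / one sample, then index it.
# Assumed placeholder values: RGLB=<sample>_lib, RGPL=ILLUMINA, RGPU=<sample>_unit.
relabel_and_index() {
  local in_bam="$1" sample="$2"
  local out_bam="${sample}.rg.bam"
  run picard AddOrReplaceReadGroups \
    I="$in_bam" O="$out_bam" \
    RGID="$sample" RGLB="${sample}_lib" RGPL=ILLUMINA \
    RGPU="${sample}_unit" RGSM="$sample"
  # HaplotypeCaller reads this new file, so it needs its own .bai.
  run samtools index -b "$out_bam"
}
```

For example, `relabel_and_index pseudo1.bam pseudo1` writes pseudo1.rg.bam and its index.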

Step 4) Make a g.vcf for each merged pair using GATK HaplotypeCaller in -ERC GVCF mode, with the default ploidy of 2.
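Step 4 as a sketch, assuming GATK4's `gatk` wrapper and a hypothetical reference `bombus.fa`. GATK also expects the reference to have a `.fai` index (`samtools faidx`) and a `.dict` sequence dictionary (`gatk CreateSequenceDictionary`) next to it. `-ploidy 2` is the default and is only spelled out for clarity; `run`/`DRY_RUN` are made-up helpers as above.

```bash
#!/usr/bin/env bash
# Echo each command; execute it only when DRY_RUN is unset or empty.
run() {
  echo "+ $*"
  if [[ -z "${DRY_RUN:-}" ]]; then
    "$@"
  fi
}

# Call a per-sample GVCF from a single-sample, indexed BAM.
# Output name is derived from the BAM name: x.bam -> x.g.vcf.gz
call_gvcf() {
  local ref="$1" bam="$2"
  local out="${bam%.bam}.g.vcf.gz"
  # -ERC GVCF requires one sample per BAM; -ploidy 2 is the default.
  run gatk HaplotypeCaller -R "$ref" -I "$bam" -O "$out" -ERC GVCF -ploidy 2
}
```

For example, `call_gvcf bombus.fa pseudo1.rg.bam` writes pseudo1.rg.g.vcf.gz.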

genome • modified 17 months ago by genomax • written 18 months ago by Pistachio

I'll try merging the BAMs and making new VCFs. Thank you!

written 18 months ago by Pistachio
Alice wrote (18 months ago):

If you have the alignments, you can merge the two BAMs and call genotypes as if the merged BAM came from a diploid. Otherwise I don't think there is a straightforward way to do this: you would basically need to recalculate the entire VCF (not just GT, but also AD, DP, GQ and PL).

written 18 months ago by Alice

Solved, it worked! Thank you!

written 17 months ago by Pistachio