Question: How to merge two haploid samples (vcf, or g.vcf) into a pseudo-diploid?
7 months ago, Pistachio wrote:

Hi, I'm currently working with some bumblebee genomes. I have data from 10 drones (haploids), which I want to pair up into fake diploids.

I have already tried GATK CombineGVCFs and bcftools merge, but both only give me a typical merged VCF with more individuals.

The GT field in the vcf currently contains a single number.

GT:AD:DP:GQ:PL  1:0,11:11:99:433,0

I want to take the data from two drones and make them appear as a single diploid, so that the GT field becomes something like 1/1 or 0/1.

Would appreciate any guidance or pointers.


[Solved!] Just in case anyone else stumbles onto a similar situation in the future, this is what I did.

Step 1) Merged the BAM files in pairs using samtools merge. It looks like:

samtools merge (merged_output_file).bam (input_file#1).bam (input_file#2).bam
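
With 10 drones, the merge in Step 1 has to be repeated for 5 pairs. A minimal sketch of that loop follows. It assumes that consecutive files in the argument list form a pair, and the drone file names and the pair_N output naming are placeholders, not names from the original data. It only prints the samtools merge commands, so the pairing can be checked before anything is run.

```shell
#!/usr/bin/env bash
# Print one `samtools merge` command per pair of haploid BAMs.
# File names and the pair_N naming scheme are hypothetical placeholders.
print_merge_cmds() {
    local n=1
    while [ "$#" -ge 2 ]; do
        echo "samtools merge pair_${n}.bam $1 $2"
        shift 2
        n=$((n + 1))
    done
    # An unpaired leftover sample is reported rather than silently dropped.
    if [ "$#" -eq 1 ]; then
        echo "unpaired sample: $1" >&2
    fi
}

print_merge_cmds drone01.bam drone02.bam drone03.bam drone04.bam
```

Once the printed pairs look right, the output can be piped to bash to actually run the merges.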

Step 2) I had to turn them into g.vcf files, and to do that I needed an index file for each BAM.

samtools index -b (merged_sample).bam

** -b makes .bai files

**The steps above make a multi-sample BAM; if you check, it will have 2 RG tags. I needed to edit the BAM so that it looked like a single-sample BAM file, because I needed to run -ERC GVCF in HaplotypeCaller later (it only works with a single sample). Which leads to step 3.
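
One way to check the RG tags mentioned above is to count the @RG lines in the BAM header. This is a small sketch: the helper name count_read_groups is made up for illustration, and the header fed to it here is a hand-written stand-in rather than output from a real file.

```shell
#!/usr/bin/env bash
# Count @RG lines in a SAM/BAM header read from stdin.
# `|| true` keeps the exit status at 0 when grep finds no read groups.
count_read_groups() {
    grep -c '^@RG' || true
}

# Stand-in header with two read groups (hypothetical sample names).
printf '@HD\tVN:1.6\tSO:coordinate\n@RG\tID:drone01\tSM:drone01\n@RG\tID:drone02\tSM:drone02\n' \
    | count_read_groups
```

On a real file the header would come from samtools, e.g. samtools view -H (merged_sample).bam | count_read_groups. A count above 1 right after the merge is expected; after Step 3 it should be 1.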

Step 3) Replace the RG tags and add a single new one using Picard:

picard AddOrReplaceReadGroups I=(merged_sample).bam O=(outputfile).bam RGID=(new ID) RGLB=(new LB) RGPL=(platform) RGPU=(new PU) RGSM=(new SM)

**not sure if I should've made the index after this step, will update if I hit a problem.

Step 4) Make the g.vcf using GATK HaplotypeCaller with the default ploidy of 2.
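
Steps 3 and 4 for one merged pair could be sketched roughly as below. Several things here are assumptions and not from the original post: GATK4 command-line syntax (gatk HaplotypeCaller, --sample-ploidy), RGPL=ILLUMINA as the platform, and every file and sample name. The BAM is indexed after the read-group step, because that is the file HaplotypeCaller reads. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of Steps 3-4 for one merged pair. All file and sample names are placeholders.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

call_pseudo_diploid() {
    local merged_bam="$1" sample="$2" ref="$3"
    # Step 3: collapse both read groups into one so the BAM looks single-sample.
    # RGPL=ILLUMINA is an assumed platform; change it to match the data.
    run picard AddOrReplaceReadGroups \
        I="$merged_bam" O="${sample}.rg.bam" \
        RGID="$sample" RGLB="$sample" RGPL=ILLUMINA RGPU="$sample" RGSM="$sample"
    # Index the BAM that HaplotypeCaller will actually read.
    run samtools index -b "${sample}.rg.bam"
    # Step 4: per-sample GVCF, with ploidy 2 stated explicitly (it is also the default).
    run gatk HaplotypeCaller \
        -R "$ref" -I "${sample}.rg.bam" -O "${sample}.g.vcf.gz" \
        -ERC GVCF --sample-ploidy 2
}

call_pseudo_diploid pair_1.bam pair_1 reference.fasta
```

Running it with DRY_RUN=0 executes the three commands in order, provided picard, samtools and gatk are on the PATH and the reference has its .fai and .dict files.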

Modified 7 months ago by genomax • written 7 months ago by Pistachio

I'll try merging the BAMs and making new VCFs. Thank you!

Reply modified 7 months ago • written 7 months ago by Pistachio

7 months ago, Alice wrote:

If you have alignments, you can merge 2 BAMs and call genotypes as if the merged BAM were diploid. Otherwise, I do not think there is a straightforward way to do this: you basically need to recalculate the entire VCF.


Solved, it worked! Thank you!

Reply written 7 months ago by Pistachio