Hi, I'm currently working with some bumblebee genomes. My data contains 10 drones (haploids), which I want to pair up into fake diploids.
I have already tried GATK CombineGVCFs and bcftools merge, but both only gave me a typical merged VCF with more individuals.
The GT field in the VCF currently contains a single allele (e.g. 0 or 1).
I want to take the data from two drones and make them appear as a single diploid, so that the GT field becomes something like 1/1 or 0/1.
Would appreciate any guidance or pointers.
[Solved!] Just in case anyone else stumbles onto a similar situation in the future, this is what I did.
Step 1) Merge the BAM files in pairs using samtools merge. It looks like:
samtools merge (merged_output_file).bam (input_file#1).bam (input_file#2).bam
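For more than one pair, the merge in Step 1 can be looped. Below is a minimal dry-run sketch that only prints one samtools merge command per pair of inputs. The function name print_merge_cmds and the drone/pair file names are hypothetical placeholders, not names from my actual data.

```shell
# Dry run: print one `samtools merge` command per consecutive pair of BAMs.
# All file names are hypothetical placeholders.
print_merge_cmds() {
    pair_idx=1
    # Consume the arguments two at a time; a leftover unpaired file is ignored.
    while [ "$#" -ge 2 ]; do
        echo "samtools merge pair${pair_idx}.bam $1 $2"
        shift 2
        pair_idx=$((pair_idx + 1))
    done
}

print_merge_cmds drone01.bam drone02.bam drone03.bam drone04.bam
```

Once the printed commands look right, they can be piped into sh or the echo can be dropped to run them directly.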
Step 2) I had to turn the merged BAMs into g.vcf files. To do that, I needed an index file for each BAM.
samtools index -b (merged_sample).bam
** -b makes .bai files
**The steps above produce a multi-sample BAM: if you check the header, it has 2 RG tags. I needed to edit the BAM so that it looks like a single-sample BAM file, because I needed to run -ERC GVCF in HaplotypeCaller later (it only works with a single sample). Which leads to step 3.
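One way to check the RG tags is to count the @RG lines in the BAM header. This is a small sketch: the helper count_read_groups is a hypothetical name, and the header fed to it in the demo is made up for illustration, not taken from a real file.

```shell
# Count @RG header lines read from stdin.
# `|| true` keeps the exit status at 0 when grep finds no @RG lines.
count_read_groups() {
    grep -c '^@RG' || true
}

# Typical use (needs samtools; the file name is a placeholder):
#   samtools view -H merged_sample.bam | count_read_groups

# Self-contained demo on a made-up header with two read groups:
printf '@HD\tVN:1.6\tSO:coordinate\n@RG\tID:drone01\tSM:drone01\n@RG\tID:drone02\tSM:drone02\n' | count_read_groups
```

A freshly merged pair should report 2, and the same check on the output of step 3 should report 1.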
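Step 3) Replace the RG tags with a single new one using Picard:
picard AddOrReplaceReadGroups I=(merged_sample).bam O=(outputfile).bam RGID=(new ID) RGLB=(new LB) RGPL=(new PL) RGPU=(new PU) RGSM=(new SM)
** No space after the = signs. Picard also requires RGPL (the sequencing platform, e.g. ILLUMINA).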
**Not sure if I should've made the index after this step instead; will update if I hit a problem.
Step 4) Make the g.vcf using GATK HaplotypeCaller with -ERC GVCF and the default ploidy of 2.
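For reference, here is a rough sketch of what the Step 4 command could look like, assuming GATK4 syntax. The function name print_hc_cmd and the file names are hypothetical placeholders. --sample-ploidy 2 is the default and is only spelled out here for clarity. HaplotypeCaller expects an indexed input BAM, so it is probably safer to run samtools index on the Step 3 output before this step.

```shell
# Dry run: print the HaplotypeCaller command for one merged, re-tagged BAM.
# Arguments: reference FASTA, input BAM, output GVCF (placeholder names).
# Assumes GATK4 option names; ploidy 2 is the default, shown explicitly.
print_hc_cmd() {
    echo "gatk HaplotypeCaller -R $1 -I $2 -O $3 -ERC GVCF --sample-ploidy 2"
}

print_hc_cmd reference.fasta pair1.rg.bam pair1.g.vcf.gz
```

The resulting per-pair g.vcf files can then go through the usual joint-genotyping steps like any other diploid sample.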