How to merge two haploid samples (vcf, or g.vcf) into a pseudo-diploid?
5.5 years ago
Pistachio ▴ 20

Hi, I'm currently working with some bumblebee genomes. My data contain 10 drones (haploids), which I want to pair up into fake diploids.

I have already tried GATK CombineGVCFs and bcftools merge, but both only gave me a typical merged VCF with more individuals.

The GT field in the vcf currently contains a single number.

GT:AD:DP:GQ:PL  1:0,11:11:99:433,0

I want to take the data from two drones and make them appear as a single diploid, so that the GT field becomes something like 1/1 or 0/1.
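To make the target concrete, here is a minimal sketch of what joining two haploid GT calls into one unphased diploid GT would look like. It is only an illustration of the desired output, not a recommended method: the function names and the second drone's values are hypothetical. It also shows why editing the VCF directly is awkward. The other fields (AD, DP, GQ, PL) are left untouched, and a haploid PL at a biallelic site has two values where a diploid one needs three.

```python
def parse_sample(format_field, sample_field):
    """Map FORMAT keys (e.g. GT, AD, DP) onto one sample's colon-separated values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def pseudo_diploid_gt(gt_a, gt_b):
    """Join two haploid GT calls into one unphased diploid GT string."""
    # If either drone has no call, report the whole genotype as missing.
    if "." in (gt_a, gt_b):
        return "./."
    # Put the lower allele index first, as unphased genotypes are usually written.
    alleles = sorted([gt_a, gt_b], key=int)
    return "/".join(alleles)


# First drone: the record from the question.
drone_a = parse_sample("GT:AD:DP:GQ:PL", "1:0,11:11:99:433,0")
# Second drone: made-up values, for illustration only.
drone_b = parse_sample("GT:AD:DP:GQ:PL", "0:9,0:9:99:0,380")

print(pseudo_diploid_gt(drone_a["GT"], drone_b["GT"]))  # 0/1
print(pseudo_diploid_gt(drone_a["GT"], drone_a["GT"]))  # 1/1
```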

Would appreciate any guidance or pointers.

---------------------Update-----------------------

[Solved!] Just in case anyone else in the future stumbles onto a similar situation this is what I did.

Step 1) Merged the BAM files in pairs using samtools:

samtools merge (merged_output_file).bam (input_file#1).bam (input_file#2).bam

Step 2) To turn them into g.vcf files, I needed an index file for each BAM.

samtools index -b (merged_sample).bam

** -b makes .bai files

**The steps above produce a multi-sample BAM; if you check, it will have 2 RG tags. I needed to edit the BAM so that it looked like a single-sample file, because I wanted to run HaplotypeCaller with -ERC GVCF later, which only works on a single sample. That leads to step 3.

Step 3) Replace the RG tags and add a single new one using Picard:

picard AddOrReplaceReadGroups I=(merged_sample).bam O=(outputfile).bam RGID=(new ID) RGLB=(new LB) RGPL=(new PL) RGPU=(new PU) RGSM=(new SM)

**not sure if I should've made the index after this step, will update if I hit a problem.

Step 4) Made the g.vcf using GATK HaplotypeCaller with the default ploidy of 2.
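The four steps can be sketched as one small dry-run script that only builds and prints the commands. This is a hedged sketch under several assumptions, none of them from the original post: GATK4-style syntax for HaplotypeCaller, a `picard` wrapper on the PATH, the file-naming scheme, the reference path, and `RGPL=ILLUMINA`. Here the index is built after the read-group replacement, since that is the BAM HaplotypeCaller actually reads.

```python
import shlex


def pseudo_diploid_commands(bam_a, bam_b, sample, reference):
    """Return the commands, in order, as argument lists (nothing is executed)."""
    # Output names are hypothetical; adjust to your own layout.
    merged = f"{sample}.merged.bam"
    single_rg = f"{sample}.rg.bam"
    gvcf = f"{sample}.g.vcf.gz"
    return [
        # Step 1: pool the reads of both drones into one BAM.
        ["samtools", "merge", merged, bam_a, bam_b],
        # Step 3: replace the two read groups with one, so the BAM looks single-sample.
        # RGPL=ILLUMINA is an assumption; use your own sequencing platform.
        ["picard", "AddOrReplaceReadGroups",
         f"I={merged}", f"O={single_rg}",
         f"RGID={sample}", f"RGLB={sample}", "RGPL=ILLUMINA",
         f"RGPU={sample}", f"RGSM={sample}"],
        # Step 2, moved here: index the BAM that HaplotypeCaller will read.
        ["samtools", "index", "-b", single_rg],
        # Step 4: call in GVCF mode, treating the pooled reads as one diploid sample.
        ["gatk", "HaplotypeCaller", "-R", reference, "-I", single_rg,
         "-O", gvcf, "-ERC", "GVCF", "--sample-ploidy", "2"],
    ]


# Example with placeholder file names.
for cmd in pseudo_diploid_commands("drone01.bam", "drone02.bam", "pair01", "reference.fasta"):
    print(shlex.join(cmd))
```

To actually run the pipeline rather than print it, each argument list could be passed to `subprocess.run(cmd, check=True)`.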


I'll try merging the BAMs and making new VCFs. Thank you!

5.5 years ago
Alice ▴ 320

If you have the alignments, you can merge the 2 BAMs and call genotypes as if the merged BAM were diploid. Otherwise, I do not think there is a straightforward way to do this; you basically need to recalculate the entire VCF.


Solved, it worked! Thank you!
