Hello,
I have NGS data for samples from a single population, sequenced at different depths (5X-30X). When I run PCA, PC2 is associated with depth (r2=0.5). In addition, samples with depth <15X show higher heterozygosity than samples with depth >15X.
So I tried using Beagle v4.1 for genotype refinement, hoping to remove this batch effect caused by depth.
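To make the r2 figure concrete, here is a minimal sketch of how a squared Pearson correlation between PC2 and depth can be computed. It assumes the per-sample PC2 scores and mean depths are available as two equal-length lists; the function name `r_squared` and the input values are hypothetical, not taken from my data.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical usage: pc2 and depth hold one value per sample, in the same order.
pc2 = [0.10, 0.05, -0.02, -0.08]
depth = [5.0, 12.0, 20.0, 30.0]
print(r_squared(pc2, depth))
```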
Beagle 4.1 command:
java -Xss5m -Xmx50g -jar beagle.11Mar19.69c.jar gl=myvcf map=plink.${chr}.GRCh38.map.beagle out=${chr}.beagle nthreads=2
My strategies:
- Run Beagle 4.1 on the VCF produced by the standard GATK pipeline.
- Remove sites with a genotype missing rate >10%, then run Beagle 4.1.
- Keep only the individuals with depth >15X, then run Beagle 4.1.
- Set genotypes with GQ<20 to missing and set their PL to "0,0,0", then run Beagle 4.1.
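The last strategy can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact script I ran: the function name `mask_low_gq` is hypothetical, the FORMAT column is assumed to contain GT, GQ and PL, a missing GQ (".") is treated as low quality, and the PL is flattened to as many zeros as it originally had values (three at a biallelic site, i.e. "0,0,0").

```python
def mask_low_gq(line, min_gq=20):
    """Mask low-GQ genotypes in one tab-separated VCF data line.

    Genotypes with GQ < min_gq (or a missing GQ, an assumption made here)
    get GT "./." and a flat PL of all zeros. Other fields are left as-is.
    """
    fields = line.rstrip("\n").split("\t")
    keys = fields[8].split(":")  # FORMAT column
    if not all(k in keys for k in ("GT", "GQ", "PL")):
        return "\t".join(fields)  # nothing to do without GT, GQ and PL
    gt_i, gq_i, pl_i = keys.index("GT"), keys.index("GQ"), keys.index("PL")

    for col in range(9, len(fields)):  # sample columns
        values = fields[col].split(":")
        # VCF allows trailing FORMAT fields to be dropped; pad with "."
        values += ["."] * (len(keys) - len(values))
        gq = values[gq_i]
        if gq != "." and float(gq) >= min_gq:
            continue  # confident genotype, keep unchanged
        values[gt_i] = "./."
        # Keep the original number of PL entries; default to 3 if PL is missing
        n_pl = values[pl_i].count(",") + 1 if values[pl_i] != "." else 3
        values[pl_i] = ",".join(["0"] * n_pl)
        fields[col] = ":".join(values)
    return "\t".join(fields)


# Hypothetical two-sample record: the first sample has GQ=12, the second GQ=60.
record = ("chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP:GQ:PL"
          "\t0/1:3,2:5:12:30,0,45\t0/0:20,0:20:60:0,60,900")
print(mask_low_gq(record))
```

The function would be applied to every line of the VCF that does not start with "#", with header lines written through unchanged.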
However, none of these four strategies removed the batch effect. They only changed the distance between the 5X samples and the other samples from the same population; the distance between the 15X-20X samples and the 30X samples did not change.
Does anybody have experience removing a batch effect caused by sequencing depth?
Thank you very much!