I'm currently working with chimpanzee and bonobo data, and I'm curious whether the datasets are contaminated with human DNA.
I know that one potential issue is that the chimpanzee and human reference genomes are very similar, and that the bonobo reference assembly PanPan3 is of lower quality than the human reference. So I am going to align the reads to all of the reference genomes to identify some of the differences.
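One way to use the alignments to all references is competitive classification: for each read, compare how well it aligns to human against the best non-human reference. Below is a minimal sketch. It assumes each read has already been aligned to every reference and that you have extracted one alignment score per read per reference (for example the AS tag written by BWA-MEM or Bowtie2). The function names, the reference labels and the margin of 5 are hypothetical choices for illustration, not a standard method.

```python
from collections import Counter


def classify_reads(scores, target="human", margin=5):
    """Label each read by comparing its target score to the best other score.

    scores: dict of read name -> dict of reference name -> alignment score
        (higher is better). Assumed input, e.g. AS tags parsed from BAMs.
    A read is called `target` if it beats every other reference by at least
    `margin`, "other" if some other reference beats it by at least `margin`,
    and "ambiguous" otherwise.
    """
    calls = {}
    for read, per_ref in scores.items():
        target_score = per_ref.get(target)
        others = [s for ref, s in per_ref.items() if ref != target]
        if target_score is None:
            calls[read] = "other"  # read did not align to the target at all
        elif not others:
            calls[read] = target  # read aligned only to the target
        elif target_score - max(others) >= margin:
            calls[read] = target
        elif max(others) - target_score >= margin:
            calls[read] = "other"
        else:
            calls[read] = "ambiguous"
    return calls


def target_fraction(calls, target="human"):
    """Fraction of confidently assigned reads that go to the target."""
    counts = Counter(calls.values())
    assigned = counts[target] + counts["other"]
    return counts[target] / assigned if assigned else 0.0


# Toy scores, made up for illustration only.
scores = {
    "r1": {"human": 150, "chimp": 140, "bonobo": 139},
    "r2": {"human": 120, "chimp": 148, "bonobo": 147},
    "r3": {"human": 150, "chimp": 149, "bonobo": 150},
    "r4": {"human": 100, "chimp": 145, "bonobo": 130},
}
calls = classify_reads(scores)
frac = target_fraction(calls)
```

Most reads will fall in "ambiguous" because the genomes are so similar, so the informative signal is the ratio among the reads that can be confidently assigned. Comparing that ratio against a sample known to be uncontaminated would help separate real contamination from reference bias (for example, reads that look "human" only because the ape assembly is missing that region).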
I have a few initial ideas for identifying human contamination, such as aligning to species-specific Alu elements, or examining the alignments to the mitochondrial DNA (mtDNA). The mtDNA should make it easier to identify differences between the species because of its shorter length, and if a single sample contains more than one mtDNA haplotype, the DNA would have to come from different individuals.
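For the mtDNA idea, one simple check is to look only at diagnostic positions where the human and ape mtDNA sequences differ and count how many reads carry the human allele. The sketch below assumes you already have per-position base counts from the sample's alignment to the ape mtDNA, and a table of diagnostic sites expressed in that same coordinate system (for example derived from a pairwise alignment of the two mitochondrial sequences). The function name, input layout and `min_depth` threshold are hypothetical.

```python
def human_mt_fraction(pileup, diagnostic_sites, min_depth=10):
    """Estimate the fraction of mtDNA reads carrying the human allele.

    pileup: dict of position -> dict of base -> read count, taken from the
        sample's alignment to the chimpanzee/bonobo mtDNA (assumed input).
    diagnostic_sites: dict of position -> (human_base, ape_base) for sites
        where the human and ape mtDNA sequences differ (assumed input).
    Sites with fewer than `min_depth` reads showing either allele are skipped.
    Returns (pooled human fraction, dict of per-site human fractions).
    """
    per_site = {}
    human_total = ape_total = 0
    for pos, (human_base, ape_base) in diagnostic_sites.items():
        counts = pileup.get(pos, {})
        h = counts.get(human_base, 0)
        a = counts.get(ape_base, 0)
        if h + a < min_depth:
            continue  # too few informative reads at this site
        per_site[pos] = h / (h + a)
        human_total += h
        ape_total += a
    informative = human_total + ape_total
    pooled = human_total / informative if informative else 0.0
    return pooled, per_site


# Toy data, made up for illustration only.
sites = {100: ("A", "G"), 250: ("C", "T"), 400: ("G", "A")}
pileup = {
    100: {"A": 2, "G": 48},
    250: {"C": 3, "T": 47, "G": 1},
    400: {"G": 1, "A": 4},  # below min_depth, so ignored
}
pooled, per_site = human_mt_fraction(pileup, sites)
```

A human allele present at a similar low frequency across many diagnostic sites points to a second (human) mtDNA source, whereas sequencing errors would tend to appear as scattered, site-specific noise.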
I know tools such as Kraken can be used to identify contamination, but I'm curious whether anyone has other ideas for identifying potential human contamination in samples from species closely related to humans, such as chimpanzees or bonobos.