Problem with BUSCO alignments: more than one set of sequences for the same locus
12 weeks ago
silviaas • 0

Hi everyone,

I am working on a phylogeny based on BUSCOs extracted from low-coverage genome assemblies. The coverage of my genomes ranges from 2X to 10X. As expected, the recovery of complete single-copy BUSCOs is highly variable and rather low in most cases (~5-68%).

I generated a list of complete single-copy BUSCOs for each terminal based on the .tsv output files and extracted the corresponding sequences directly from the single_copy_busco_sequences output folder. When I checked the individual locus alignments, I found that ~30% of them contain more than one set of different sequences. In some cases the alignment contains only a couple of "weird" sequences. In other cases the alignment consists of two or more different sets of sequences. I attach a few alignments here as examples. I am sure the sequences are wrong because they affect a random set of taxa that are not closely related.
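For reference, the selection step described above can be sketched roughly as follows. This is a minimal illustration, not my exact script: it assumes the per-taxon table is tab-separated, has comment lines starting with `#`, and carries the BUSCO ID in the first column and the status (`Complete`, `Duplicated`, `Fragmented`, `Missing`) in the second. The function names and the toy IDs are made up for the example.

```python
from collections import Counter


def complete_single_copy_ids(table_lines):
    """Return the set of BUSCO IDs whose status is 'Complete' in one table.

    Assumes column 1 is the BUSCO ID and column 2 is the status.
    """
    ids = set()
    for line in table_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) >= 2 and fields[1] == "Complete":
            ids.add(fields[0])
    return ids


def locus_occupancy(tables_by_taxon):
    """Count, for each BUSCO ID, how many taxa have it as complete single copy."""
    counts = Counter()
    for lines in tables_by_taxon.values():
        counts.update(complete_single_copy_ids(lines))
    return counts


# Toy input (hypothetical IDs and contig names), one list of lines per taxon.
example = {
    "taxon_A": ["# Busco id\tStatus\tSequence",
                "1at100\tComplete\tctg1",
                "2at100\tMissing"],
    "taxon_B": ["1at100\tComplete\tctg9",
                "2at100\tDuplicated\tctg3"],
}
occupancy = locus_occupancy(example)
```

In practice the lists of lines would come from reading each taxon's table file, and the occupancy counts can be used to keep only loci present in a minimum number of taxa.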

I wanted to ask whether anyone has experienced this issue before and what the cause could be. The only explanation I can think of is that, because the coverage is low, when the proper gene is absent the best hit may be a wrongly assigned sequence. But even in that case, I wouldn't expect to get so many misassigned sequences, or sequences so different for the same BUSCO.

Finally, I tried to find an automatic strategy to clean the alignments, i.e. to remove the "weird" sequences from problematic alignments or to discard the problematic alignments entirely. Nothing I tried worked, and the only solution I have found is to remove the bad alignments manually.
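To make clear what I mean by an automatic strategy, here is a minimal sketch of one possible filter: flag any sequence whose median uncorrected distance (p-distance) to all other sequences in the alignment exceeds a cutoff. The function names and the 0.5 cutoff are assumptions chosen for illustration, not values taken from any tool, and the input is assumed to be an aligned FASTA in which `-` and `?` mark gaps or missing data.

```python
from statistics import median


def parse_fasta(text):
    """Parse aligned FASTA text into a dict of name -> sequence."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line.upper()
    return seqs


def p_distance(a, b, gap_chars="-?"):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    compared = mismatches = 0
    for x, y in zip(a, b):
        if x in gap_chars or y in gap_chars:
            continue
        compared += 1
        if x != y:
            mismatches += 1
    return mismatches / compared if compared else None


def flag_outliers(seqs, max_median_dist=0.5):
    """Return names whose median p-distance to the other sequences exceeds the cutoff."""
    flagged = []
    for name, seq in seqs.items():
        dists = []
        for other, other_seq in seqs.items():
            if other == name:
                continue
            d = p_distance(seq, other_seq)
            if d is not None:  # skip pairs with no shared ungapped columns
                dists.append(d)
        if dists and median(dists) > max_median_dist:
            flagged.append(name)
    return flagged


# Toy alignment (made-up sequences): three similar sequences and one unrelated one.
toy = """>a
MKTAYIAKQR
>b
MKTAYIAKQR
>c
MKTAYLAKQR
>d
WWGPPHHCC-
"""
outliers = flag_outliers(parse_fasta(toy))
```

A median-based filter like this can only catch sequences that are in the minority. When an alignment splits into two or more sets of similar size, every sequence is far from roughly half of the others, so a single cutoff either flags everything or nothing, and the whole alignment would have to be discarded instead.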

I would appreciate any insight or suggestion about my problem and how I could solve it.

Thank you in advance.

Alignment 1

Alignment 2

Alignment 3

problem phylogenomics BUSCO alignment • 176 views