Deleted: Should assemblies be removed if the core gene alignment shows redundancies?
14 months ago
c_u ▴ 520

I have a set of ~100 bacterial genomes. I annotated them with Prokka and performed a pangenome analysis with Roary. Roary outputs a core gene alignment file, which I then used to build a phylogenetic tree with RAxML. While running, RAxML printed this warning to the console:

IMPORTANT WARNING - Found 13 sequences that are exactly identical to other sequences in the alignment. Normally they should be excluded from the analysis.
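To see which assemblies the warning refers to, the alignment can be grouped by sequence. The following is only a minimal sketch: it assumes the alignment is a FASTA file (Roary normally writes it as `core_gene_alignment.aln`), and it compares sequences as exact, case-insensitive strings, which may not match RAxML's own criterion in every detail. The function names and the small demo alignment are made up for illustration.

```python
from collections import defaultdict
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def identical_groups(handle):
    """Return lists of sequence names whose aligned sequences are identical."""
    groups = defaultdict(list)
    for name, seq in read_fasta(handle):
        groups[seq.upper()].append(name)
    return [names for names in groups.values() if len(names) > 1]


# Hypothetical toy alignment; a, b and d share the same sequence.
demo = io.StringIO(">a\nACGT\n>b\nACGT\n>c\nACGA\n>d\nacgt\n")
demo_groups = identical_groups(demo)
print(demo_groups)
```

On a real run, the handle would come from something like `open("core_gene_alignment.aln")` instead of the in-memory demo.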

My question is: should I remove these 13 sequences from my subsequent pangenomic, phylogenomic, and other analyses based on this information? At first I thought it would be obvious to remove these redundant/clonal sequences so that they don't skew the statistics for gene enrichment and similar analyses. A counterargument is that these 13 sequences are called exactly identical to other sequences in my dataset based only on the core gene alignment. What about any differences these 13 assemblies may have in the non-core genome, relative to the sequences they are supposedly identical to?

In other words, what if these assemblies are actually unique, but their uniqueness lies in genes that are present in only a subset of the assemblies rather than in the core genes?
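One way to check this would be to compare the accessory gene content of each pair of core-identical assemblies. The sketch below assumes a tab-separated presence/absence table with a `Gene` column followed by one 0/1 column per assembly, which is the layout I understand Roary's `gene_presence_absence.Rtab` to have. The function name, isolate names, and gene names are hypothetical.

```python
import csv
import io


def accessory_differences(handle, sample_a, sample_b):
    """Return (genes only in sample_a, genes only in sample_b)."""
    reader = csv.DictReader(handle, delimiter="\t")
    only_a, only_b = [], []
    for row in reader:
        in_a = row[sample_a] == "1"
        in_b = row[sample_b] == "1"
        if in_a and not in_b:
            only_a.append(row["Gene"])
        elif in_b and not in_a:
            only_b.append(row["Gene"])
    return only_a, only_b


# Hypothetical toy table: iso1 and iso2 share gyrA but differ in two genes.
demo_table = io.StringIO(
    "Gene\tiso1\tiso2\tiso3\n"
    "gyrA\t1\t1\t1\n"
    "blaX\t1\t0\t1\n"
    "phage1\t0\t1\t0\n"
)
only_1, only_2 = accessory_differences(demo_table, "iso1", "iso2")
print(only_1, only_2)
```

If both lists come back empty for a pair, the two assemblies are identical in gene content as well as in the core alignment, at least at the resolution of the presence/absence calls.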

phylogenetics genome alignment • 204 views