Question: Virus core genome
4 weeks ago by
Salsabil 0 wrote:

Hello everyone.

I'm working on a set of small viral genomes, and I need to construct their core genome for a comparative study. I tried Prokka and Roary, but my sequences are very small (10 kb or less): Prokka produces output only with errors and warnings, and Roary keeps reporting 0 core genes. It turns out that Prokka uses a tool called Prodigal, and the sequence size threshold it uses is 20 kb. So I can't get usable GFF3 files from Prokka to run Roary. That leaves me with two options:

1/ Look for another source of GFF3 files to run Roary and get my core genome.
2/ Try something other than Roary to generate the core genome.

If you happen to know other tools or scripts I can use in my case please let me know. Many thanks!!

modified 4 weeks ago by Mensur Dlakic 7.2k • written 4 weeks ago by Salsabil 0

Separately from my other post: the concept of core genes does not make as much sense in viruses as in prokaryotes. Most viruses carry a minimal number of genes to begin with, usually only the genes required to regulate their own or host transcription, to replicate, and to rebuild capsids. I can't imagine that there will be much difference between somewhat related viruses with similarly sized genomes.

written 4 weeks ago by Mensur Dlakic 7.2k

Exactly, I was expecting a potentially high number of core genes, especially since these viruses are from the same genus (HIV and SIV). As for why I want to work with the core genome rather than the whole genome: these are highly mutable viruses.

written 4 weeks ago by Salsabil 0
4 weeks ago by
Mensur Dlakic 7.2k wrote:

I think there are other options. You can run Prodigal with -p meta (or pass --metagenome to Prokka), in which case it will accept genomes smaller than 20 kb. Alternatively, change the Prodigal source code to allow inputs under 20 kb and recompile.
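To make the first option concrete, here is a minimal Python sketch that builds a Prodigal command line in metagenomic mode for one small genome. The flags used (-i, -p meta, -f gff, -o, -a) are standard Prodigal options; the helper names and the file paths are placeholders for illustration, not something from this thread. Prodigal is only invoked if the executable is found on PATH.

```python
import shutil
import subprocess
from pathlib import Path


def prodigal_meta_command(fasta, out_dir):
    """Build a Prodigal command that predicts genes in 'meta' mode.

    'fasta' and 'out_dir' are hypothetical paths chosen by the caller.
    """
    fasta = Path(fasta)
    out_dir = Path(out_dir)
    stem = fasta.stem
    return [
        "prodigal",
        "-i", str(fasta),                    # input nucleotide FASTA
        "-p", "meta",                        # metagenomic mode: no per-genome training
        "-f", "gff",                         # write gene coordinates as GFF
        "-o", str(out_dir / f"{stem}.gff"),  # gene coordinates
        "-a", str(out_dir / f"{stem}.faa"),  # translated proteins
    ]


def run_prodigal_meta(fasta, out_dir):
    """Run Prodigal if it is installed; otherwise just show the command."""
    cmd = prodigal_meta_command(fasta, out_dir)
    if shutil.which("prodigal") is None:
        print("prodigal not found; would run:", " ".join(cmd))
        return None
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    return subprocess.run(cmd, check=True)
```

Calling `run_prodigal_meta("genomes/virus1.fasta", "annotations")` would write `virus1.gff` and `virus1.faa` into `annotations/`. Note that Prodigal's own GFF output is not the same as the Prokka-style GFF3 that Roary expects, so this only addresses the gene-calling step.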

This tool will find all single-copy marker genes shared between multiple genomes, but it also uses prodigal:

written 4 weeks ago by Mensur Dlakic 7.2k

Thanks for your feedback. Actually, I found out that when genomes are smaller than 100 kb, Prokka puts Prodigal in "meta" mode automatically. But then it switches the translation table from 1 (the standard table) to 4. As a result, the annotation differs and not all CDSs in the genome are detected: for one sequence with 10 CDSs in NCBI, I got only 6 with Prokka.

written 4 weeks ago by Salsabil 0
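A quick way to quantify a discrepancy like the 10 vs 6 CDSs mentioned above is to count CDS features in both GFF3 files. Below is a small Python sketch; the function name and the toy records are hypothetical. It counts distinct ID values rather than lines, on the assumption that a CDS split over several segments appears as multiple lines sharing one ID.

```python
def count_cds(gff_lines):
    """Count distinct CDS features in an iterable of GFF3 lines."""
    seen = set()
    for lineno, line in enumerate(gff_lines):
        if line.startswith("##FASTA"):
            break  # embedded sequence section starts; no more features
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        # Column 9 holds 'key=value' pairs separated by ';'
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        # Fall back to a per-line key when a CDS has no ID attribute
        seen.add(attrs.get("ID", f"line{lineno}"))
    return len(seen)


# Toy records (made up): one two-segment CDS plus one single-segment CDS
example = [
    "##gff-version 3",
    "seq1\tRefSeq\tgene\t1\t900\t.\t+\t.\tID=gene1",
    "seq1\tRefSeq\tCDS\t1\t500\t.\t+\t0\tID=cds1;Parent=gene1",
    "seq1\tRefSeq\tCDS\t500\t900\t.\t+\t2\tID=cds1;Parent=gene1",
    "seq1\tRefSeq\tCDS\t950\t1200\t.\t+\t0\tID=cds2",
    "##FASTA",
    ">seq1",
]
print(count_cds(example))  # 2
```

With real files, `count_cds(open("ncbi.gff3"))` and `count_cds(open("prokka.gff"))` (file names assumed) can be compared per genome.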

Update: I ended up retrieving GFF3 annotation files directly from NCBI, so the annotation problem is solved at this point. But I'm still getting 0 core genes in the Roary results, which I don't understand, because these viruses share most of their genes (if not all of them)!

modified 29 days ago • written 29 days ago by Salsabil 0
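One possible explanation, not verified on this data: by default Roary clusters proteins at 95% BLASTP identity (-i) and calls a gene core only if it is present in 99% of isolates (-cd). If homologous proteins from divergent genomes fall below the identity cutoff, they land in separate clusters and none of them reaches the core threshold. The Python sketch below illustrates that effect with made-up cluster memberships; the function name and all genome and cluster names are hypothetical.

```python
def core_clusters(cluster_members, n_genomes, min_fraction=0.99):
    """Return clusters present in at least min_fraction of the genomes.

    cluster_members maps a cluster name to the genomes that carry it.
    """
    core = []
    for cluster, genomes in cluster_members.items():
        if len(set(genomes)) / n_genomes >= min_fraction:
            core.append(cluster)
    return sorted(core)


genomes = ["HIV_a", "HIV_b", "SIV_a", "SIV_b"]  # toy genome names

# Strict identity cutoff: each gene splits into separate HIV and SIV clusters
strict = {
    "gag_1": ["HIV_a", "HIV_b"], "gag_2": ["SIV_a", "SIV_b"],
    "pol_1": ["HIV_a", "HIV_b"], "pol_2": ["SIV_a", "SIV_b"],
    "env_1": ["HIV_a", "HIV_b"], "env_2": ["SIV_a", "SIV_b"],
}

# Relaxed identity cutoff: homologs merge into one cluster per gene
relaxed = {"gag": genomes, "pol": genomes, "env": genomes}

print(core_clusters(strict, len(genomes)))   # []
print(core_clusters(relaxed, len(genomes)))  # ['env', 'gag', 'pol']
```

If this is what is happening, rerunning Roary with a lower -i value would be the thing to try; how low is reasonable for these genomes is a judgment call.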