Question

Finding a gene in a whole genome

3.6 years ago

Idania • 0

Hi!

I'm trying to find an enzyme gene in an Aspergillus genome that is not fully annotated. I made a DB from protein sequences of that same enzyme from another Aspergillus species, then ran tblastn and got my results in a .txt file, but I'm lost on the next step. I don't know how to handle the whole-genome data. I've been trying to translate the genome in all the possible reading frames and then search for my sequence with the grep command, but I don't know if there is a more efficient way.

I'm using Linux. I'm being as clear as I can because I'm new to this topic. I hope you can help me.

Thanks!

gene grep tblastn • 948 views

ADD COMMENT • link 3.6 years ago by Idania • 0

You could try a de novo annotation ~~with PROKKA~~ and then check the proteins of that annotation against your related protein. You could also collect more of those protein sequences, build an HMM from them, and use that to search against the predicted proteins (HMMER software). You could also try a "liftover approach" ~~(with RATT)~~.

As for your tblastn results, what do they look like? You'd want high-scoring HSPs that cover a good portion of the query sequence. Did you produce tabular output? Maybe show some of the first hits.
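To make "HSPs that cover a good portion of the query" concrete, here is a minimal sketch, assuming the tblastn run was written with the standard 12-column tabular format (`-outfmt 6`). The query name `enzA`, its length of 420 residues, the example lines, and the thresholds are all made up for illustration; the query lengths have to come from your own FASTA file.

```python
import csv
import io

# Standard column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def good_hsps(handle, query_lengths, min_coverage=0.7, max_evalue=1e-10):
    """Yield HSPs that cover at least min_coverage of their query.

    query_lengths maps query id -> protein length (hypothetical input,
    taken from your own query FASTA). Thresholds are arbitrary examples.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hsp = dict(zip(COLUMNS, row))
        qstart, qend = int(hsp["qstart"]), int(hsp["qend"])
        coverage = (qend - qstart + 1) / query_lengths[hsp["qseqid"]]
        if coverage >= min_coverage and float(hsp["evalue"]) <= max_evalue:
            hsp["coverage"] = round(coverage, 2)
            yield hsp


# Made-up example lines, only to show the format.
example = (
    "enzA\tcontig_12\t78.5\t400\t86\t0\t1\t400\t150200\t151399\t1e-180\t620\n"
    "enzA\tcontig_3\t35.0\t60\t39\t0\t120\t179\t88000\t87821\t2e-05\t45.1\n"
)
hits = list(good_hsps(io.StringIO(example), {"enzA": 420}))
for h in hits:
    print(h["sseqid"], h["sstart"], h["send"], h["coverage"])
```

Only the first example line passes the filter. Keep in mind that fungal genes can contain introns, so one gene may show up as several shorter HSPs on the same contig; a low per-HSP coverage is therefore not automatically a miss, and neighbouring hits are worth looking at together.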

edit: Just noting that I made the PROKKA/RATT recommendations thinking we were dealing with a prokaryote. Since Aspergillus is a eukaryote, these bacterial tools obviously don't make sense.

ADD REPLY • link 3.6 years ago by cschu181 ★ 2.8k

grep is for sure the least efficient way to do this (it's even the wrong way, because grep only finds exact matches, so as soon as there is a single AA difference between the 'genome gene' and your query gene it will fail to find it).
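A small illustration of the exact-match problem, using two made-up peptide fragments that differ at a single position. The substring test stands in for what a plain grep does; the sliding-window identity is only a toy comparison to show that the sequences are nearly identical, not a replacement for BLAST.

```python
# Made-up fragments: the 'genome' copy has R where the query has K.
query = "MKTLLVAGSSKLE"
genome_translation = "PQRMKTLLVAGSSRLEWWT"

# What a plain grep for the query string effectively tests.
exact_hit = query in genome_translation


def best_ungapped_identity(query, target):
    """Best fraction of identical residues over all ungapped placements."""
    best = 0.0
    for offset in range(len(target) - len(query) + 1):
        window = target[offset:offset + len(query)]
        matches = sum(a == b for a, b in zip(query, window))
        best = max(best, matches / len(query))
    return best


identity = best_ungapped_identity(query, genome_translation)
print(exact_hit, round(identity, 2))
```

The exact search reports nothing even though 12 of the 13 residues match, which is exactly the kind of hit an alignment tool such as tblastn is designed to report.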

As suggested by cschu181, doing a genome annotation is the better approach, though it is quite labour intensive. If you are only looking for a single gene (or a few), it might be more efficient to dig in manually. Display the genome with the BLAST hits in a genome browser (IGV, GenomeView, Artemis, Apollo, ...) and annotate the gene of interest.
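One way to get the BLAST hits into a browser is to convert them to a track file. The following is a rough sketch, assuming tabular tblastn output (`-outfmt 6`) and BED as the track format, which IGV can load; the example lines and names are made up. BLAST coordinates are 1-based and inclusive, while BED starts are 0-based, and hits on the reverse strand are reported with `sstart` larger than `send`.

```python
def hit_to_bed(fields):
    """Turn one split -outfmt 6 line into a 6-column BED line."""
    qseqid, sseqid = fields[0], fields[1]
    sstart, send = int(fields[8]), int(fields[9])
    # Reverse-strand hits are reported with sstart > send.
    strand = "+" if sstart <= send else "-"
    # BED uses a 0-based start and an exclusive end.
    start, end = min(sstart, send) - 1, max(sstart, send)
    return "\t".join([sseqid, str(start), str(end), qseqid, "0", strand])


def blast_tab_to_bed(lines):
    """Convert tabular BLAST lines to BED lines, skipping blanks/comments."""
    return [hit_to_bed(line.rstrip("\n").split("\t"))
            for line in lines
            if line.strip() and not line.startswith("#")]


# Made-up example hits, one per strand.
example = [
    "enzA\tcontig_12\t78.5\t400\t86\t0\t1\t400\t150200\t151399\t1e-180\t620\n",
    "enzA\tcontig_3\t35.0\t60\t39\t0\t120\t179\t88000\t87821\t2e-05\t45.1\n",
]
bed_lines = blast_tab_to_bed(example)
print("\n".join(bed_lines))
```

Writing `bed_lines` to a file (for example a hypothetical `hits.bed`) and loading it next to the genome FASTA shows where the hits cluster, which is a reasonable starting point for annotating the gene by hand.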

ADD REPLY • link 3.6 years ago by lieven.sterck 15k

though it is quite labour intensive

I don't know, running prokka on a 30 Mb genome shouldn't be that bad (or do you mean in terms of installation/getting the environment right?). The liftover/reference-based annotation with RATT would require a bit more effort (getting the reference data in proper format and patching RATT so that it works with newer perl versions), but even that isn't too bad.

edit: this is all nonsense, as it was based on the assumption of processing a prokaryote genome

ADD REPLY • link 3.6 years ago by cschu181 ★ 2.8k

Well, yes, there are the (potential) technical issues, and then there is also the work of getting all your data together (RNA-seq? proteins? parameter tuning, ...).

Running an off-the-shelf tool like PROKKA is feasible (bacterial is less cumbersome than eukaryotic), but don't underestimate the whole process of doing genome annotation. I'm talking from experience here. A thorough(!) genome annotation is still a considerable effort.

ADD REPLY • link 3.6 years ago by lieven.sterck 15k

Argh. It is a fungus. Of course, you're right. For some reason I thought this was a bacterial genome.

ADD REPLY • link 3.6 years ago by cschu181 ★ 2.8k

Try BLAST with the genome of interest against other Aspergillus genomes.
If the BLAST server (NCBI/EBI/etc.) doesn't have the genome and annotation information you are interested in, and you have them yourself:
- Index the reference organism's protein sequences with diamond/rapsearch (the input is FASTA sequences with annotation headers)
- Run a rapsearch/diamond search
- Look at the output (the summary will be in .m8 files)

Please also look for fungal-genome-specific BLAST servers.

ADD REPLY • link 3.6 years ago by cpad0112 21k