Question: How to find a specific gene in an assembled genome that is not annotated?
christian_jpg2 wrote, 2.0 years ago:

I have an assembled genome (contigs) of an insect that is not yet annotated. There are some genes of interest (such as actin), and I want to know whether they are present in that genome. What can I do?

sequence assembly gene • 1.2k views
genomax (United States) wrote, 2.0 years ago:

You can take the sequences of the genes of interest from the most closely related species you can find in GenBank, and then use BLAT (or BLAST) to search them against your contigs.
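As a rough sketch of that search with NCBI BLAST+ driven from Python: the file names (`actin_relative.fasta`, `contigs.fasta`), the database name, the output file and the E-value cutoff are placeholders chosen for illustration, not values from this thread. One detail worth knowing is that `blastn` defaults to the megablast task, which is tuned for highly similar sequences; for a cross-species search, `-task blastn` or `-task dc-megablast` is usually more sensitive.

```python
import shutil
import subprocess


def build_blast_commands(query_fasta, contigs_fasta, db_name="contigs_db", task="blastn"):
    """Return the makeblastdb and blastn command lines as argument lists.

    All file and database names are hypothetical placeholders.
    """
    # Index the contigs as a nucleotide BLAST database.
    make_db = ["makeblastdb", "-in", contigs_fasta, "-dbtype", "nucl", "-out", db_name]
    # Search the query gene against that database; -outfmt 6 gives tab-separated hits.
    search = [
        "blastn", "-task", task,
        "-query", query_fasta,
        "-db", db_name,
        "-evalue", "1e-5",
        "-outfmt", "6",
        "-out", "hits.tsv",
    ]
    return make_db, search


def run_if_available(commands):
    """Run each command in order; stop and return False if a tool is not on PATH."""
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            print(f"{cmd[0]} not found on PATH; would run: {' '.join(cmd)}")
            return False
        subprocess.run(cmd, check=True)
    return True
```

Assuming BLAST+ is installed, `run_if_available(build_blast_commands("actin_relative.fasta", "contigs.fasta"))` would write any hits to `hits.tsv`.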


Thanks.

However, I already tried that with a sequence from the closest relative I could find (they are from the same subfamily), and I did not get any hits. I do not know whether the sequences are that different or whether I am doing something wrong.

Reply written 2.0 years ago by christian_jpg2

What kind of sequences are you using as input: nucleotide or protein? And, related: which kind of BLAST are you running?

Reply written 2.0 years ago by lieven.sterck

christian_jpg2: As has been suggested in this thread, you could try tblastn if plain blastn did not work.

You also have to consider the possibility that, if you ran the BLAST search correctly and still found no hit for something that should be present, your assembly could be of poor quality. You could try blasting your assembly against nr to see whether you get reasonable, contiguous hits.

Reply modified 2.0 years ago • written 2.0 years ago by genomax
toheitka (Germany/Dresden) wrote, 2.0 years ago:

It looks to me as if you should closely follow the thread "Building Hidden Markov Model (HMM) for proteins", as both of you want similar things.

As for HMMs of your proteins, you could retrieve them from Pfam. The actin page, for example, is here: https://pfam.xfam.org/family/PF00022

Then, you can follow this EBI tutorial to retrieve the HMM from Pfam.

Actually, what I would do is very crude and a bit dirty: I would translate the genome in all six frames and use HMMER to check whether the downloaded HMM gets a hit. It would be great if HMMER could use protein HMMs to search DNA directly, but this functionality does not yet exist (I think).
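A minimal sketch of that six-frame translation in plain Python, assuming the standard genetic code (NCBI translation table 1) and contig sequences already read into strings. The function names are made up for this example. The translated frames could then be written to a FASTA file and searched with something like `hmmsearch actin.hmm translated.fasta` (both file names hypothetical). Keep in mind that a raw translation of genomic DNA runs straight through introns, so a hit may only cover individual exons.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate a DNA string codon by codon; unknown codons (e.g. with N) become X."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames
```

Stop codons are kept as `*` here; splitting each frame at `*` into open reading frames before the HMMER search is a common next step.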


Running a simple tblastn with the protein against the genome will be more than sufficient here.

For more specific searches (or for really distantly related species, i.e. very low sequence conservation), your approach will indeed be the appropriate one.
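For the tblastn route, a command along the lines of `tblastn -query actin_protein.fasta -db contigs_db -outfmt 6 -evalue 1e-5` (file and database names are placeholders) prints one tab-separated line per hit. The sketch below, with made-up names and an arbitrary E-value cutoff, shows one way to read that default 12-column tabular output and rank the contigs by bit score.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    contig: str
    identity: float   # percent identity
    length: int       # alignment length
    s_start: int      # start on the contig
    s_end: int        # end on the contig
    evalue: float
    bitscore: float


def parse_tabular_hits(text, max_evalue=1e-5):
    """Parse BLAST -outfmt 6 text and return hits passing the cutoff, best first.

    Default columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.split("\t")
        hit = Hit(f[0], f[1], float(f[2]), int(f[3]),
                  int(f[8]), int(f[9]), float(f[10]), float(f[11]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    return sorted(hits, key=lambda h: h.bitscore, reverse=True)
```

In tblastn output, a hit whose `s_start` is larger than its `s_end` lies on the reverse strand of the contig. Several hits from the same query lined up along one contig often correspond to separate exons.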

Reply written 2.0 years ago by lieven.sterck

Yes, lieven.sterck is right, obviously. I had assumed genomax already did that. Upon re-reading, I saw that BLASTn had probably been used, so I agree: tBLASTn would be the keyword.

Reply written 2.0 years ago by toheitka

(and I am biased because I am in love with HMMs...)

Reply written 2.0 years ago by toheitka