Question: How to find a specific gene in an assembled genome that is not annotated?
christian_jpg2 wrote, 13 months ago:

I have an assembled genome (contigs) of an insect that is not yet annotated. I want to know whether some genes of interest (such as actin) are present in that genome. What can I do?

sequence assembly gene • 686 views
modified 13 months ago by toheitka (230) • written 13 months ago by christian_jpg2 (0)
genomax (68k, United States) wrote, 13 months ago:

You can take the sequences of your genes of interest from the most closely related species you can find in GenBank, and then use blat (or blast) to search them against your contigs.
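
As a rough illustration of the BLAST route, here is a minimal Python sketch that only assembles the two BLAST+ command lines (build a nucleotide database from the contigs, then search it). The file names, the database name `contigs_db`, the output name `hits.tsv` and the e-value cutoff are placeholders I made up, not anything from this thread; the helper `build_blast_commands` is hypothetical too.

```python
def build_blast_commands(query_fasta, contigs_fasta, program="blastn",
                         db_name="contigs_db", evalue="1e-5"):
    """Return (makeblastdb_cmd, search_cmd) as argument lists.

    All file names, db_name and the e-value cutoff are illustrative
    placeholders; adjust them to your own data.
    """
    if program not in ("blastn", "tblastn"):
        raise ValueError("program must be 'blastn' or 'tblastn'")
    # Both blastn and tblastn search a nucleotide database,
    # so the contigs are always indexed with -dbtype nucl.
    make_db = ["makeblastdb", "-in", contigs_fasta,
               "-dbtype", "nucl", "-out", db_name]
    # -outfmt 6 requests tab-separated output, one line per hit.
    search = [program, "-query", query_fasta, "-db", db_name,
              "-evalue", evalue, "-outfmt", "6", "-out", "hits.tsv"]
    return make_db, search


if __name__ == "__main__":
    for cmd in build_blast_commands("actin_relative.fasta", "contigs.fasta"):
        print(" ".join(cmd))
```

The printed lines can be pasted into a shell, or each list can be passed to `subprocess.run(cmd, check=True)` if BLAST+ is installed. Passing `program="tblastn"` with a protein FASTA as the query gives the protein-versus-genome variant.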

written 13 months ago by genomax (68k)

Thanks.

However, I already tried that with a sequence from the closest relative I could find (they are from the same subfamily), and I did not get any hits. I do not know whether the sequences are that different or whether I am doing something wrong.

written 13 months ago by christian_jpg2 (0)

What kind of sequences are you using as input: nucleotide or protein? And, related: what kind of blast are you running?

written 13 months ago by lieven.sterck (5.3k)

christian_jpg2: As has been suggested in this thread, you could try tblastn if plain blastn did not work.

You also have to consider the possibility that, if you ran the blast search correctly and still did not find a hit for something that should be present, your assembly could be of poor quality. You could blast your assembly against nr to see whether you get reasonable, contiguous hits.
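
To judge whether the hits look reasonable, it can help to read the tabular output programmatically. Below is a minimal Python sketch, assuming the search was run with `-outfmt 6` and its default 12 columns. The `SAMPLE` rows, the contig names and the 1e-5 cutoff are invented for illustration, and `filter_hits` is a hypothetical helper, not part of BLAST.

```python
# Default column order of BLAST tabular output (-outfmt 6).
BLAST6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# Invented example rows (tab-separated), NOT real results.
SAMPLE = (
    "actin_q\tcontig_12\t91.5\t370\t30\t1\t1\t370\t5400\t4291\t2e-50\t210\n"
    "actin_q\tcontig_7\t35.0\t40\t26\t0\t100\t139\t10\t129\t0.5\t25.1\n"
)


def filter_hits(text, max_evalue=1e-5):
    """Parse outfmt-6 text and keep hits with evalue <= max_evalue,
    sorted by bit score (best first)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < len(BLAST6_COLUMNS):
            continue  # not a default 12-column row
        row = dict(zip(BLAST6_COLUMNS, fields))
        hit = {
            "qseqid": row["qseqid"],
            "sseqid": row["sseqid"],
            "pident": float(row["pident"]),
            "length": int(row["length"]),
            "sstart": int(row["sstart"]),
            "send": int(row["send"]),
            "evalue": float(row["evalue"]),
            "bitscore": float(row["bitscore"]),
        }
        # Subject coordinates run backwards for minus-strand hits.
        hit["strand"] = "minus" if hit["sstart"] > hit["send"] else "plus"
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    hits.sort(key=lambda h: h["bitscore"], reverse=True)
    return hits


if __name__ == "__main__":
    for h in filter_hits(SAMPLE):
        print(h["sseqid"], h["strand"], h["sstart"], h["send"], h["evalue"])
```

Several short hits for the same query scattered over different contigs would be one hint that the gene is fragmented in the assembly.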

modified 13 months ago • written 13 months ago by genomax (68k)
toheitka (230, Germany/Dresden) wrote, 13 months ago:

It looks to me as if you should follow the thread "Building Hidden Markov Model (HMM) for proteins" closely, as both of you want similar things.

As for HMMs of your proteins, you could retrieve them from Pfam. The actin page, for example, is here: https://pfam.xfam.org/family/PF00022

Then, you can follow this EBI tutorial to retrieve the HMM from Pfam.

Actually, what I would do is very crude and a bit dirty: I would translate my genome in all six frames and check with HMMER whether the downloaded HMM gives a hit. It would be great if HMMER could use protein HMMs to search DNA, but this functionality does not yet exist (I think).
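
The crude translation step could be sketched like this in plain Python. It assumes the standard genetic code (NCBI translation table 1) and writes `X` for any codon containing characters other than A, C, G or T; the function names are my own, and reading the contigs from FASTA and writing the translations back out are left to you.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters are kept as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate from position 0; a trailing partial codon is dropped.
    Stop codons become '*', unknown codons (e.g. containing N) become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAAGGC").items():
        print(label, protein)
```

Written out as a protein FASTA (say `translated.faa`, a placeholder name), the frames could then be searched with something like `hmmsearch actin.hmm translated.faa`. You may prefer to split each frame at the `*` stop codons first, so that every record is one open reading frame.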

written 13 months ago by toheitka (230)

Running a simple tblastn with the protein against the genome will be more than sufficient here.

For more specific searches (or for really distantly related species, i.e. very low sequence conservation), your approach will indeed be the appropriate one.

written 13 months ago by lieven.sterck (5.3k)

Yes, lieven.sterck is right, obviously. I had assumed genomax already did that. Upon re-reading, I saw that BLASTn was probably used, so I agree: tBLASTn would be the keyword.

written 13 months ago by toheitka (230)

(and I am biased because I am in love with HMMs...)

written 13 months ago by toheitka (230)