Question: how to find gene IDs from sequences
3.0 years ago by
Iran, Islamic Republic Of
nazaninhoseinkhan210 wrote:

Dear all,

I have a file containing thousands of sequences from a particular plant species.

I want to obtain their corresponding gene IDs. I know the best way to do this is to run BLAST, but since the number of sequences is huge, I am looking for a way to do it automatically.

Do you think an all-against-all BLAST is a good approach? If so, could you give me a hint on how to do it?

Thank you in advance


You can quite easily run BLAST on several sequences at once; I'm not sure how well this scales, though.

You can also restrict your search to specific organisms so that you only get matches from plants.
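As a rough sketch of both ideas, the snippet below builds a BLAST+ `blastn` command that submits one multi-sequence FASTA file to NCBI in a single call and restricts hits with an Entrez organism query (`-entrez_query` only works together with `-remote`). The file names, the e-value and the `Viridiplantae[Organism]` query are assumptions for illustration. NCBI throttles remote searches, so for thousands of sequences a local database is probably the more practical route.

```python
import shlex
from typing import List


def remote_blastn_command(query_fasta: str, out_path: str,
                          entrez_query: str = "Viridiplantae[Organism]",
                          evalue: float = 1e-10) -> List[str]:
    """Build a blastn command that searches NCBI's nt database remotely.

    One FASTA file may hold many sequences; blastn searches each of them.
    The Entrez query (assumed here: green plants) limits which organisms
    can appear as hits.
    """
    return [
        "blastn",
        "-remote",                      # run the search on NCBI's servers
        "-db", "nt",
        "-query", query_fasta,
        "-entrez_query", entrez_query,  # requires -remote
        "-evalue", str(evalue),
        "-outfmt", "6",                 # tabular output, one hit per line
        "-out", out_path,
    ]


if __name__ == "__main__":
    # Hypothetical file names; only prints the command, does not run it.
    print(shlex.join(remote_blastn_command("my_plant_sequences.fa",
                                           "remote_hits.tsv")))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`.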


3.0 years ago by
United States
tomc70 wrote:

Assuming you are working with nucleotide sequences, have a local BLAST installation, and have your reference gene sequences formatted as a BLAST database, this could be the start of a solution for linking your sequences to known gene IDs:

blastn -db reference.bdb -query file_with_all_sequences.nt -out result.alignment
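A minimal sketch of the whole local workflow, assuming BLAST+ is on the `PATH` and the reference genes sit in a nucleotide FASTA file: `makeblastdb` formats the database, then a single `blastn` run searches every query sequence using several CPU threads. The reference file name, the e-value, the thread count and the tabular output (`-outfmt 6`) are illustrative choices, not part of the original answer.

```python
import shlex
import subprocess
from typing import List, Tuple


def local_blast_commands(reference_fasta: str, query_fasta: str,
                         db_name: str = "reference.bdb",
                         out_path: str = "result.tsv",
                         evalue: float = 1e-10,
                         threads: int = 4) -> Tuple[List[str], List[str]]:
    """Return the (makeblastdb, blastn) commands for a local nucleotide search."""
    # Step 1: format the reference gene sequences as a nucleotide BLAST database.
    makeblastdb = ["makeblastdb", "-in", reference_fasta,
                   "-dbtype", "nucl", "-out", db_name]
    # Step 2: search all query sequences in one run, in tabular format.
    blastn = ["blastn", "-db", db_name, "-query", query_fasta,
              "-evalue", str(evalue),
              "-outfmt", "6",
              "-num_threads", str(threads),
              "-out", out_path]
    return makeblastdb, blastn


def run_local_blast(reference_fasta: str, query_fasta: str) -> None:
    """Run both steps in order; raises CalledProcessError if either fails."""
    for cmd in local_blast_commands(reference_fasta, query_fasta):
        print(shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

Usage, with hypothetical file names: `run_local_blast("reference_genes.fa", "file_with_all_sequences.nt")`. Because all queries go into one `blastn` call, there is no need for an all-against-all search; each sequence is compared only against the reference database.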

You will still need to tune the expect value (-evalue), pick an output format (-outfmt), and then interpret the resulting alignments, which, unless you adopt a fairly arbitrary policy, is going to be the real work.
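One possible, deliberately simple policy, sketched below, assumes the search was run with `-outfmt 6` (the default 12 tabular columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). It keeps the single highest-bitscore hit per query that passes an e-value and percent-identity cutoff. Both thresholds are arbitrary placeholders, and the subject ID is only a gene ID if your reference FASTA headers are gene IDs.

```python
import csv
from typing import Dict, Iterable, Tuple


def best_hits(lines: Iterable[str], max_evalue: float = 1e-10,
              min_identity: float = 90.0) -> Dict[str, Tuple[str, float, float]]:
    """Map each query ID to (subject_id, evalue, bitscore) of its best hit.

    Hits above max_evalue or below min_identity (percent) are discarded;
    among the remaining hits, the one with the highest bitscore wins.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blanks and comment lines
            continue
        qseqid, sseqid = row[0], row[1]
        pident = float(row[2])
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue or pident < min_identity:
            continue
        current = best.get(qseqid)
        if current is None or bitscore > current[2]:
            best[qseqid] = (sseqid, evalue, bitscore)
    return best
```

For example, `with open("result.tsv") as fh: hits = best_hits(fh)`. Queries that end up missing from `hits` are the ones that need a closer, manual look.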

