Question: how to find gene IDs from sequences
nazaninhoseinkhan (Iran, Islamic Republic Of) wrote, 3.0 years ago:

Dear all,

I have a file containing thousands of sequences from a particular plant species.

I want to obtain their corresponding gene IDs. I know the best way to do this is with BLAST, but since there are so many sequences, I am looking for a way to do it automatically.

Would an all-against-all BLAST be a good approach? If so, could you give me a clue on how to do it?

Thank you in advance

Nazanin


Hi,

It looks like you can fairly easily run BLAST on several sequences at once: http://www.ncbi.nlm.nih.gov/guide/howto/submit-mult-seq-blast/ I'm not sure how well this scales to thousands of sequences, though.

You can also restrict the search to specific organisms so that you only get matches from plants.
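Since it is unclear how the multi-sequence submission scales, one common workaround is to split the large FASTA file into smaller batches and search each batch separately. The sketch below assumes a standard multi-FASTA input; `split_fasta`, the batch size of 500, and the file names are hypothetical choices for illustration, not part of BLAST or the NCBI page above. The commented usage assumes a local BLAST+ installation with network access, where `-remote` sends each batch to NCBI and `-entrez_query` restricts hits to plants; check `blastn -help` on your own installation before relying on those flags.

```shell
#!/bin/sh
# split_fasta: hypothetical helper that splits a multi-FASTA file into
# batches of at most N records, written to PREFIX_1.fa, PREFIX_2.fa, ...
# N must be a positive integer.
split_fasta() {
    input="$1"   # multi-FASTA file
    n="$2"       # maximum number of sequences per batch
    prefix="$3"  # output file prefix (may include a directory)
    awk -v n="$n" -v prefix="$prefix" '
        /^>/ {
            # Every n-th header starts a new batch file
            if (count % n == 0) {
                if (out) close(out)
                batch++
                out = prefix "_" batch ".fa"
            }
            count++
        }
        # Copy header and sequence lines; anything before the first header is skipped
        out { print > out }
    ' "$input"
}

# Example usage (assumes BLAST+ is installed and can reach NCBI):
#   split_fasta all_sequences.fa 500 batch
#   for f in batch_*.fa; do
#       blastn -remote -db nt -query "$f" -entrez_query "Viridiplantae[ORGN]" \
#           -outfmt 6 -out "${f%.fa}.tsv"
#   done
```

Smaller batches mean more requests but each one is less likely to time out; the right size depends on sequence length and server load.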

(reply by roy.granit, 3.0 years ago)
tomc (United States) wrote, 3.0 years ago:

Assuming you are working with nucleotide sequences, have a local BLAST installation, and have your reference gene sequences formatted as a BLAST database, the following could be the start of a solution for linking your sequences to known gene IDs:

blastn -db reference.bdb -query file_with_all_sequences.nt -out result.alignment

You will then still need to tune the expect value (e-value cutoff), pick an output format, and finally interpret the resulting alignments. Unless you settle on a fairly arbitrary policy, that interpretation is going to be the real work.
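As an example of such a policy, the sketch below keeps only the single best hit per query from BLAST tabular output (`-outfmt 6`), after discarding hits above an e-value cutoff. It assumes the default 12 columns of `-outfmt 6` (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that the subject IDs in your reference database are the gene IDs you want. `best_hits` is a hypothetical helper name, and "highest bitscore wins" is just one simple, admittedly arbitrary rule; it ignores ties, paralogs, and partial matches.

```shell
#!/bin/sh
# best_hits: hypothetical filter over BLAST -outfmt 6 output read from stdin.
# Drops hits whose e-value (column 11) exceeds MAX_EVALUE, then keeps, for each
# query (column 1), the subject (column 2) with the highest bitscore (column 12).
# Prints "query<TAB>subject" lines sorted by query.
best_hits() {
    max_evalue="${1:-10}"   # 10 mirrors BLAST+'s default expect value
    awk -F '\t' -v max_e="$max_evalue" '
        ($11 + 0) <= (max_e + 0) {
            # First hit for this query, or a strictly better bitscore
            if (!($1 in best) || ($12 + 0) > score[$1]) {
                best[$1] = $2
                score[$1] = $12 + 0
            }
        }
        END {
            for (q in best) print q "\t" best[q]
        }
    ' | sort
}

# Example usage (assumes BLAST+ is installed and reference.bdb exists):
#   blastn -db reference.bdb -query file_with_all_sequences.nt \
#       -evalue 1e-10 -outfmt 6 -out result.tsv
#   best_hits 1e-10 < result.tsv > gene_ids.tsv
```

Queries with no hit under the cutoff simply do not appear in the output, so comparing the output against your input IDs shows which sequences still need manual attention.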
