Question: USEARCH for orthologous genes identification
biolab wrote (5.8 years ago):

Hi everyone,

I am using USEARCH to identify orthologous genes between two species. I set an E-value cutoff of 1e-5 and used the top-hit option. However, I am suspicious of this in silico method; an example hit is shown below. Is something wrong with my approach? Thanks a lot for any suggestions!

Query >LOC_Os07g04960_1
 Score     Evalue   %Id    QueryLo-Hi(Un)   TargetLo-Hi(Un)  Target
   228      3e-19   42%         39-149(2)       268-383(18)  AT5G15780_1

Qry  39 PAAAIPAVPAMPKPTIPTIVPAVTLPPIPAVPKVTLPPMPAIPTVPAVTMPPMPAVPAVPAVTLPPMPAVPTVPPNTVV 117
| . ||     | | ||.| |  |||| | :|   .|||.| ||| |  |:| .| .|  |  ||||.| :||.||  |.
Tgt 268 PPSIIP-----PNPLIPSI-PTPTLPPNPLIPSPPSLPPIPLIPTPP--TLPTIPLLPTPPTPTLPPIPTIPTLPPLPVL 339

Qry 118 VPAAVV--PALP------KVALPPMAAVPNVP----MPFLAPPP 149
         |  :|  |.||       | |||.  .| :|    .| : | |
Tgt 340 PPVPIVNPPSLPPPPPSFPVPLPPVPGLPGIPPVPLIPGIPPAP 383

124 cols, 52 ids (41.9%), 21 gaps (16.9%), score 228.0 (92.4 bits), Evalue 2.5e-19

 

Tags: orthologous, blast, usearch

In the example you posted, most of the alignment appears to fall within a low-complexity region. Also, USEARCH might not be a good choice if you want to identify significantly diverged sequences. From the manual:

Recommended identity ranges
USEARCH is effective at identities of ~50% and above for proteins and ~75% and above for nucleotides.

See if you can avoid the problem in the example you posted by using "seg" for masking the repetitive and low-complexity regions instead of the default method USEARCH uses. Is there any other specific reason to doubt the accuracy of USEARCH?
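
To see why this hit looks like low-complexity sequence, a rough check of residue composition can help. The sketch below is not SEG and does not reproduce what USEARCH does internally; it only computes the Shannon entropy of the residue composition, and the function names, window size and entropy threshold are illustrative assumptions. The query string is the first aligned block (residues 39-117) copied from the alignment in the question.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition of seq."""
    counts = Counter(seq)
    total = len(seq)
    # log2(total / n) equals -log2(p), so every term is non-negative.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def low_complexity_windows(seq: str, window: int = 12, max_entropy: float = 2.2):
    """Yield (start, end, entropy) for sliding windows at or below max_entropy.

    Coordinates are 0-based and end-exclusive. The defaults are illustrative
    assumptions, not the parameters of any real masking tool.
    """
    for start in range(0, max(len(seq) - window + 1, 1)):
        chunk = seq[start:start + window]
        h = shannon_entropy(chunk)
        if h <= max_entropy:
            yield start, start + len(chunk), h


# Query residues 39-117, copied from the alignment posted in the question.
query = (
    "PAAAIPAVPAMPKPTIPTIVPAVTLPPIPAVPKVTLPPMPAIPTVPAVTMPPMPAVPAVPAVTLPPMPAVPTVPPNTVV"
)

print(Counter(query).most_common(3))   # the most frequent residues
print(round(shannon_entropy(query), 2))
print(sum(1 for _ in low_complexity_windows(query)), "low-entropy windows")
```

A segment dominated by a handful of residues (here mostly proline, alanine and valine) can score well against any similarly biased protein, so the E-value alone says little about orthology.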

Written 5.8 years ago by Siva

Hi Siva, thank you very much for your reply. Your comment is very helpful. I need to set an identity cutoff.
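
For anyone reading later, here is a minimal sketch of what such a post-filter could look like. It assumes the hits were exported as BLAST-style 12-column tab-separated output (query, target, %identity, alignment length, mismatches, gap opens, query start, query end, target start, target end, E-value, bit score); check the column order of your own output before relying on it. The names `Hit`, `parse_blast6_line` and `passes` are hypothetical, and the 50% default simply follows the protein identity range quoted from the manual above.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    target: str
    pct_id: float   # percent identity, 0-100
    evalue: float


def parse_blast6_line(line: str) -> Hit:
    """Parse one line of BLAST-style 12-column tabular output.

    Assumed columns: query, target, %identity, alignment length, mismatches,
    gap opens, qstart, qend, tstart, tend, E-value, bit score.
    """
    fields = line.rstrip("\n").split("\t")
    return Hit(fields[0], fields[1], float(fields[2]), float(fields[10]))


def passes(hit: Hit, min_id: float = 50.0, max_evalue: float = 1e-5) -> bool:
    """Keep a hit only if it meets both the identity and the E-value cutoff."""
    return hit.pct_id >= min_id and hit.evalue <= max_evalue


# The hit from the question: 41.9% identity, E-value 2.5e-19.
example = Hit("LOC_Os07g04960_1", "AT5G15780_1", 41.9, 2.5e-19)
print(passes(example))               # rejected by the 50% identity cutoff
print(passes(example, min_id=40.0))  # would pass with a looser cutoff
```

With the E-value cutoff alone this hit is accepted; adding the identity cutoff is what removes it.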

Written 5.8 years ago by biolab

Depending on your species of interest, you may want to have a look at the orthologues from the Comparative Genomics analyses in Ensembl.

Written 5.8 years ago by Denise CS

Thank you Denise, your comments are helpful.

Written 5.8 years ago by biolab