Question

USEARCH for orthologous genes identification

9.3 years ago

biolab ★ 1.4k

Hi everyone,

I am using USEARCH to identify orthologous genes between two species, with an E-value cutoff of 1e-5 and the top-hit option. However, I am suspicious of this in silico method; an example hit is shown below. Is something wrong with my approach? Thanks a lot for any suggestions!

Query >LOC_Os07g04960_1
 Score     Evalue   %Id    QueryLo-Hi(Un)   TargetLo-Hi(Un)  Target
   228      3e-19   42%         39-149(2)       268-383(18)  AT5G15780_1

Qry  39 PAAAIPAVPAMPKPTIPTIVPAVTLPPIPAVPKVTLPPMPAIPTVPAVTMPPMPAVPAVPAVTLPPMPAVPTVPPNTVV 117
| . ||     | | ||.| |  |||| | :|   .|||.| ||| |  |:| .| .|  |  ||||.| :||.||  |.
Tgt 268 PPSIIP-----PNPLIPSI-PTPTLPPNPLIPSPPSLPPIPLIPTPP--TLPTIPLLPTPPTPTLPPIPTIPTLPPLPVL 339

Qry 118 VPAAVV--PALP------KVALPPMAAVPNVP----MPFLAPPP 149
         |  :|  |.||       | |||.  .| :|    .| : | |
Tgt 340 PPVPIVNPPSLPPPPPSFPVPLPPVPGLPGIPPVPLIPGIPPAP 383

124 cols, 52 ids (41.9%), 21 gaps (16.9%), score 228.0 (92.4 bits), Evalue 2.5e-19

orthologous blast usearch • 2.4k views

updated 2.1 years ago by Ram 43k • written 9.3 years ago by biolab ★ 1.4k

In the example you posted, most of the alignment appears to fall within a low-complexity region. Also, USEARCH might not be a good choice if you want to identify significantly diverged sequences. From the manual:

Recommended identity ranges
USEARCH is effective at identities of ~50% and above for proteins and ~75% and above for nucleotides.

See if you can avoid the problem in the example you posted by using "seg" for masking the repetitive and low-complexity regions instead of the default method USEARCH uses. Is there any other specific reason to doubt the accuracy of USEARCH?
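To get a rough feel for how compositionally biased a sequence is before relying on a hit, something like the sketch below may help. It is not the seg algorithm and not what USEARCH does internally; it is a simple sliding-window Shannon entropy check, and the window size and threshold are arbitrary illustrative values. The test sequence is the first aligned query block from the question.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits per residue) of the residue composition of seq."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_fraction(seq, window=12, threshold=2.2):
    """Fraction of residues covered by at least one low-entropy window.

    window and threshold are illustrative assumptions, not seg or USEARCH defaults.
    """
    if len(seq) < window:
        return 0.0
    flagged = [False] * len(seq)
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= threshold:
            for j in range(i, i + window):
                flagged[j] = True
    return sum(flagged) / len(seq)


# Query residues 39-117 copied from the alignment in the question.
query_segment = (
    "PAAAIPAVPAMPKPTIPTIVPAVTLPPIPAVPKVTLPPMPAIPTVPAVTMPPMPAVPAVPAVTLPPMPAVPTVPPNTVV"
)
print(round(low_complexity_fraction(query_segment), 2))
```

A high fraction suggests the alignment score may be driven by composition (here, proline-rich repeats) rather than by shared ancestry, which is the case masking is meant to guard against.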

updated 2.1 years ago by Ram 43k • written 9.3 years ago by Siva ★ 1.9k

Hi Siva, thank you very much for your reply. Your comment is very helpful. I need to set an identity cutoff.
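For reference, a minimal sketch of how an identity cutoff could be combined with the E-value cutoff and top-hit selection after the search. It assumes the hits have been exported in a BLAST-style 12-column tab-separated format (query, target, %identity, ..., E-value, bit score); the function name `best_hits`, the 50% default (taken from the manual quote above), and the two `hypothetical_*` rows are illustration only. The first row uses the values from the hit posted in the question; its mismatch and gap-open columns are filled in by hand and are not used by the filter.

```python
import csv


def best_hits(lines, min_identity=50.0, max_evalue=1e-5):
    """Return {query: (target, identity, evalue, bits)} with the top-scoring hit per query.

    Expects BLAST-style 12-column tab-separated rows (an assumption about the export format).
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        query, target = row[0], row[1]
        identity, evalue, bits = float(row[2]), float(row[10]), float(row[11])
        if identity < min_identity or evalue > max_evalue:
            continue  # fails the identity or E-value cutoff
        if query not in best or bits > best[query][3]:
            best[query] = (target, identity, evalue, bits)
    return best


example = [
    # Hit from the question: 41.9% identity, E-value 2.5e-19, 92.4 bits.
    "LOC_Os07g04960_1\tAT5G15780_1\t41.9\t124\t51\t6\t39\t149\t268\t383\t2.5e-19\t92.4",
    # Made-up rows, for illustration only.
    "hypothetical_query\thypothetical_target_1\t78.0\t200\t44\t0\t1\t200\t1\t200\t1e-80\t300.0",
    "hypothetical_query\thypothetical_target_2\t65.0\t200\t70\t0\t1\t200\t1\t200\t1e-60\t250.0",
]

for query, hit in best_hits(example).items():
    print(query, hit)
```

With the 50% cutoff, the hit from the question is dropped even though its E-value passes 1e-5, so an E-value threshold alone would not have filtered it.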

updated 2.1 years ago by Ram 43k • written 9.3 years ago by biolab ★ 1.4k

Depending on your species of interest, you may want to have a look at the orthologues from the Comparative Genomics analyses in Ensembl.

updated 2.1 years ago by Ram 43k • written 9.3 years ago by Denise CS ★ 5.2k

Thank you Denise, your comments are helpful.

updated 2.1 years ago by Ram 43k • written 9.3 years ago by biolab ★ 1.4k