Question: How to identify core, accessory and unique sequences from proteomes
kamel30 wrote:

Hi, I have 8 proteomes in multi-FASTA format, each containing between 13,000 and 14,000 protein sequences. I need to compare these sequences to identify the core, accessory and unique sequences among the 8 proteomes. Do you have a method to do this, please?

For information: I used proteinortho, but I noticed that it reports only the core and accessory sequences.

Thank you in advance for your response and your help.

sequence alignment genome • 427 views
modified 12 months ago by bioinfo1730 • written 16 months ago by kamel30

If you've already got your cores and accessories, you could cluster the proteins to some identity threshold (which presumably you already did with proteinortho), using something like PSI-CD-HIT. I'm not sure how well it scales to a dataset that large, but give it a try.

Any clusters you get with only a single member are your unique proteins.
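
As a minimal sketch of that last step: assuming the clustering was run with a CD-HIT family tool and that its `.clstr` output follows the usual layout (a `>Cluster N` line followed by one line per member, with the sequence ID sitting between `>` and `...`), the single-member clusters could be pulled out roughly like this. The function name and the example IDs are made up for illustration.

```python
import re

# Matches the sequence ID in a member line such as:
#   0   350aa, >genome1_p0001... *
MEMBER_ID = re.compile(r">(.+?)\.\.\.")


def singleton_ids(clstr_lines):
    """Return the IDs of sequences that sit alone in their cluster."""
    clusters = []
    for line in clstr_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            clusters.append([])  # a new cluster starts here
            continue
        match = MEMBER_ID.search(line)
        if match and clusters:
            clusters[-1].append(match.group(1))
    # Keep only clusters with exactly one member.
    return [members[0] for members in clusters if len(members) == 1]


# Tiny made-up example in the assumed .clstr layout.
example = """\
>Cluster 0
0\t350aa, >genome1_p0001... *
1\t348aa, >genome2_p0417... at 98.50%
>Cluster 1
0\t212aa, >genome3_p0999... *
""".splitlines()

print(singleton_ids(example))
```

With a real run, the same function can be fed an open file handle for the `.clstr` file instead of the example list. Note that a singleton here only means "nothing else clustered with it at the chosen identity threshold", so the result depends on that threshold.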

written 16 months ago by Joe14k

Excuse me, but I did not understand what you said. I used proteinortho and got a matrix (.txt file) that does not contain the unique sequences. Do you have a method or tool that aligns the sequences and gives a matrix with the unique, accessory and core sequences?

written 16 months ago by kamel30

No, not a single tool; I am not aware of one that does this starting from proteomes. Your task will probably require some scripting of your own.

My suggestion is to keep the matrix you already have, which gives you two-thirds of what you asked for (core and accessory), and then cluster your sequences with CD-HIT to find the unique sequences.

I don't know how else I can explain it...

written 16 months ago by Joe14k
bioinfo1730 wrote:

Use the -singles option in the proteinortho command.
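
A rough sketch of how the resulting table could then be split into the three categories. It assumes the proteinortho output is tab-separated, with a header line starting with `#`, three leading summary columns (species count, gene count, connectivity), and then one column per proteome holding comma-separated protein IDs, or `*` when that proteome has no member. Please check these assumptions against your own file; the function name and the example rows are invented.

```python
def classify_groups(table_lines, n_meta_cols=3):
    """Label each group in the table as core, accessory or unique.

    core      : present in every proteome
    accessory : present in at least two, but not all, proteomes
    unique    : present in exactly one proteome
    """
    result = {"core": [], "accessory": [], "unique": []}
    for line in table_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip header and blank lines
        cells = line.split("\t")[n_meta_cols:]
        present = [c for c in cells if c != "*"]
        members = [pid for c in present for pid in c.split(",")]
        if len(present) == len(cells):
            result["core"].append(members)
        elif len(present) >= 2:
            result["accessory"].append(members)
        elif len(present) == 1:
            result["unique"].append(members)
    return result


# Made-up table with three proteomes (assumed layout, not real output).
example = [
    "# Species\tGenes\tAlg.-Conn.\tA.faa\tB.faa\tC.faa",
    "3\t3\t1\ta1\tb1\tc1",
    "2\t3\t0.5\ta2,a3\t*\tc2",
    "1\t1\t0\t*\tb9\t*",
]

groups = classify_groups(example)
for label, rows in groups.items():
    print(label, len(rows))
```

For the 8 proteomes in the question, the table would have eight proteome columns, so "core" means all eight are non-`*` in a row.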

written 12 months ago by bioinfo1730