How to identify core, accessory and unique sequences from proteomes
5.9 years ago
kamel ▴ 70

Hi, I have 8 proteomes in multi-FASTA format, each containing between 13,000 and 14,000 protein sequences. I need to align these sequences to find the core, accessory and unique sequences across the 8 proteomes. Do you have a method for doing this, please?

For information: I used proteinortho, but I noticed that it only reports core and accessory sequences.

Thank you in advance for your response and your help.

alignment sequence genome • 1.2k views

If you've already got your core and accessory sets, you could cluster the proteins at some identity threshold (which presumably you already did with proteinortho) using something like PSI-CD-HIT. I'm not sure how well it scales to a dataset that large, but give it a try.

Any clusters you get with only a single member are your unique proteins.
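A minimal sketch of pulling out those single-member clusters from a CD-HIT .clstr file. It assumes the usual .clstr layout (a ">Cluster N" line followed by member lines such as "0	350aa, >protA_001... *"); the function names are hypothetical. Note that CD-HIT may truncate sequence names in the .clstr file by default (check its -d option in the user guide), so the IDs recovered here may need mapping back to your FASTA headers.

```python
import re


def parse_clstr(lines):
    """Map each CD-HIT cluster ID to the list of its member sequence IDs.

    Assumes the standard .clstr layout: a '>Cluster N' header line followed
    by member lines like '0\t350aa, >protA_001... *'.
    """
    clusters = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line.split()[1]
            clusters[current] = []
        elif current is not None:
            # The sequence ID sits between '>' and the first '...'
            match = re.search(r">(.+?)\.\.\.", line)
            if match:
                clusters[current].append(match.group(1))
    return clusters


def singleton_ids(clusters):
    """Return the IDs of sequences that ended up alone in their cluster."""
    return [members[0] for members in clusters.values() if len(members) == 1]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python singletons.py clustered.clstr
    with open(sys.argv[1]) as handle:
        for seq_id in singleton_ids(parse_clstr(handle)):
            print(seq_id)
```

Strictly speaking, a singleton cluster is a protein with no close match anywhere; a cluster holding several paralogs from one proteome would also be "unique" to that proteome, so you may want to check the proteome of origin as well.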


Excuse me, but I did not understand what you said. I used proteinortho and got a matrix (.txt file) that does not contain the unique sequences. Do you have a method or tool that aligns the sequences and gives a matrix with the unique, accessory and core sequences?


Not a single tool, no. I'm not aware of one that does this from proteomes. Your task will probably require some scripting/coding of your own.

My suggestion is to keep the matrix you already have, which gives you two thirds of what you asked for, and then cluster your sequences with CD-HIT to find the unique sequences.

I don't know how else I can explain it...

5.6 years ago
bioinfo17 ▴ 30

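Use the -singles option in your proteinortho command. It also reports singleton genes (those without any hit in the other proteomes), which are your unique sequences.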
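Once singletons are in the table, each row can be sorted into core, accessory or unique by counting how many proteomes contribute a gene. A minimal sketch, assuming the proteinortho .tsv layout where the first three columns are "# Species", "Genes" and "Alg.-Conn.", followed by one column per input proteome with "*" meaning no gene from that proteome; the function name and file name are hypothetical.

```python
import csv
from collections import Counter


def classify_groups(lines):
    """Count core/accessory/unique groups in a proteinortho-style table.

    Assumes three leading columns (Species, Genes, Alg.-Conn.) followed by
    one column per proteome, where '*' marks an absent gene.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    n_proteomes = len(header) - 3  # columns after the three summary columns
    counts = Counter()
    for row in reader:
        if not row or row[0].startswith("#"):
            continue
        # Number of proteomes that contribute at least one gene to this group
        present = sum(1 for cell in row[3:] if cell != "*")
        if present == n_proteomes:
            counts["core"] += 1
        elif present == 1:
            counts["unique"] += 1
        elif present > 1:
            counts["accessory"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical file name; use the .proteinortho.tsv from your own run
    with open("myproject.proteinortho.tsv", newline="") as handle:
        print(classify_groups(handle))
```

Counting proteomes rather than genes means a group of in-paralogs from a single proteome is still classed as unique.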
