Question: Check programmatically whether a protein is mitochondrial using its annotation
lumal29 (Montreal) wrote, 7 weeks ago:

Hi, I have a FASTA file with more than 38000 protein sequences inferred from a Diplonema genome. Every sequence has an ID and an annotation, but the IDs are not referenced in any database. I need to use the annotations to work out which proteins are mitochondrial.

Here is an example, with an ID and an annotation:

XXXXX12345 Succinyl-CoA ligase [ADP-forming] subunit beta

I know this one is mitochondrial because I also used BLAST to check for similarity with the mitochondrial proteins of another organism. But I only know it because I looked up on Google what a "Succinyl-CoA ligase" is, within the small subset (30 proteins) I found with BLAST.

Is there a way to programmatically check each annotation in the FASTA file to see whether it corresponds to a mitochondrial protein? Which resource(s) can I use to at least see whether a protein is mitochondrial?

Thanks in advance.

Modified 7 weeks ago by Mensur Dlakic • written 7 weeks ago by lumal29

Hi,

The only thing I can think of, though I'm not sure it is feasible or the best option, is to build a mitochondrial database and then align all the Diplonema genes/proteins against it. The ones that align well, i.e., with a high percent identity and a low e-value, would be annotated as mitochondrial. I believe there is a human mitochondrial database. Another possibility, though I don't think it will work as well, is to annotate a protein/gene as mitochondrial based on its name, by comparing each gene/protein name against a list of mitochondrial genes/proteins (bear in mind that some mitochondrial genes may have no annotation).
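A minimal sketch of the name-based comparison, assuming standard FASTA headers of the form `>ID annotation`. The keyword list is a hypothetical placeholder, not a curated resource, and the second record is made up for illustration. Plain substring matching will also flag proteins whose names are shared with non-mitochondrial homologues, so the result is only a first-pass filter.

```python
# Hypothetical keyword list for illustration only; a real list would
# have to come from a curated set of mitochondrial protein names.
MITO_KEYWORDS = ["succinyl-coa ligase", "cytochrome c oxidase", "atp synthase"]


def parse_fasta_headers(text):
    """Yield (protein_id, annotation) for every '>' header line."""
    for line in text.splitlines():
        if not line.startswith(">"):
            continue  # sequence lines are ignored here
        parts = line[1:].strip().split(None, 1)
        if not parts:
            continue  # skip empty headers
        protein_id = parts[0]
        annotation = parts[1] if len(parts) > 1 else ""
        yield protein_id, annotation


def flag_by_name(text, keywords=MITO_KEYWORDS):
    """Return {protein_id: [matched keywords]} using case-insensitive substring matching."""
    hits = {}
    for protein_id, annotation in parse_fasta_headers(text):
        lowered = annotation.lower()
        matched = [k for k in keywords if k in lowered]
        if matched:
            hits[protein_id] = matched
    return hits


example = """>XXXXX12345 Succinyl-CoA ligase [ADP-forming] subunit beta
MAAA
>XXXXX67890 Hypothetical protein
MKKK
"""
print(flag_by_name(example))  # {'XXXXX12345': ['succinyl-coa ligase']}
```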

António

Reply written 7 weeks ago (modified 7 weeks ago) by antonioggsousa

Hi António, thank you for your advice! I actually did what you suggest, but with an organism that is closer in the evolutionary tree, called Andalucia. I took the proteins that I was sure were mitochondrial, made a BLAST database from them, and searched my proteins against it. By doing this, I found proteins like the one I mentioned in my post. Thank you again; it reassures me about what I'm doing!
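A search like this can be post-filtered with a short script. The sketch below assumes the search was run with BLAST+ tabular output (`-outfmt 6`, default 12 columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The thresholds and the subject IDs in the sample rows are assumptions for illustration, not recommended values or real results.

```python
import csv
import io


def best_hits(tabular_text, max_evalue=1e-10, min_identity=30.0):
    """Return {query_id: (subject_id, pident, evalue)} with the lowest-e-value
    hit per query among hits passing both thresholds (thresholds are assumed)."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip comment lines and malformed rows
        query, subject = row[0], row[1]
        pident, evalue = float(row[2]), float(row[10])
        if evalue > max_evalue or pident < min_identity:
            continue
        if query not in best or evalue < best[query][2]:
            best[query] = (subject, pident, evalue)
    return best


# Made-up rows in outfmt 6 column order, for illustration only.
rows = "\n".join([
    "\t".join(["XXXXX12345", "AND_0001", "62.5", "400", "150", "0",
               "1", "400", "1", "400", "1e-150", "480"]),
    "\t".join(["XXXXX12345", "AND_0002", "35.0", "200", "130", "0",
               "1", "200", "1", "200", "1e-20", "90"]),
    "\t".join(["XXXXX99999", "AND_0003", "25.0", "80", "60", "0",
               "1", "80", "1", "80", "0.5", "30"]),
])
print(best_hits(rows))
```

Queries with no hit passing the thresholds are simply absent from the result, so the keys of the returned dictionary are the candidate mitochondrial proteins.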

Reply written 6 weeks ago by lumal29
Mensur Dlakic (USA) wrote, 7 weeks ago:

It is not a perfect solution, but you can try predicting protein localization from sequence. For example:

Most of them should work well with mitochondrial proteins.

Written 7 weeks ago by Mensur Dlakic

Thank you, Mensur, for your answer. I have already used three tools to predict localization: TargetP, MitoFates, and PredSL. It is really hard to make a decision based on their results because the tools do not agree with each other. If I require a positive prediction from all three tools, I get more than 600 proteins out of 38000; if I accept a positive prediction from at least one tool, I get more than 4000 sequences. How can I then decide which ones to choose? What I did was take mitochondrial proteins from a closely related organism, Andalucia, and check the accuracy of the tools on them. I had 33 proteins, and 32 of them were predicted as mitochondrial when I accepted a positive prediction from at least one tool. So, as you said, it's not perfect, but it can perhaps give me a good idea. I saw new tools in the links you gave me, and I may try some of them. Thank you again for your help!
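The trade-off between "all three tools" and "at least one tool" can be made explicit with a vote threshold, checked against a set of known positives. The sketch below assumes each tool's output has already been reduced to a set of protein IDs predicted as mitochondrial; the IDs and sets are toy values, not real predictions. With this definition, 32 of 33 known proteins recovered corresponds to a sensitivity of about 97% for the at-least-one rule. Sensitivity alone says nothing about false positives, so it cannot by itself justify the larger candidate list.

```python
def consensus(predictions, min_votes):
    """predictions: {tool_name: set of IDs predicted mitochondrial}.
    Return the IDs predicted by at least `min_votes` tools."""
    votes = {}
    for ids in predictions.values():
        for protein_id in ids:
            votes[protein_id] = votes.get(protein_id, 0) + 1
    return {pid for pid, n in votes.items() if n >= min_votes}


def sensitivity(predicted, known_positives):
    """Fraction of known mitochondrial proteins that were predicted."""
    if not known_positives:
        raise ValueError("known_positives must not be empty")
    return len(predicted & known_positives) / len(known_positives)


# Toy data for illustration only.
predictions = {
    "TargetP": {"p1", "p2", "p3"},
    "MitoFates": {"p1", "p2"},
    "PredSL": {"p1", "p4"},
}
known = {"p1", "p2", "p5"}  # stand-in for a validated reference set

for min_votes in (1, 2, 3):
    selected = consensus(predictions, min_votes)
    print(min_votes, sorted(selected), round(sensitivity(selected, known), 2))
```

An intermediate rule (at least two of three tools) sits between the two extremes and may be worth evaluating on the same reference set.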

Reply written 6 weeks ago by lumal29