Question: Check programmatically whether a protein is mitochondrial using its annotation
7 weeks ago by
lumal2980 wrote:

Hi, I have a FASTA file with more than 38000 protein sequences inferred from a Diplonema genome. Every sequence has an ID and an annotation, but the IDs are not referenced in any database. I need to use the annotations to determine which proteins are mitochondrial.

Here is an example, with an ID and an annotation:

XXXXX12345 Succinyl-CoA ligase [ADP-forming] subunit beta

I know this one is mitochondrial because I also used BLAST to check for similarity to the mitochondrial proteins of another organism. But I only know it because, for the small subset (30 proteins) I found with BLAST, I looked up on Google what a "Succinyl-CoA ligase" was.

But is there a way to programmatically check each annotation in the FASTA file to see whether it corresponds to a mitochondrial protein? Which resource(s) can I use to at least see whether proteins are mitochondrial?

Thanks in advance.

modified 7 weeks ago by Mensur Dlakic • written 7 weeks ago by lumal2980


The only thing I can think of, though I am not sure it is feasible or the best option, is to build a mitochondrial protein database and align all the Diplonema genes/proteins against it. The ones that align well, i.e. with high percent identity and low e-value, would be annotated as mitochondrial. I believe there is a human mitochondrial database. Another possibility, though I don't think it will work well, is to annotate a gene/protein as mitochondrial based on its name, by comparing each name against a list of mitochondrial genes/proteins (but note that some mitochondrial genes will have no annotation).
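The name-matching idea can be sketched roughly as below. This is only an illustration: the `MITO_NAMES` list is a hypothetical placeholder (a real list would have to come from a curated mitochondrial protein inventory), and the header layout is assumed to be `>ID annotation`, as in the example from the question.

```python
import re

# Hypothetical keyword list, for illustration only; a real one would come
# from a curated list of mitochondrial protein names.
MITO_NAMES = [
    "succinyl-coa ligase",
    "cytochrome c oxidase",
    "atp synthase",
]

def normalize(text):
    """Lowercase and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def parse_headers(fasta_lines):
    """Yield (seq_id, annotation) for each '>' header line."""
    for line in fasta_lines:
        if line.startswith(">"):
            seq_id, _, annotation = line[1:].strip().partition(" ")
            yield seq_id, annotation

def find_mito_candidates(fasta_lines, names=MITO_NAMES):
    """Return IDs whose annotation contains one of the known names."""
    wanted = [normalize(n) for n in names]
    hits = []
    for seq_id, annotation in parse_headers(fasta_lines):
        # Pad with spaces so that only whole-word matches count.
        padded = f" {normalize(annotation)} "
        if any(f" {w} " in padded for w in wanted):
            hits.append(seq_id)
    return hits

example = [
    ">XXXXX12345 Succinyl-CoA ligase [ADP-forming] subunit beta",
    "MKT",
    ">XXXXX99999 Hypothetical protein",
    "MAA",
]
print(find_mito_candidates(example))
```

A name match only flags a candidate. Many protein names are not specific to one compartment, so the hits would still need to be checked by another method.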


modified 7 weeks ago • written 7 weeks ago by antonioggsousa

Hi Antonio, thank you for your advice! I actually did what you suggest, but with an organism that is closer in the evolutionary tree, called Andalucia. I took the proteins that I was sure were mitochondrial, made a database from them with BLAST, and searched my proteins against it. By doing this, I found proteins like the one I mentioned in my post. Thank you again, because it reassures me about what I'm doing!
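For a search like this, the hits could be filtered afterwards with something like the sketch below. It assumes BLAST was run with tabular output (`-outfmt 6`) and its default twelve columns; the e-value and identity cutoffs are arbitrary placeholders, not recommended values.

```python
import csv

def best_hits(blast_tsv_lines, max_evalue=1e-5, min_identity=30.0):
    """Keep, per query, the highest-bitscore hit that passes both cutoffs.

    Expects the default '-outfmt 6' columns: qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore.
    The default cutoffs are illustrative assumptions.
    """
    best = {}
    for row in csv.reader(blast_tsv_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        identity = float(row[2])
        evalue = float(row[10])
        bitscore = float(row[11])
        if evalue > max_evalue or identity < min_identity:
            continue
        if query not in best or bitscore > best[query][3]:
            best[query] = (subject, identity, evalue, bitscore)
    return best

# Usage with a real file (the path is hypothetical):
# with open("diplonema_vs_andalucia_mito.tsv") as handle:
#     hits = best_hits(handle)
```

Because the database contains only mitochondrial proteins, a non-mitochondrial homolog can still produce a good hit, so the surviving queries are candidates rather than confirmed mitochondrial proteins.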

written 6 weeks ago by lumal2980
7 weeks ago by
Mensur Dlakic wrote:

It is not a perfect solution, but you can try predicting protein localization from sequence. For example:

Most of them should work well with mitochondrial proteins.

written 7 weeks ago by Mensur Dlakic

Thank you Mensur for your answer. I have already run 3 prediction tools on the sequences: TargetP, MitoFates and PredSL. It's really hard to make a decision based on their results because the tools don't agree. If I require a positive prediction from all 3 tools, I get more than 600 proteins out of 38000; if I require a positive prediction from at least one tool, I get more than 4000 sequences. How can I decide which set to choose? What I did was take mitochondrial proteins from a closely related organism, Andalucia, and check the accuracy of the tools on them. I had 33 proteins, and 32 of them were predicted as mitochondrial when I required a positive prediction from at least one tool. So, as you said, it's not perfect, but perhaps it can give me a good idea. I saw new tools in the links you gave me, and I may try some of them. Thank you again for your help!
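One way to make the "all 3 tools" versus "at least one tool" comparison explicit is to count votes per sequence and measure recall on the known positives at each threshold. The sketch below assumes each tool's output has already been reduced to a set of sequence IDs predicted as mitochondrial; the function names and the toy IDs are made up for illustration.

```python
from collections import Counter

def vote_counts(predictions):
    """Map each sequence ID to the number of tools that called it positive.

    `predictions` maps a tool name to the set of IDs it predicted as
    mitochondrial (an assumed, pre-parsed representation).
    """
    counts = Counter()
    for ids in predictions.values():
        counts.update(set(ids))
    return counts

def select(counts, min_votes):
    """IDs called positive by at least `min_votes` tools."""
    return {seq_id for seq_id, n in counts.items() if n >= min_votes}

def recall(selected, known_positives):
    """Fraction of known mitochondrial proteins that were recovered."""
    known = set(known_positives)
    return len(selected & known) / len(known) if known else 0.0

# Toy data only; real sets would come from parsing each tool's output.
predictions = {
    "TargetP": {"a", "b", "c"},
    "MitoFates": {"a", "b"},
    "PredSL": {"a", "d"},
}
counts = vote_counts(predictions)
for k in (1, 2, 3):
    chosen = select(counts, k)
    print(k, len(chosen), recall(chosen, {"a", "b", "c"}))
```

With the numbers above, 32 of 33 known proteins recovered corresponds to a recall of about 97% at the "at least one tool" threshold. Recall alone says nothing about false positives, so a set of proteins known not to be mitochondrial would be needed to judge how many of the 4000+ candidates are likely to be wrong.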

written 6 weeks ago by lumal2980


