I need to verify a protein sequence from two insect species.
For one of them the genome assembly is not very clean. For the other, no genome is available.
What is the standard process for this?
My current idea is to compare my protein with orthologs from other species (protein length, etc.). A large discrepancy could be a warning sign of a false CDS or a wrong ORF prediction in the genome. For the second species, though, there is no genome to check against.
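The length comparison mentioned above could be sketched roughly as follows. This is only an illustration in plain Python: the function names `parse_fasta` and `length_check`, and the 20% tolerance, are assumptions made for the example and not an established threshold. A flagged result is a hint that the gene model deserves a closer look, not proof of an error.

```python
from statistics import median


def parse_fasta(text):
    """Return {identifier: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the identifier.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def length_check(query_len, ortholog_lens, tolerance=0.2):
    """Compare a query protein length with the median ortholog length.

    Returns (ratio, flagged). `tolerance` is an arbitrary, hypothetical
    cutoff: 0.2 flags queries more than 20% shorter or longer than the
    median of the orthologs.
    """
    reference = median(ortholog_lens)
    ratio = query_len / reference
    flagged = abs(ratio - 1.0) > tolerance
    return ratio, flagged


if __name__ == "__main__":
    # Toy sequences, invented for the example only.
    orthologs = parse_fasta(">sp1\nMKTAYIAKQRQISFVK\n>sp2\nMKTAYIAKQRQISF\n>sp3\nMKTAYIAKQRQISFVKSH\n")
    query = "MKTAYIAK"
    ratio, flagged = length_check(len(query), [len(s) for s in orthologs.values()])
    print(f"length ratio vs. median ortholog: {ratio:.2f}, flagged: {flagged}")
```

A query much shorter than its orthologs often points to a truncated or fragmented gene model, and a much longer one to merged genes or a wrong start codon, but real biological length variation exists too.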
I think RACE-PCR, followed by assembling a full-length consensus cDNA, is currently the best approach for getting reliable gene models. It works with or without a reference genome.
Alternatively, there is de novo peptide sequencing, but I think that is more complicated.
If validation only means showing a similar structure, and an antibody is available, then a western blot should do it.
Thanks Michael, but I can only use in-silico methods at this time.
OK, but IMHO it is not possible to get anything close to 'verified' using in-silico methods only. You can try all the homology-based methods (BLAST, exonerate), you can build a phylogeny and see whether the proteins fit, you can check for the correct domain structure using InterProScan, and you can do 3D structure modelling using e.g. SWISS-MODEL or I-TASSER and check the results with TM-align. See e.g. this paper by Daly et al. This is as close as it gets, and you still will not have a validated sequence, because you could be missing that one little exon that only PCR can give you.
I recommend finding a collaborator to do the experimental lab work with you. These are really routine tasks that give a good return on investment, and any stakeholder in your project should have a strong interest in getting truly validated sequences.
Thanks Michael, very useful information. I agree with you. I will align my protein against the closest species (Drosophila), just to get an idea of the differences in sequence length, substitutions, etc. That will give me some arguments for suggesting a PCR.
best
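As a rough illustration of the alignment idea above, here is a minimal sketch of a global (Needleman-Wunsch) pairwise alignment in plain Python that reports aligned length, identity, and gap columns. The scoring values (match 1, mismatch -1, linear gap -2) and the function names `global_align` and `summarize` are assumptions chosen for the example. For real proteins, an established aligner with a substitution matrix such as BLOSUM62 and affine gap penalties would be the better choice.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings, padded with '-' where gaps occur.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def summarize(aln_a, aln_b):
    """Return aligned length, fraction of identical columns, and gap columns."""
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    gaps = sum(1 for x, y in zip(aln_a, aln_b) if x == "-" or y == "-")
    return {
        "aligned_length": len(aln_a),
        "identity": identical / len(aln_a),
        "gap_columns": gaps,
    }


if __name__ == "__main__":
    # Toy peptides, invented for the example only.
    query, reference = "MKTIAKQR", "MKTAYIAKQR"
    aln_q, aln_r = global_align(query, reference)
    print(aln_q)
    print(aln_r)
    print(summarize(aln_q, aln_r))
```

A long run of gap columns in the middle of an otherwise well-conserved alignment is the kind of pattern that could suggest a missing exon and help argue for a PCR.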
Btw, if you let us know which genes we are talking about, it might be possible to help you even more.