cDNA to protein conversion
3.7 years ago
yoosefyud ▴ 40

Hello, we have identified new mutations in the genomic sequence of the ABCD1 gene. I have cDNA mutation results that describe the nucleotide changes, like c.562 C>A. We need to convert our results to protein sequence in order to perform several bioinformatic analyses on the protein. Can you please point me to a database that shows whether a mutation causes a frameshift or not? Also, please point me to a database that shows proper variant nomenclature. I know HGVS, but it only offers instructions. Is there a database that can help with variant nomenclature at the protein level? I would be very pleased if you could recommend a book or article with instructions about DNA to protein conversion. I am really stuck on this simple conversion!

Protein • 2.7k views

A good start would be to find out which transcript was used to describe the nucleotide change. Then you can choose the corresponding transcript, for example from Ensembl, download its sequence from the box on the left (Sequence > cDNA/Protein), and modify the corresponding nucleotide. A good tool for sequence translation is SMS. There is also the TransVar tool, which can help you determine which transcript was possibly used and also convert the nucleotide change to the corresponding genomic coordinate. (However, for ABCD1:c.562C>A it does not give any result, so I am not sure whether it is just an example, a problem with TransVar, or a mistake in describing the nucleotide change, so you had better check it.) To predict what the mutation does to your protein, you can use various tools, for example PredictSNP2 or the other tools mentioned before (VEP, SnpEff or Annovar). Well, hope it helps...
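For a single-nucleotide substitution, the "modify the corresponding nucleotide" step can be sketched in plain Python. This is only a minimal sketch: it assumes you downloaded the coding sequence (no UTRs), so that position 1 is the A of the ATG start codon as in c. numbering, and it handles simple substitutions only. The helper name `apply_substitution` and the toy sequence are made up for illustration; they are not ABCD1 data.

```python
import re


def apply_substitution(cds: str, change: str) -> str:
    """Apply a change such as 'c.562 C>A' to a coding sequence (1-based position)."""
    match = re.fullmatch(r"c\.(\d+)\s*([ACGT])>([ACGT])", change.strip())
    if match is None:
        raise ValueError(f"Only simple substitutions are supported, got: {change!r}")
    pos, ref, alt = int(match.group(1)), match.group(2), match.group(3)

    cds = cds.upper()
    if not 1 <= pos <= len(cds):
        raise ValueError(f"Position {pos} is outside the sequence (length {len(cds)})")
    # If the reference base does not match, the wrong transcript was probably used.
    if cds[pos - 1] != ref:
        raise ValueError(f"Expected {ref} at position {pos}, found {cds[pos - 1]}")
    return cds[:pos - 1] + alt + cds[pos:]


# Toy coding sequence (hypothetical): ATG GCC CAG TAA
toy_cds = "ATGGCCCAGTAA"
mutant_cds = apply_substitution(toy_cds, "c.7 C>A")
print(mutant_cds)
```

The reference-base check is the useful part: if it fails for your real variants, that is a strong hint that the changes were described against a different transcript.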


Thanks for your answer. What about protein nomenclature? Is there a database that can help me with the best naming of proteins? Yes, the mutation I gave was just an example; c.1978 C>T is one of our identified mutations in the ABCD1 gene.


It is unclear which data you have, please elaborate.


Thanks for your attention, I have edited as you wish.


We still don't know which format your data is in. Do you have VCF files? If so, the most straightforward way is to annotate your results with tools like VEP, SnpEff or Annovar.


I have cDNA mutation results that describe the nucleotide changes, like c.562 C>A.


So you just have a list of changes that looks like this?

c.562 C>A
t.712 C>G
etc.


Do you have the FASTA sequence of the unmutated gene?


Yes, I only have the list of changes. I can find the FASTA sequence of the unmutated gene in the Ensembl database.


Do you know which annotation these cDNA mutations correspond to?

It would really be a lot easier if you could get the original data.


Those coordinates look useless without a transcript ID. (And I hope you are not broadcasting real novel results here)


I have no access to the NGS data; I only know which nucleotides have been changed.

3.7 years ago
Joe 20k

Here's a simple workflow for converting all your sequences (I have no idea about variant nomenclature databases or any of that).

You can use your WT sequence, and my code here to generate all of the FASTA sequences that correspond to your mutations.

You'll need a 'map file' which lists the Sequence ID, and the switch that's made:

SequenceID,A123B
SequenceID2,X234Y


You'll need to convert your format c.562 C>A to SequenceID,C562A (for example).
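If you have many changes, that reformatting could be scripted along these lines. A minimal sketch only: `to_map_line` is a hypothetical helper name, it handles simple substitutions and nothing else, and it assumes the map file wants exactly the `SequenceID,C562A` layout shown above.

```python
import re


def to_map_line(sequence_id: str, change: str) -> str:
    """Turn 'c.562 C>A' into a map-file line such as 'SequenceID,C562A'."""
    match = re.fullmatch(r"c\.(\d+)\s*([ACGT])>([ACGT])", change.strip())
    if match is None:
        raise ValueError(f"Cannot parse change: {change!r}")
    pos, ref, alt = match.groups()
    return f"{sequence_id},{ref}{pos}{alt}"


# 'SequenceID' must match the FASTA header of the wild-type sequence.
changes = ["c.562 C>A", "c.1978 C>T"]
map_lines = [to_map_line("SequenceID", change) for change in changes]
print("\n".join(map_lines))
```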

It will generate a mutated fasta sequence for each input sequence/mutation.

You can then use this BioPython snippet to read in a file of mutated sequences and translate them to proteins.

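from Bio import SeqIO

# Parse the FASTA file of mutated sequences and translate each record to protein.
for record in SeqIO.parse('single.fasta', 'fasta'):
    print(repr(record.translate()))

# Example output for one record (alphabet arguments appear only in older Biopython versions):
# SeqRecord(seq=Seq('MSTTADQIAVQYPIPTYRFVVTIGDEQMCFQSVSGLDISYDTIEYRDGVGNWLQ...FH*', HasStopCodon(ExtendedIUPACProtein(), '*')), id='<unknown id>', name='<unknown name>', description='<unknown description>', dbxrefs=[])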


Thanks a lot for your help.

3.7 years ago

If you have the cDNA or a complete transcript, you can just use ExPASy Translate and choose the correct frame from the three frames reported for the sense strand.
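If you want to see what "choosing the correct frame" amounts to, here is a rough sketch in plain Python. It translates the three forward frames with the standard genetic code and picks the frame containing the longest stretch from a methionine to the next stop. That longest-ORF rule is only a heuristic I am assuming here; the annotated protein for your transcript (for example from Ensembl) is the real reference. The function names and the toy transcript are hypothetical.

```python
import re

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def translate_frame(seq: str, offset: int) -> str:
    """Translate every complete codon starting at the given 0-based offset."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        for i in range(offset, len(seq) - 2, 3)
    )


def longest_orf(seq: str):
    """Return (frame number 1-3, protein) for the longest Met-to-stop stretch."""
    best = (None, "")
    for offset in range(3):
        protein = translate_frame(seq, offset)
        for match in re.finditer(r"M[^*]*", protein):
            if len(match.group()) > len(best[1]):
                best = (offset + 1, match.group())
    return best


# Toy transcript (hypothetical): two 5' bases, then ATG GCC CAG TAA, then two 3' bases.
toy_transcript = "GGATGGCCCAGTAACC"
frame, protein = longest_orf(toy_transcript)
print(frame, protein)
```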


Thanks, yes, I have used ExPASy, but I don't know whether I'm using it correctly. First, I insert my wild-type sequence and it gives me three frames; then I check the correct frame against the Ensembl database; next, I insert my mutant sequence, choose the same frame and compare it to the wild-type sequence... Am I doing it right?
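That wild-type versus mutant comparison can also be done in a few lines of Python, which avoids comparing long protein strings by eye. A minimal sketch under these assumptions: both inputs are coding sequences starting at the ATG, only substitutions are considered, and the first differing residue is reported. The name `describe_change` and the toy sequences are hypothetical. The output imitates the HGVS protein style (three-letter codes, parentheses for a predicted consequence) as far as I understand it, so check the result against the HGVS recommendations before reporting it.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip([a + b + c for a in BASES for b in BASES for c in BASES], AMINO))

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Ter", "X": "Xaa",
}


def translate(cds: str) -> str:
    """Translate a coding sequence in frame 1, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE.get(cds[i:i + 3], "X")
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


def describe_change(wild_type_cds: str, mutant_cds: str):
    """Return the first protein difference, e.g. 'p.(Gln3Lys)', or None if identical."""
    wild_type, mutant = translate(wild_type_cds), translate(mutant_cds)
    for position, (ref, alt) in enumerate(zip(wild_type, mutant), start=1):
        if ref != alt:
            return f"p.({THREE_LETTER[ref]}{position}{THREE_LETTER[alt]})"
    return None  # synonymous change, or no change at all


wild_type = "ATGGCCCAGTAA"   # toy CDS: Met Ala Gln stop
missense = "ATGGCCAAGTAA"    # c.7 C>A in the toy CDS
nonsense = "ATGGCCTAGTAA"    # c.7 C>T in the toy CDS
print(describe_change(wild_type, missense))
print(describe_change(wild_type, nonsense))
```

A substitution never shifts the reading frame on its own; a frameshift only arises from insertions or deletions whose length is not a multiple of three, and this sketch does not handle those.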

3.7 years ago

You can also use tblastx to search your cDNA query against a translated nucleotide database to find any paralogs or orthologs that match your wild-type or mutant sequences. https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=tblastx&PAGE_TYPE=BlastSearch&BLAST_SPEC=&LINK_LOC=blasttab