How to get the protein sequence of a transcript variant (Ensembl Perl API)?
7.0 years ago

Hello,

I'm using the Ensembl Perl API to collect information about a set of genes (ENSG IDs), all the transcripts of those genes (ENST IDs), and all the variants in those transcripts (rs/cos/... IDs).
I'm looking for an efficient way to get the protein sequence translated from each transcript variant (variation included), i.e. one protein sequence per transcript variation.

I couldn't find a direct way to get either the whole protein sequence or the nucleotide sequence of a transcript variation that I could then translate (maybe I missed something). My idea was to fetch a Slice object spanning the start and end positions of each transcript, then translate the returned sequence after applying each transcript variation according to its location and type. But I realized that handling frameshift variants and some other variation cases this way is tedious.

That's why I'm looking for a more direct and efficient way to get those protein sequences.
If someone has the answer or any suggestion, I would greatly appreciate it.

Regards,
Nathan

Ensembl Perl API proteic sequence • 1.9k views
5.9 years ago
Emily 23k

There's no single API call that does what you need, but there are a couple of VEP plugins that you can cannibalise. The ProteinSeqs plugin creates a new protein sequence based on missense variants, while the Downstream plugin gives you the sequence downstream of a frameshift.

5.9 years ago

Hi Nathan, I'm also looking for a function that performs this task directly in the Ensembl Perl API. In the meantime, I'm using the following functions:

# param 0 -> TranscriptVariationAllele object.
# Returns the coding sequence of the transcript with the mutation applied,
# or nothing if the variation has no CDS coordinates.
sub get_variation_cds_seq {
    my $tva = $_[0];
    # translateable_seq returns the coding part of the transcript
    # (it removes introns and the 5' and 3' UTRs).
    my $seq = $tva->transcript->translateable_seq;
    if (!defined($tva->transcript_variation->cds_start) || !defined($tva->transcript_variation->cds_end)) {
        print "ERROR " . $tva->transcript_variation->variation_feature->variation_name . " " . $tva->transcript->display_id . "\n";
        # Stop here: without coordinates the sequence cannot be edited.
        return;
    }
    # Variation position, 0-based, counted from the beginning of the coding sequence.
    my $variation_start = $tva->transcript_variation->cds_start - 1;
    my $variation_end   = $tva->transcript_variation->cds_end - 1;
    # For a deletion, feature_seq is '-', so we use '' instead
    # to build the final sequence.
    my $feature_seq = $tva->feature_seq eq "-" ? "" : $tva->feature_seq;
    substr($seq, $variation_start, $variation_end - $variation_start + 1) = $feature_seq;

    return $seq;
}

# param 0 -> TranscriptVariationAllele object.
# Returns the sequence of the transcript with the mutation applied, including
# the 5' and 3' UTRs, or nothing if the variation has no cDNA coordinates.
sub get_variation_cdna_seq {
    my $tva = $_[0];
    # seq contains the 5' and 3' regions.
    my $seq = $tva->transcript->seq->seq;
    if (!defined($tva->transcript_variation->cdna_start) || !defined($tva->transcript_variation->cdna_end)) {
        print "ERROR " . $tva->transcript_variation->variation_feature->variation_name . " " . $tva->transcript->display_id . "\n";
        # Stop here: without coordinates the sequence cannot be edited.
        return;
    }
    # Variation position, 0-based, counting the UTR regions.
    my $variation_start = $tva->transcript_variation->cdna_start - 1;
    my $variation_end   = $tva->transcript_variation->cdna_end - 1;
    # For a deletion, feature_seq is '-', so we use '' instead
    # to build the final sequence.
    my $feature_seq = $tva->feature_seq eq "-" ? "" : $tva->feature_seq;
    substr($seq, $variation_start, $variation_end - $variation_start + 1) = $feature_seq;

    return $seq;
}
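
To try the splice-then-translate idea without a database connection, here is a minimal standalone sketch. It is not part of the Ensembl API: `apply_allele` and `translate_cds` are hypothetical helper names, the toy CDS is made up, and the codon table is the standard genetic code only. It assumes 1-based inclusive CDS coordinates, with an insertion written as start = end + 1, which is how I understand Ensembl reports them.

```perl
use strict;
use warnings;

# Standard genetic code, codons enumerated in T, C, A, G order.
my @bases = qw(T C A G);
my @aas   = split //, 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
for my $first (@bases) {
    for my $second (@bases) {
        for my $third (@bases) {
            $codon_table{"$first$second$third"} = shift @aas;
        }
    }
}

# Hypothetical helper: replace CDS positions $start..$end (1-based, inclusive)
# with $allele. '-' means a deletion. Returns nothing if a coordinate is missing.
sub apply_allele {
    my ($cds, $start, $end, $allele) = @_;
    return unless defined $start && defined $end;
    $allele = '' if $allele eq '-';
    # For an insertion (start = end + 1) the replaced length is 0.
    substr($cds, $start - 1, $end - $start + 1) = $allele;
    return $cds;
}

# Hypothetical helper: translate codon by codon, stopping at the first stop
# codon. A trailing partial codon is ignored; unknown codons become 'X'.
sub translate_cds {
    my ($cds) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length $cds; $i += 3) {
        my $aa = $codon_table{ uc substr($cds, $i, 3) } // 'X';
        last if $aa eq '*';
        $protein .= $aa;
    }
    return $protein;
}

my $cds = 'ATGGCCAAATGA';    # toy sequence: ATG GCC AAA TGA
print translate_cds($cds), "\n";                            # MAK
print translate_cds(apply_allele($cds, 5, 5, 'T')), "\n";   # missense: MVK
print translate_cds(apply_allele($cds, 4, 4, '-')), "\n";   # 1 bp deletion: MPN
print translate_cds(apply_allele($cds, 4, 3, 'T')), "\n";   # 1 bp insertion: MCQM
```

Because the whole edited CDS is translated again, frameshifts need no special handling: the reading frame simply shifts after the edit. With real data you would pass in the output of get_variation_cds_seq, and BioPerl's Bio::Seq translate method could replace the hand-written codon table.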


Suggestions are welcome. It would be great if this functionality were implemented properly in the Perl API, because my functions sometimes fail when the start or end of the variation is not defined.

Regards, Fran.


You should post this as a new question in a separate thread.


I think I am giving my solution to the original question. I'm just pointing out that my answer could be improved by other users, and I'm hoping for that. I'm new to Biostars, so sorry if I've made a mistake.

Cheers, Fran.


I did not realize that you were actually offering a solution since the sentence at the beginning of your post is unclear. My apologies.