Question: How to get the protein sequence of a transcript variant (Ensembl Perl API)?
4.5 years ago, nathan.alary wrote:

Hello,


I'm using the Ensembl Perl API to collect information about a set of genes (ENSG IDs), all the transcripts of those genes (ENST IDs), and all the variants of those transcripts (rs/cos/... IDs).
I'm looking for an efficient way to get the protein sequence translated from each transcript variant (variations included), i.e. one protein sequence per transcript variation.

I could not find a direct way to get either the whole protein sequence or the nucleotide sequence of a transcript variation in order to translate it (maybe I missed something). My idea was to fetch a Slice object between the start and end positions of each transcript, then translate the returned sequence according to the location and type of each transcript variation. But I realized that handling frameshift variants and some other variation cases this way is tedious.

That's why I'm looking for a more direct and efficient way to get those protein sequences.
If someone has the answer or any suggestion, I would greatly appreciate it.

Regards, 
Nathan

3.4 years ago, Emily_Ensembl (EMBL-EBI) wrote:

There's no single API call that does what you need, but there are a couple of VEP plugins that you can cannibalise. The ProteinSeqs plugin creates a new protein sequence based on missense variants, while the Downstream plugin gives you the sequence downstream of a frameshift.
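
For context, here is a minimal sketch of what the API does report per variant allele. It assumes the Ensembl core and variation APIs are installed and that the public database server is reachable. The transcript ID is only a placeholder for illustration. `pep_allele_string` gives the local amino-acid change (reference/alternate), not the full mutated protein, which is why the plugins above are worth reading.

```perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# Connect to the public Ensembl database (assumes network access).
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
);

my $transcript_adaptor =
    $registry->get_adaptor('homo_sapiens', 'core', 'transcript');
my $tv_adaptor =
    $registry->get_adaptor('homo_sapiens', 'variation', 'transcriptvariation');

# Placeholder stable ID, chosen only for illustration.
my $transcript = $transcript_adaptor->fetch_by_stable_id('ENST00000380152');
die "Transcript not found\n" unless defined $transcript;

# One TranscriptVariation per variant overlapping the transcript,
# one TranscriptVariationAllele per alternate allele.
for my $tv (@{ $tv_adaptor->fetch_all_by_Transcripts([$transcript]) }) {
    for my $tva (@{ $tv->get_all_alternate_TranscriptVariationAlleles }) {
        my $pep = $tva->pep_allele_string;
        next unless defined $pep;    # skip alleles with no peptide change reported
        printf "%s\t%s\t%s\n",
            $tv->variation_feature->variation_name,
            $transcript->stable_id,
            $pep;
    }
}
```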

3.4 years ago, francisco.abad wrote:

Hi Nathan, I'm also looking for a function that performs that task directly in the Ensembl Perl API, but in the meantime I'm using the following functions:

# param 0 -> TranscriptVariationAllele object.
# Returns the coding sequence of the transcript with the mutation applied,
# or undef when the variant has no CDS coordinates.
sub get_variation_cds_seq {
    my ($tva) = @_;
    my $tv = $tva->transcript_variation;
    # translateable_seq returns the coding part of the transcript
    # (it removes introns and the 5' and 3' UTRs).
    my $seq = $tva->transcript->translateable_seq;
    if (!defined($tv->cds_start) || !defined($tv->cds_end)) {
        warn "ERROR " . $tv->variation_feature->variation_name . " " . $tva->transcript->display_id . "\n";
        return;    # without coordinates the sequence cannot be edited
    }
    # Variation position, 0-based, counted from the beginning of the coding sequence.
    my $variation_start = $tv->cds_start - 1;
    my $variation_end   = $tv->cds_end - 1;
    # For a deletion, feature_seq is '-', so we use '' instead
    # to build the final sequence.
    my $feature_seq = $tva->feature_seq eq "-" ? "" : $tva->feature_seq;
    substr($seq, $variation_start, $variation_end - $variation_start + 1) = $feature_seq;

    return $seq;
}

# param 0 -> TranscriptVariationAllele object.
# Returns the cDNA sequence of the transcript with the mutation applied,
# including the 5' and 3' UTRs, or undef when the variant has no cDNA coordinates.
sub get_variation_cdna_seq {
    my ($tva) = @_;
    my $tv = $tva->transcript_variation;
    # seq contains the 5' and 3' UTR regions.
    my $seq = $tva->transcript->seq->seq;
    if (!defined($tv->cdna_start) || !defined($tv->cdna_end)) {
        warn "ERROR " . $tv->variation_feature->variation_name . " " . $tva->transcript->display_id . "\n";
        return;    # without coordinates the sequence cannot be edited
    }
    # Variation position, 0-based, counting the UTR regions.
    my $variation_start = $tv->cdna_start - 1;
    my $variation_end   = $tv->cdna_end - 1;
    # For a deletion, feature_seq is '-', so we use '' instead
    # to build the final sequence.
    my $feature_seq = $tva->feature_seq eq "-" ? "" : $tva->feature_seq;
    substr($seq, $variation_start, $variation_end - $variation_start + 1) = $feature_seq;

    return $seq;
}
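
To get from a mutated coding sequence to a protein sequence, the string still has to be translated. Below is a minimal, self-contained sketch of that step using the standard genetic code and stopping at the first stop codon. The helper names `apply_allele` and `translate_cds` are hypothetical, not part of the Ensembl API, and `apply_allele` assumes 1-based inclusive coordinates with an insertion expressed as start = end + 1. BioPerl's `Bio::Seq` `translate` method would be an alternative for the translation itself.

```perl
use strict;
use warnings;

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_table{"$b1$b2$b3"} = substr($aas, $i++, 1);
        }
    }
}

# Hypothetical helper: apply one allele to a CDS string.
# $start/$end are 1-based and inclusive; '-' means a deletion.
# Assumption: an insertion is given as $start == $end + 1 (replaced length 0).
sub apply_allele {
    my ($cds, $start, $end, $allele) = @_;
    $allele = '' if $allele eq '-';
    substr($cds, $start - 1, $end - $start + 1) = $allele;
    return $cds;
}

# Hypothetical helper: translate a CDS codon by codon up to the first stop codon.
# Unknown codons (e.g. containing N) become 'X'; a trailing partial codon is ignored.
sub translate_cds {
    my ($cds) = @_;
    my $protein = '';
    for (my $pos = 0; $pos + 3 <= length $cds; $pos += 3) {
        my $aa = $codon_table{ uc(substr($cds, $pos, 3)) } // 'X';
        last if $aa eq '*';
        $protein .= $aa;
    }
    return $protein;
}
```

With the toy CDS ATGGCCAAATGA, `translate_cds` gives MAK. Deleting position 4 shifts the frame and gives MPN, so frameshifts are handled simply by editing the string first and translating afterwards. In practice the input would be the string returned by get_variation_cds_seq above.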

Suggestions will be appreciated. It would be great if this functionality were implemented properly in the Perl API, because my functions sometimes fail when the start or end of the variation is not defined.

Regards, Fran.


You should post this as a new question in a separate thread.

Reply written 3.4 years ago by genomax

I think I am giving my solution to the original question. I just wanted to point out that my answer could be improved by other users, and I'm hoping for that. I'm new to Biostar, so sorry if I've made a mistake.

Cheers, Fran.

Reply written 3.4 years ago by francisco.abad

I did not realize that you were actually offering a solution since the sentence at the beginning of your post is unclear. My apologies.

Reply written 3.4 years ago by genomax