How to get the correct strand of the gene?
Entering edit mode
3.3 years ago

Hi, I have prepared a perl script to read the stable Ensembl IDs from a file and write the sequences and strands of them to separate files. The $slice->seq() function works fine and I get the sequences, but $slice->strand() always returns 1 even for the genes that are on the reverse strand, like ENSG00000227359. I really need to get the correct strand. Am I missing something here? I copy my script here. I would appreciate your help.

use Bio::EnsEMBL::Registry;

use strict;
use warnings;

# warn user (from perspective of caller)
use Carp;

# use nice English (or awk) names for ugly punctuation variables
use English qw(-no_match_vars);

my $registry = 'Bio::EnsEMBL::Registry';

    -host => '', # alternatively ''
    -user => 'anonymous'

my $slice_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Slice' );

my $files = 'strand_test.txt'; 
my $files_seq = 'sequence_test.txt';

# check if the file exists
if (-f $files) {
    unlink $files
        or croak "Cannot delete $files: $!";

# use a variable for the file handle 

my $OUTFILE_seq;
# use the three arguments version of open
# and check for errors
open $OUTFILE, '>>', $files
    or croak "Cannot open $files: $OS_ERROR";

open $OUTFILE_seq, '>>', $files_seq
    or croak "Cannot open $files_seq: $OS_ERROR";

my $filename = "gene_list_test2.txt";
open(my $fh, '<:encoding(UTF-8)', $filename)
  or die "Could not open file '$filename' $!";

# my $slice = $slice_adaptor->fetch_by_gene_stable_id( 'ENSG00000099889', 0 );
# print "to the loop\n";

while (my $row = <$fh>) {
  my $slice;
  chomp $row; 
  print "$row\n";
  print "ine\n";
  # my $red_row = substr $row, 0, -1;
  # print "$red_row\n";
  $slice = $slice_adaptor->fetch_by_gene_stable_id( "$row", 0 );
  if (not ($@)) { 
    print { $OUTFILE } $slice->strand()
            or croak "Cannot write to $files: $OS_ERROR";
    print { $OUTFILE } "\n"
            or croak "Cannot write to";

    print { $OUTFILE_seq } $slice->seq()
            or croak "Cannot write to $files_seq: $OS_ERROR";
    print { $OUTFILE_seq } "\n"
            or croak "Cannot write to";
  else {
    print { $OUTFILE } "NOT THERE!\n"
            or croak "Cannot write to";

    print { $OUTFILE_seq } "NOT THERE!\n"
            or croak "Cannot write to";

    print "----------------\n"
  print $slice->strand();
  print "\n+++++++++++++++++++\n";
print "done\n";

close $OUTFILE
    or croak "Cannot close $files: $OS_ERROR";

close $OUTFILE_seq
    or croak "Cannot close $files_seq: $OS_ERROR";
Ensembl Core API gene rna perl • 834 views
Entering edit mode


I don't have a solution for you, as I do not use the perl api until now. But I found this statement here:

Note that for historical reasons the fetch_by_gene_stable_id() method always returns a slice on the forward strand even if the gene is on the reverse strand.

fin swimmer

Entering edit mode

Thank you for your time. Yes, this sentence initiated my problem which itself is confusing because this is how I understood this sentence: "All the returned sequences are assumed to be on the forward strand and if they are on the reverse strand, the reverse complement of the sequence will be returned as the sequence" and this is why I needed the strand to get the correct sequence for the ones on the reverse strand. After all, there should be some way to get the strand of the genes per se, even if the sequences are as they should be. Thank you again for your comment.


Login before adding your answer.

Traffic: 1814 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6