Entering edit mode
6.3 years ago
Hi, I have prepared a perl script to read the stable Ensembl IDs from a file and write the sequences and strands of them to separate files. The $slice->seq() function works fine and I get the sequences, but $slice->strand() always returns 1 even for the genes that are on the reverse strand, like ENSG00000227359. I really need to get the correct strand. Am I missing something here? I copy my script here. I would appreciate your help.
use Bio::EnsEMBL::Registry;
use strict;
use warnings;
# warn user (from perspective of caller)
use Carp;
# use nice English (or awk) names for ugly punctuation variables
use English qw(-no_match_vars);
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
-host => 'ensembldb.ensembl.org', # alternatively 'useastdb.ensembl.org'
-user => 'anonymous'
);
my $slice_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Slice' );
my $files = 'strand_test.txt';
my $files_seq = 'sequence_test.txt';
# check if the file exists
if (-f $files) {
unlink $files
or croak "Cannot delete $files: $!";
}
# use a variable for the file handle
my $OUTFILE;
my $OUTFILE_seq;
# use the three arguments version of open
# and check for errors
open $OUTFILE, '>>', $files
or croak "Cannot open $files: $OS_ERROR";
open $OUTFILE_seq, '>>', $files_seq
or croak "Cannot open $files_seq: $OS_ERROR";
my $filename = "gene_list_test2.txt";
open(my $fh, '<:encoding(UTF-8)', $filename)
or die "Could not open file '$filename' $!";
# my $slice = $slice_adaptor->fetch_by_gene_stable_id( 'ENSG00000099889', 0 );
# print "to the loop\n";
while (my $row = <$fh>) {
my $slice;
chomp $row;
print "$row\n";
print "ine\n";
# my $red_row = substr $row, 0, -1;
# print "$red_row\n";
eval{
$slice = $slice_adaptor->fetch_by_gene_stable_id( "$row", 0 );
};
if (not ($@)) {
print { $OUTFILE } $slice->strand()
or croak "Cannot write to $files: $OS_ERROR";
print { $OUTFILE } "\n"
or croak "Cannot write to";
print { $OUTFILE_seq } $slice->seq()
or croak "Cannot write to $files_seq: $OS_ERROR";
print { $OUTFILE_seq } "\n"
or croak "Cannot write to";
}
else {
print { $OUTFILE } "NOT THERE!\n"
or croak "Cannot write to";
print { $OUTFILE_seq } "NOT THERE!\n"
or croak "Cannot write to";
print "----------------\n"
}
print $slice->strand();
print "\n+++++++++++++++++++\n";
}
print "done\n";
close $OUTFILE
or croak "Cannot close $files: $OS_ERROR";
close $OUTFILE_seq
or croak "Cannot close $files_seq: $OS_ERROR";
Hello,
I don't have a solution for you, as I do not use the perl api until now. But I found this statement here:
fin swimmer
Thank you for your time. Yes, this sentence initiated my problem which itself is confusing because this is how I understood this sentence: "All the returned sequences are assumed to be on the forward strand and if they are on the reverse strand, the reverse complement of the sequence will be returned as the sequence" and this is why I needed the strand to get the correct sequence for the ones on the reverse strand. After all, there should be some way to get the strand of the genes per se, even if the sequences are as they should be. Thank you again for your comment.