Retrieving 16s rRNA from Whole genome sequence using PERL
Entering edit mode
2.2 years ago

I have a list of 100 Accession numbers of bacterial genomes. I want to retrieve the 16s rRNA sequence alone from their genbank record, in field "rRNA", product '16s ribosomal RNA' using PERL or MATLAB. Can you guide me to a book or tutorial for it? Thank you!

PERL 16s rRNA Genome GenBank • 749 views
Entering edit mode
2.2 years ago
zubenel ▴ 120

For learning Perl in general you may look at books mentioned here or tutorial. Bioinformatics related books are old but can be useful as this one has a chapter on parsing Genbank data. In addition to this you may look at BioPerl documentation.

For your case I created small script where 16S rRNA sequences are extracted from a bacteria with accession 'NC_013961'. For it to work you will need to install module Bio::DB::GenBank with cpan.

use strict;
use warnings;
use Bio::DB::GenBank;

my $gb = Bio::DB::GenBank->new();

# get data by accession
my $seqio = $gb->get_Stream_by_acc('NC_013961');
my $seq =  $seqio->next_seq;

foreach my $feat ( $seq->get_SeqFeatures() ) {
    # skip tags that are not "rRNA"
    next unless (($feat->primary_tag eq "rRNA") and $feat->has_tag("product"));

    # print sequence only if rRNA product tag value includes '16S'
    my @rRNA = $feat->get_tag_values('product');
    if ( grep( /^16S/, @rRNA ) ) {
      print $feat->seq->seq, "\n";

Login before adding your answer.

Traffic: 1459 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6