Retrieving 16s rRNA from Whole genome sequence using PERL
1
0
Entering edit mode
3.8 years ago

I have a list of 100 Accession numbers of bacterial genomes. I want to retrieve the 16s rRNA sequence alone from their genbank record, in field "rRNA", product '16s ribosomal RNA' using PERL or MATLAB. Can you guide me to a book or tutorial for it? Thank you!

PERL 16s rRNA Genome GenBank • 1.0k views
ADD COMMENT
2
Entering edit mode
3.8 years ago
zubenel ▴ 120

For learning Perl in general you may look at books mentioned here or tutorial. Bioinformatics related books are old but can be useful as this one has a chapter on parsing Genbank data. In addition to this you may look at BioPerl documentation.

For your case I created small script where 16S rRNA sequences are extracted from a bacteria with accession 'NC_013961'. For it to work you will need to install module Bio::DB::GenBank with cpan.

use strict;
use warnings;
use Bio::DB::GenBank;

my $gb = Bio::DB::GenBank->new();

# get data by accession
my $seqio = $gb->get_Stream_by_acc('NC_013961');
my $seq =  $seqio->next_seq;

foreach my $feat ( $seq->get_SeqFeatures() ) {
    # skip tags that are not "rRNA"
    next unless (($feat->primary_tag eq "rRNA") and $feat->has_tag("product"));

    # print sequence only if rRNA product tag value includes '16S'
    my @rRNA = $feat->get_tag_values('product');
    if ( grep( /^16S/, @rRNA ) ) {
      print $feat->seq->seq, "\n";
    }
}
ADD COMMENT

Login before adding your answer.

Traffic: 1999 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6