In the GenBank files, rRNA features only have the following qualifiers:
rRNA            67149..68657
                /locus_tag="Arad_5100"
                /product="16S ribosomal RNA (operon 1)"
                /db_xref="GeneID:7371836"
unlike CDS features, which have:
CDS             74072..74647
                /locus_tag="Arad_0081"
                /codon_start=1
                /transl_table=11
                /product="hypothetical protein"
                /protein_id="YP_002542787.1"
                /db_xref="GI:222084261"
                /db_xref="GeneID:7371837"
                /translation="MTARGIARLVELRDAGVTAATMSRMERDGEVLRLARGLYQLSDA
                PLDANHSLAEAAKRLPKGVVCLVSALAFHGLTDQLPKQVWLAIGRKDWAPKPDSTPIR
                IVRFTDRLLNESVETHVVEGVPVKVFGIVKTIADCFRYRNKIGLSVAIEGLQEVLRQR
                KATPGEIARQAERGGVATVIRPYIEALTANG"
so I can just use the /translation qualifier to get the amino acid sequences.
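For the protein side, here is a minimal sketch of reading the /translation qualifier, assuming Biopython is installed and the file is parsed with SeqIO.parse(path, "genbank"). The helper name cds_translations and the file name in the usage comment are hypothetical. Biopython stores each qualifier as a list of strings, so the value is taken from index 0.

```python
from Bio import SeqIO


def cds_translations(record):
    """Yield (locus_tag, protein) pairs from the /translation qualifier of each CDS.

    `record` is a SeqRecord as produced by SeqIO.parse(..., "genbank").
    """
    for feature in record.features:
        if feature.type != "CDS":
            continue
        # Qualifier values are lists of strings; pseudogenes may lack /translation.
        translation = feature.qualifiers.get("translation")
        if translation:
            tag = feature.qualifiers.get("locus_tag", ["unknown"])[0]
            yield tag, translation[0]


# Usage ("genome.gbk" is a placeholder file name):
# for record in SeqIO.parse("genome.gbk", "genbank"):
#     for tag, protein in cds_translations(record):
#         print(tag, protein)
```

rRNA features never carry a /translation qualifier, so this only helps for protein-coding genes.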
But how do I get the nucleotide sequence for any gene, including the rRNA genes?
I realize the whole nucleotide sequence is listed at the bottom of the GenBank file, and the location information could probably be used to extract each sequence. But I think there should be a much simpler way.
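A minimal sketch of the simpler route, assuming Biopython: each parsed SeqFeature has an extract() method that slices the parent sequence using the feature's location, reverse-complementing minus-strand features and joining multi-part locations, so the coordinates never have to be handled by hand. The helper name feature_nucleotides and its default feature types are illustrative choices, not part of Biopython.

```python
from Bio import SeqIO


def feature_nucleotides(record, feature_types=("CDS", "rRNA")):
    """Return {locus_tag: nucleotide sequence} for features of the chosen types.

    `record` is a SeqRecord as produced by SeqIO.parse(..., "genbank"),
    so record.seq is the full sequence from the bottom of the file.
    """
    result = {}
    for feature in record.features:
        if feature.type not in feature_types:
            continue
        # extract() applies the feature's location (strand and joins included)
        # to the parent sequence and returns a new Seq.
        nucleotides = feature.extract(record.seq)
        tag = feature.qualifiers.get("locus_tag", ["unknown"])[0]
        result[tag] = str(nucleotides)
    return result


# Usage ("genome.gbk" is a placeholder file name):
# for record in SeqIO.parse("genome.gbk", "genbank"):
#     for tag, seq in feature_nucleotides(record, ("rRNA",)).items():
#         print(tag, seq)
```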
Any help would be great.
It would surprise me if Biopython didn't have a GenBank parser. Otherwise, just parse the nucleotide sequence (e.g. from a FASTA file) and take substrings to extract each location. Mind the gap: substring coordinates are usually 0-based (as in Python), while GenBank locations are 1-based and inclusive.
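If you do take the manual substring route, here is a pure-Python sketch of the coordinate conversion. It assumes the genome is already loaded as a plain string and handles only simple start..end and complement(start..end) locations; multi-part join(...) locations and partial markers such as < and > are deliberately out of scope. The function name extract_simple_location is hypothetical.

```python
import re

# Complement table for reverse-complementing minus-strand features.
_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def extract_simple_location(genome, location):
    """Extract the subsequence for a simple GenBank location string.

    Supports 'start..end' and 'complement(start..end)' only.
    """
    loc = location.strip()
    is_complement = loc.startswith("complement(") and loc.endswith(")")
    if is_complement:
        loc = loc[len("complement("):-1]
    match = re.fullmatch(r"(\d+)\.\.(\d+)", loc)
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    start, end = int(match.group(1)), int(match.group(2))
    # GenBank is 1-based and inclusive; Python slices are 0-based and
    # end-exclusive, so 'start..end' becomes genome[start - 1:end].
    sub = genome[start - 1:end]
    if is_complement:
        # Minus-strand feature: complement each base, then reverse.
        sub = sub.translate(_COMPLEMENT)[::-1]
    return sub
```

For example, with genome = "AAATGGCCTAAGG", the location "3..11" maps to genome[2:11], i.e. "ATGGCCTAA".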
Yes, maybe it's because there are different variants of GenBank. The ones I am dealing with were downloaded from NCBI ftp://ftp.ncbi.nih.gov/genomes/Bacteria/ . The second option is possible, but there should be something in the parser that I am obviously missing.