Extract 16S rRNA gene from GenBank files?
1
0
Entering edit mode
24 months ago
MSRS ▴ 580

Hi, I have found a number of posts about 16S gene extract from genank files. But those are not working till now. Looks error (IndentationError: expected an indented block )

Any updated code or programs available?

TIA

NCBI 16S • 766 views
ADD COMMENT
2
Entering edit mode
24 months ago
Mensur Dlakic ★ 27k

The code is correct, but is not properly indented. The problem is with these two lines:

for genome in gbank:
for gene in genome.features:

You can't have a for loop and not indent after that. So all the lines after for genome in gbank: need to be right-indented by 4 spaces:

from Bio import SeqIO, SeqFeature
import sys

gbank=SeqIO.parse(open(sys.argv[1],"rU"),"genbank")

for genome in gbank:
    for gene in genome.features:
        if gene.type=="rRNA":
            if 'product' in gene.qualifiers:
                if '16S' in gene.qualifiers['product'][0]:
                    start = gene.location.nofuzzy_start
                    end = gene.location.nofuzzy_end
                    if 'db_xref' in gene.qualifiers:
                        gi=[]
                        gi=str(gene.qualifiers['db_xref'])
                        gi=gi.split(":")[1]
                        gi=gi.split("'")[0]
                        print(">GeneId|%s|16S rRNA|%s\n%s" % (gi,genome.description,genome.seq[start:end]))
                    else:
                        print(">GeneId|NoGenID|16S rRNA|%s\n%s" % (genome.description,genome.seq[start:end]))
ADD COMMENT
0
Entering edit mode

Thank you so much.

Almost done.

Little error at line 20

print ">GeneId|%s|16S rRNA|%s\n%s" % (gi,genome.description,genome.seq[start:end])
          ^
SyntaxError: invalid syntax
ADD REPLY
1
Entering edit mode

There is no error at my end. Line 20 in the script is the very last line, not the one you listed above. I would make sure that everything is properly pasted.

If you are using python3, there should be parentheses in print lines, like this:

                print(">GeneId|%s|16S rRNA|%s\n%s" % (gi,genome.description,genome.seq[start:end]))
            else:
                print(">GeneId|NoGenID|16S rRNA|%s\n%s" % (genome.description,genome.seq[start:end]))
ADD REPLY
0
Entering edit mode

Okay. Thank you so much for your valuable time. Clear to me.

ADD REPLY

Login before adding your answer.

Traffic: 2350 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6