Question: How to change SeqRecord id to numbers from a .txt file (Biopython)
4.5 years ago by
the_madinator20 wrote:

Hi,

I am new to Biopython and programming in general. I am having difficulty writing a working script to convert the IDs in my FASTA files to consecutive numbers. I have a text file containing:

1 2 3 4 etc...

and my code currently reads:

from Bio import SeqIO

lines_file = open("Numbers1_50.txt")
fout = open("output.fasta", "w")
handle = open("input.fasta", "r")

for seq_record in SeqIO.parse(handle, "fasta"):
    seq_record.id = lines_file[0]
    seq_record.description = ""
    print(seq_record.id)
    SeqIO.write(fout, lines_file[0]+".fasta","fasta")

lines_file.close()
fout.close()

However, I keep getting an error at the line containing: seq_record.id = lines_file[0]

TypeError: '_io.TextIOWrapper' object is not subscriptable

If anyone has a moment to explain my error, I would greatly appreciate it! Thanks.

sequencing biopython fasta • 2.3k views
4.5 years ago by
Matt Shirley9.4k
Cambridge, MA
Matt Shirley9.4k wrote:

You'll need to read the open file into a list before you try to access individual elements:

from Bio import SeqIO

lines_file = open("Numbers1_50.txt").readlines()
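
To illustrate the difference, here is a minimal sketch that reproduces the error and then indexes the list returned by readlines(). The temporary file is a hypothetical stand-in for Numbers1_50.txt, and it assumes one number per line, which the question does not confirm.

```python
import os
import tempfile

# Hypothetical stand-in for Numbers1_50.txt, assuming one number per line.
path = os.path.join(tempfile.mkdtemp(), "numbers.txt")
with open(path, "w") as f:
    f.write("1\n2\n3\n")

# Indexing the open file object itself fails, as in the question.
with open(path) as lines_file:
    try:
        lines_file[0]
        error_name = None
    except TypeError as exc:
        error_name = type(exc).__name__

# Reading the lines into a list first makes indexing possible.
with open(path) as lines_file:
    ids = [line.strip() for line in lines_file.readlines()]

print(error_name)
print(ids[0])
```

Note that each element returned by readlines() still ends with a newline, which is why the sketch strips it before using the value as an ID.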

However, there is probably a better way to do this:

from Bio import SeqIO

with open("input.fasta", "r") as handle, open("output.fasta", "w") as fout:
  for i, seq_record in enumerate(SeqIO.parse(handle, "fasta")):
    seq_record.id = str(i + 1)
    seq_record.description = ""
    SeqIO.write(seq_record, fout, "fasta")
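
As a small variation on the loop above, enumerate() accepts a start argument, so the counter can begin at 1 without the explicit i + 1. The sketch below uses a plain list of made-up record names in place of SeqIO.parse so that it runs without Biopython or an input file.

```python
# Hypothetical record names standing in for the records yielded by SeqIO.parse.
old_ids = ["seqA", "seqB", "seqC"]

new_ids = []
for i, old_id in enumerate(old_ids, start=1):
    # Convert the counter to text, since a record ID is expected to be a string.
    new_ids.append(str(i))

print(new_ids)
```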

Thank you for your help! I tried enumerating the IDs as you suggested; however, I received the following error:

return text.replace("\n", " ").replace("\r", " ").replace("  ", " ")
AttributeError: 'int' object has no attribute 'replace'

I am not sure what happened here.

Reply written 4.5 years ago by the_madinator

Edited. I should have coerced the integer to a string.
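
For anyone hitting the same error, the sketch below shows why the conversion matters. The clean() function here is a hypothetical stand-in modeled on the expression in the traceback, not Biopython's actual code: it calls .replace() on the ID, which only works for strings.

```python
def clean(text):
    # Hypothetical stand-in modeled on the expression shown in the traceback.
    return text.replace("\n", " ").replace("\r", " ").replace("  ", " ")

# Passing an int fails because int has no .replace() method.
try:
    clean(1)
    failed_with = None
except AttributeError as exc:
    failed_with = str(exc)

# Converting to str first avoids the problem.
cleaned = clean(str(1))

print(failed_with)
print(cleaned)
```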

Reply written 4.5 years ago by Matt Shirley
4.5 years ago by
Washington D.C.
jackfrost219970 wrote:

Try this:

from Bio import SeqIO

fout = open("output.fasta", "w")
handle = open("input.fasta", "r")
new_id = 0  # first ID written; use 1 to start counting from 1

for seq_record in SeqIO.parse(handle, "fasta"):
    seq_record.id = str(new_id)  # the ID must be a string, not an int
    new_id += 1
    seq_record.description = ""
    print(seq_record.id)
    SeqIO.write(seq_record, fout, "fasta")

handle.close()
fout.close()

It looks like you're currently trying to read the numbers from a file. When you take your open file (in your case lines_file) and ask it for subscript 0 (i.e. [0]), you might expect to get the first line, but a Python file object does not support indexing at all, which is what the TypeError is telling you. You have to read from the file first (at least as far as I know, but then again I'm primarily a C++ guy).

The code above solves the problem by starting a variable at 0 and incrementing by 1 for each sequence record and then assigning it to the seq_record.id within the loop you already constructed. This doesn't solve the case if you wanted to extract out IDs from a file and transfer them in, which might be what you're really after.

If you're trying to do that, then you'll need to open the file (like you're currently doing) and instead of trying to access [0] in the file, you probably want to use the readline() method. You can try code like this:

seq_record.id=lines_file.readline()

Alternatively you can use:

seq_record.id=lines_file.readline().strip()

which will also remove the trailing newline. readline() keeps the "\n" at the end of each line, so this is probably the version you want for an ID. You would substitute one of these two lines for the line where you currently have:

seq_record.id = lines_file[0]

in your code.
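
To make the difference between the two variants concrete, here is a minimal sketch. It uses io.StringIO as a hypothetical stand-in for the opened Numbers1_50.txt and assumes one number per line; readline() treats newlines the same way on a real text file.

```python
import io

# Hypothetical stand-in for the opened Numbers1_50.txt, one number per line.
lines_file = io.StringIO("1\n2\n")

raw_id = lines_file.readline()            # keeps the trailing newline
clean_id = lines_file.readline().strip()  # newline removed
exhausted = lines_file.readline()         # empty string once the file runs out

print(repr(raw_id), repr(clean_id), repr(exhausted))
```

One thing to watch for: if the numbers file has fewer lines than the FASTA file has records, readline() returns an empty string for the remaining records rather than raising an error.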

I believe that should solve your issue, but I'm happy to help further if it doesn't quite get you there.


Thank you! I used seq_record.id=lines_file.readline() and it worked! My final code read:

for seq_record in SeqIO.parse(handle, "fasta"):
    seq_record.id = lines_file.readline()
    seq_record.description = ""
    SeqIO.write(seq_record, fout, "fasta")
    print(seq_record)
Reply written 4.5 years ago by the_madinator

Great! I'm glad it worked for you especially since I don't use python much :D

Reply written 4.5 years ago by jackfrost2199