Question: Extracting Fasta Sequence Using Biopython
3
gravatar for Ma
7.3 years ago by
Ma140
Ma140 wrote:

Hi: I want to extract one section of a chromosome into a FASTA file, I have two versions, but neither of them work correctly. At the end I want to have a normal FASTA file like this:

>chromosome_1_results
ACGGGAAAAA.......

version 1

from Bio import SeqIO

inFile = open('c:\\data\\ch1.fasta','r')
fw=open("c:\\data\\ch1results.fasta",'w')

s=0
for record in SeqIO.parse(inFile,'fasta'):
        fw.write (str(record.seq)[1:((23522552+23660224)/2)+1])

fw.close()

In this version it generates the file, but when I want to open it using for example a word processor it cannot be read. I have tried with ch1.fasta and opens normally. Also I have problems in how to put a header like in the FASTA files to my results.

version 2 from Bio import SeqIO

inFile = open('c:\\data\\ch1.fasta','r')
fw=open("c:\\data\\ch1results.fasta",'w')

s=0
for record in SeqIO.parse(inFile,'fasta'):
        SeqIO.write (str(record.seq)[1:((23522552+23660224)/2)+1],fw,"fasta")

fw.close()

in the second case I got an error that says "str object has no attribute id"

Thanks

fasta biopython • 8.5k views
ADD COMMENTlink written 7.3 years ago by Ma140
4
gravatar for Michael Schubert
7.3 years ago by
Cambridge, UK
Michael Schubert6.9k wrote:

And the answer is: use version 2, but write a record instead of a string.

# read the record first
record.seq = record.seq[1:((23522552+23660224)/2)+1]
SeqIO.write(record,fw,"fasta")
ADD COMMENTlink written 7.3 years ago by Michael Schubert6.9k

thank you very much for your time in answering this question @Michael Schubert, now it works really nice

ADD REPLYlink written 7.3 years ago by Ma140
3
gravatar for Damian Kao
7.3 years ago by
Damian Kao15k
USA
Damian Kao15k wrote:

You should read up more about python file IO. Here is how to make it output a header. I am assuming ch1.fasta only has one entry in it?

from Bio import SeqIO

inFile = open('c:\\data\\ch1.fasta','r')
fw=open("c:\\data\\ch1results.fasta",'w')

for record in SeqIO.parse(inFile,'fasta'):
        fw.write(">" + record.id + "\n")
        fw.write (str(record.seq)[1:((23522552+23660224)/2)+1] + "\n")

fw.close()

Are you opening it with notepad?

ADD COMMENTlink written 7.3 years ago by Damian Kao15k
1

The fasta format is just a header beginning with ">" along with an ID name on one line followed by the sequence on the next line(s). As long as you have those two things, it's considered a fasta file. The code I posted should print out a header.

ADD REPLYlink written 7.3 years ago by Damian Kao15k

thanks @DK, you always giving a hand in this field, the ch1.fasta has the complete FASTA sequence of chromosome 1, for that reason I wanted the output, of the region that I need, to be saved in FASTA format. I have tried the solution with fw.write, but the problem is that it only saves a very long line; which is not so good, because I need the file generated to be in FASTA format for other purposes

ADD REPLYlink written 7.3 years ago by Ma140

Why not use SeqIO for writing as well?: SeqIO.write(record, fw, "fasta")

ADD REPLYlink written 7.3 years ago by Michael Schubert6.9k

Yeah SeqIO.write would work too. But I figured it'll be easier to explain the headers by manually typing it out and seeing what it does.

ADD REPLYlink written 7.3 years ago by Damian Kao15k

But it doesn't break lines, i.e. the file is not well human readable.

ADD REPLYlink written 7.3 years ago by Michael Schubert6.9k
1
gravatar for Fabian Bull
7.3 years ago by
Fabian Bull1.3k
German
Fabian Bull1.3k wrote:

For iterating over sequence see: BioPython: SeqIO

For working with sequence records see: Biopython: SeqRecord

ADD COMMENTlink written 7.3 years ago by Fabian Bull1.3k

can you be more specific instead of just pointing to the BioPython tutorial? I cannot find the mistake and I have read that material

ADD REPLYlink written 7.3 years ago by Ma140

peri4n: He explains his problem, shows how he tried to solve it, and where he is stuck. Basic but ok question to me.

ADD REPLYlink written 7.3 years ago by Michael Schubert6.9k

I am just tired of all these "How do I parse file XXX"-question of people who obviously have no clue about programming. I just give them ressources so they can learn it. They don't learn anything if we solve their problems everytime.

ADD REPLYlink written 7.3 years ago by Fabian Bull1.3k

I think this is rather rude answer. People is learning!!!

ADD REPLYlink written 10 months ago by bio9002910
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1169 users visited in the last hour