Extracting Fasta Sequence Using Biopython
3
3
Entering edit mode
12.4 years ago
Ma ▴ 140

Hi: I want to extract one section of a chromosome into a FASTA file, I have two versions, but neither of them work correctly. At the end I want to have a normal FASTA file like this:

>chromosome_1_results
ACGGGAAAAA.......

version 1

from Bio import SeqIO

inFile = open('c:\\data\\ch1.fasta','r')
fw=open("c:\\data\\ch1results.fasta",'w')

s=0
for record in SeqIO.parse(inFile,'fasta'):
        fw.write (str(record.seq)[1:((23522552+23660224)/2)+1])

fw.close()

In this version it generates the file, but when I want to open it using for example a word processor it cannot be read. I have tried with ch1.fasta and opens normally. Also I have problems in how to put a header like in the FASTA files to my results.

version 2 from Bio import SeqIO

inFile = open('c:\\data\\ch1.fasta','r')
fw=open("c:\\data\\ch1results.fasta",'w')

s=0
for record in SeqIO.parse(inFile,'fasta'):
        SeqIO.write (str(record.seq)[1:((23522552+23660224)/2)+1],fw,"fasta")

fw.close()

in the second case I got an error that says "str object has no attribute id"

Thanks

fasta biopython • 16k views
ADD COMMENT
4
Entering edit mode
12.4 years ago

And the answer is: use version 2, but write a record instead of a string.

# read the record first
record.seq = record.seq[1:((23522552+23660224)/2)+1]
SeqIO.write(record,fw,"fasta")
ADD COMMENT
0
Entering edit mode

thank you very much for your time in answering this question @Michael Schubert, now it works really nice

ADD REPLY
3
Entering edit mode
12.4 years ago

You should read up more about python file IO. Here is how to make it output a header. I am assuming ch1.fasta only has one entry in it?

from Bio import SeqIO

inFile = open('c:\\data\\ch1.fasta','r')
fw=open("c:\\data\\ch1results.fasta",'w')

for record in SeqIO.parse(inFile,'fasta'):
        fw.write(">" + record.id + "\n")
        fw.write (str(record.seq)[1:((23522552+23660224)/2)+1] + "\n")

fw.close()

Are you opening it with notepad?

ADD COMMENT
1
Entering edit mode

The fasta format is just a header beginning with ">" along with an ID name on one line followed by the sequence on the next line(s). As long as you have those two things, it's considered a fasta file. The code I posted should print out a header.

ADD REPLY
0
Entering edit mode

thanks @DK, you always giving a hand in this field, the ch1.fasta has the complete FASTA sequence of chromosome 1, for that reason I wanted the output, of the region that I need, to be saved in FASTA format. I have tried the solution with fw.write, but the problem is that it only saves a very long line; which is not so good, because I need the file generated to be in FASTA format for other purposes

ADD REPLY
0
Entering edit mode

Why not use SeqIO for writing as well?: SeqIO.write(record, fw, "fasta")

ADD REPLY
0
Entering edit mode

Yeah SeqIO.write would work too. But I figured it'll be easier to explain the headers by manually typing it out and seeing what it does.

ADD REPLY
0
Entering edit mode

But it doesn't break lines, i.e. the file is not well human readable.

ADD REPLY
1
Entering edit mode
12.4 years ago
Fabian Bull ★ 1.3k

For iterating over sequence see: BioPython: SeqIO

For working with sequence records see: Biopython: SeqRecord

ADD COMMENT
0
Entering edit mode

can you be more specific instead of just pointing to the BioPython tutorial? I cannot find the mistake and I have read that material

ADD REPLY
0
Entering edit mode

peri4n: He explains his problem, shows how he tried to solve it, and where he is stuck. Basic but ok question to me.

ADD REPLY
0
Entering edit mode

I am just tired of all these "How do I parse file XXX"-question of people who obviously have no clue about programming. I just give them ressources so they can learn it. They don't learn anything if we solve their problems everytime.

ADD REPLY
0
Entering edit mode

I think this is rather rude answer. People is learning!!!

ADD REPLY

Login before adding your answer.

Traffic: 2632 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6