How to create a FASTA file
4.4 years ago

Hi,

I wrote a simple script that isn't working as expected, and I don't know why. I have two files:

seq.txt
AAAATTTTTCCCCCGGGG
AAAAAAAAAAAAAAAAAA
TTTTTTTTTTTTTTTTTTTT
....

Ids_to_add.txt
ID_1
ID_2
ID_3
.....

I want a FASTA file like this:

>ID_1
AAAATTTTTCCCCCGGGG
>ID_2
AAAAAAAAAAAAAAAAAA
>ID_3
TTTTTTTTTTTTTTTTTTTT
...

My code is like this so far:

g = open("test.txt",'w')

f = open("Ids_to_add.txt", "r")
a = open("seq.txt", "r")

for line in f:
    linef = line.strip()
for line in a:
    linea = line.strip()
    print(">" + linef.upper()+"\n"+linea.upper(), file=g)

Somehow the code output is:

>ID_3
AAAATTTTTCCCCCGGGG
>ID_3
AAAAAAAAAAAAAAAAAA
>ID_3
TTTTTTTTTTTTTTTTTTTT
...

So in conclusion the sequences are fine, but the IDs are always the last ID in the ID file. Any help is welcome!

python • 646 views

but the IDs are always the last ID in the ID file

Of course it is: the loop that reads the ID lines runs to completion before the second loop even starts, so the print call always uses the last value assigned to linef, which is the last ID in the file.
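
To see the effect in isolation, here is a minimal sketch that uses in-memory lists as hypothetical stand-ins for the two files (the names ids, seqs, last_seen and records are only for illustration). It keeps the same two-loop structure as the question:

```python
# Hypothetical in-memory stand-ins for Ids_to_add.txt and seq.txt.
ids = ["ID_1\n", "ID_2\n", "ID_3\n"]
seqs = ["AAAATTTTTCCCCCGGGG\n", "AAAAAAAAAAAAAAAAAA\n", "TTTTTTTTTTTTTTTTTTTT\n"]

# First loop: linef is overwritten on every pass.
for line in ids:
    linef = line.strip()

# The loop is done, so linef keeps only the value from the final pass.
last_seen = linef

# Second loop: every sequence is paired with that same leftover ID.
records = []
for line in seqs:
    linea = line.strip()
    records.append((linef, linea))

for header, seq in records:
    print(">" + header + "\n" + seq)
```

Every header printed by the second loop is the ID left over from the final pass of the first loop.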

4.4 years ago
RyanRegis ▴ 20

Your first for loop runs through the entire ID file before the second loop starts, so linef holds only the last line of f by the time you read the sequences. Instead, iterate over both files at the same time so that each ID is paired with the sequence on the same line number. Here's a deliberately simple example:


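# Open all three files in one with-block so they are closed (and the output flushed) on exit.
with open("Ids_to_add.txt", "r") as f, open("seq.txt", "r") as a, open("test.txt", "w") as g:
    # zip pairs the n-th ID line with the n-th sequence line; it stops at the shorter file.
    for id_line, seq_line in zip(f, a):
        linef = id_line.strip()
        linea = seq_line.strip()
        print(">" + linef.upper() + "\n" + linea.upper(), file=g)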

Hope that helps.
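
One caveat with zip is that it stops silently at the shorter input, so a missing ID or sequence would go unnoticed. If that matters, a slightly more defensive variant might look like the sketch below. The function name write_fasta and its parameters are my own invention, not part of any library, and it assumes both files have exactly one record per line with no extra trailing blank lines.

```python
from itertools import zip_longest


def write_fasta(ids_path, seqs_path, out_path):
    """Pair each ID line with the sequence line at the same position.

    Hypothetical helper: returns the number of records written and raises
    ValueError if the two input files have different numbers of lines.
    """
    count = 0
    with open(ids_path) as ids, open(seqs_path) as seqs, open(out_path, "w") as out:
        # zip_longest pads the shorter file with None, which exposes a mismatch.
        for id_line, seq_line in zip_longest(ids, seqs):
            if id_line is None or seq_line is None:
                raise ValueError("ID and sequence files have different line counts")
            # Upper-case both parts, as in the original snippet.
            out.write(f">{id_line.strip().upper()}\n{seq_line.strip().upper()}\n")
            count += 1
    return count
```

Calling write_fasta("Ids_to_add.txt", "seq.txt", "test.txt") should then give the same output as the snippet above when the files line up, and fail loudly when they don't.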
