Question

List of sequences in a fasta file. I need to generate the reverse complement for each of them.

0

Entering edit mode

5.3 years ago

westin.kosater ▴ 80

Here is what the list of sequences looks like in the file brca.example.illumina.0.1.fasta

>Frag_1 chr17 (Strand + Offset 106709--107175) 467M 101M
AGGCAGGTCTCAAACTCCTGACCTCAGGTGATCCACCCACCTCAAGCCTCCCAAAGTGCT
GGGATTATAGGCATGAGCCACCATGTCCGGCAAGTTTCTTT
>Frag_2 chr17 (Strand + Offset 5449--5912) 464M 101M
ACTCCTGACAAGTGATCCACCTGCCTCGGCCTCCCAAAGTGCTGGGATTACAGACATGAG
CCACCATGCCCAGCCTCCAGCCCATCATTTCTTGATGATTT
>Frag_3 chr17 (Strand + Offset 82312--82850) 539M 101M
TTTGCCATAAAGTGCCTGCCCTCTAGCCTCTACTCTTCCAGTTGCGGCTTATTGCATCAC
AGTAATTGCTGTACGAAGGTCAGAATCGCTACCTATTGTCC
>Frag_4 chr17 (Strand - Offset 1052--1590) 539M 101M
TTGCAGTGAGCCAAGATCATACCACGGCACTCCAGCCTGGGTGACAGTGAGACTGTGGCT
CAACAAAAAAAAAAAAAAAAGGAAAATGAAACTAGAAGAGA

And so on. I would like to generate the reverse complement for each of these, so here is what I have written

for record in SeqIO.parse("brca.example.illumina.0.1.fasta", "fasta"):
    y = record.seq.reverse_complement()

rev_comp2 = open("python_output2.txt", "w")
rev_comp2.write(str(y))

Upon running the script and opening up the text file, it appears to have only generated the reverse complement for the first sequence. Here is the content of the text file

GGCTGGAGTGCAGTGGCATGATCTCGGCTCACTGCAAGCTCTGCCTCCCAGGCTCACACCATTCTCCTGCCTCAGCCTCCCAAGCAGCTGGGACTACAGGC

I thought that including "for record in SeqIO.parse("etc. etc.") would generate the reverse complement for each sequence in the file, but apparently not? What can I write in my script that will generate the reverse complement for each sequence?

biopython python • 1.5k views

ADD COMMENT • link updated 5.3 years ago by JC 13k • written 5.3 years ago by westin.kosater ▴ 80

score 0 · Answer 1 · 2019-01-21

0

Entering edit mode

5.3 years ago

JC 13k

The problem is you are only writing the last record, you need something like:

rev_comp2 = open("python_output2.txt", "w")

for record in SeqIO.parse("brca.example.illumina.0.1.fasta", "fasta"):
    y = record.seq.reverse_complement()
    rev_comp2.write(str(y))

ADD COMMENT • link 5.3 years ago by JC 13k

score 0 · Answer 2 · 2019-01-21

You need to include your write step inside the loop (indented).

You will probably also need to open the file in write+append mode, so that it doesn’t continually overwrite.

E.g.

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

with open('359663.outfile', 'wa') as handle:
    for rec in SeqIO.parse('359663.fasta', 'fasta'):
        handle.write(SeqRecord(id=rec.id, seq=rec.seq.reverse_complement()).format('fasta'))

The with statement is considered more ‘pythonic’ too, as it means you do not have to close() the file when you’re done with it. You could also use the SeqIO.write method here for added safety, but use of SeqRecord allows you to write in fasta format directly.