Just complement or reverse sequence fom Biopython, but not reverse-complement one!
2
1
Entering edit mode
7.5 years ago
natasha.sernova ★ 4.0k

Dear all,

I have a problem with Biopython.

I give it a fasta-sequence and need to make either reversed fasta sequence out of it in a separate output file, or just complement fasta-sequence in a separate output file.

The four lines below were taken from Biopython cookbook, and the script works perfectly well.

from Bio import SeqIO

records = (rec.reverse_complement(id="rc_"+rec.id, description = "reverse complement") \

for rec in SeqIO.parse("example.fasta", "fasta") if len(rec)<700)

SeqIO.write(records, "rev_comp.fasta", "fasta")

18

But my goal is fasta-file with just complement to the initial sequence, or reverse sequence to the initial fasta sequence.

So far I have not found any way to get them in Biopython.

I can open final rev_comp.fasta, copy the sequence itself, do something like that:

my_seq[::-1] and print the result to a new fasta-file, it will give me just complement sequence.

But it won't be a pure Biopython application, and a little bit more complicated with reverse sequence only.

Is there any simple way to do it in Biopython only? Actually I don't need reverse complement sequence at all, just either reverse or complement fasta sequences for my initial fasta-sequence.

Thank you very much!

Natasha

Biopython reverse complement • 5.0k views
ADD COMMENT
1
Entering edit mode

Please use code formatting for readability (101010 button), I modified your post for now.

ADD REPLY
1
Entering edit mode

But the code looks weird to me, I don't think this can work.

EDIT: I was wrong :)

ADD REPLY
0
Entering edit mode

This code is not mine,

see http://biopython.org/DIST/docs/tutorial/Tutorial.html

chapter 5.5.3

Honestly I didn't expect it would work, but it did.

ADD REPLY
1
Entering edit mode

Oh right, now I see it, it is a set comprehension split over two lines. Nevermind :-)

ADD REPLY
4
Entering edit mode
7.5 years ago

Here you go:

USAGE: python complement.py -i input_fasta -o output_fasta

from Bio import SeqIO
import getopt, sys



def usage():
    print "Usage: complement.py -i <input_fasta> -o <output_fasta>"


try:
    options, remainder=getopt.getopt(sys.argv[1:], 'i:o:h')

except getopt.GetoptError as err:
    print str(err)
    usage()
    sys.exit()

for opt, arg in options:
    if opt in ('-i'):
        input_file=arg
    if opt in ('-h'):
        usage()
    sys.exit()
    elif opt in ('-o'):
        output_file=arg


out=open(output_file, 'w')

for record in SeqIO.parse(input_file, "fasta"):
    out.write( ">"+record.id+"\n"+str(record.seq.complement())+"\n" )
out.close()

Let me know if it works & don't forget to upvote ;)

ADD COMMENT
0
Entering edit mode

Very nice! Thank you! It does't like line 24 - line 23 is out of place, isn't it? I commented it, and everything worked.

But single reverse case will be just

out.write( ">"+record.id+"\n"+str(record.seq.reverse())+"\n" )

or something more complicated?

ADD REPLY
1
Entering edit mode

Strange, nothing is out of place, luckily it worked for you as commenting line 23 and 24 would affect only if you forget to provide the input/output file names.

Reverse would be :

out.write( ">"+record.id+"\n"+str(record.seq)[::-1]+"\n" )

Warning: untested - just try.

ADD REPLY
0
Entering edit mode

Perfect, it did work! Thank you very much!!!

ADD REPLY
0
Entering edit mode
7.5 years ago

For complement, there is already a function- example from the cookbook

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("AGTACACTGGT", generic_dna)
>>> my_dna
Seq('AGTACACTGGT', DNAAlphabet())
>>> my_dna.complement()
Seq('TCATGTGACCA', DNAAlphabet())

To reverse a string, just use this trick for string reversal

seq[::-1]

where seq is your sequence string. If you have seq object, str(seq)[::-1] should work. See below example

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("AGTACACTGGT", generic_dna)
>>> type(my_dna)
<class 'Bio.Seq.Seq'>
>>> str(my_dna)
'AGTACACTGGT'
>>> str(my_dna)[::-1]
'TGGTCACATGA'
ADD COMMENT
1
Entering edit mode

Thank you, Vijay! Initially I used Bio.Seq as well.

But I have a fasta-file wth header and nucleotide sequence, then I should make something with it by Biopython and get

another fasta-file with a complement sequence, let's forget about a reverse one right now.

ADD REPLY

Login before adding your answer.

Traffic: 1777 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6