Problems With Sequence Parts In Biopython
1
1
Entering edit mode
11.9 years ago

Hi there!

I'm starting my work with 454 pyrosequencing amplicons. I am using Biophyton making my own scripts.

I have some questions: how I can search for a determinate sequence within a longer sequence? (e.g. find 'AGCT' within 'AAAGGGagctCCCTTT') I have split bigger sequence into each nucleotide with str(sequence.seq) command, but I am stuck at this point.

The other question is how I can make a counter properly. I have written down one, but instead of getting a proper count, I get a non-sense result. This is a bigger problem because I want to write down 3 files with only one program.

Here is my script:

from Bio import SeqIO

secuencias=SeqIO.parse("/Users/imac/Desktop/Programas_Python_glaciares/documento_prueba.fasta", "fasta")

archivo1=open("/Users/imac/Desktop/Programas_Python_glaciares/archivo_nuevo1.fasta", "w")
archivo2=open("/Users/imac/Desktop/Programas_Python_glaciares/archivo_nuevo2.fasta", "w")
archivo3=open("/Users/imac/Desktop/Programas_Python_glaciares/archivo_nuevo3.fasta", "w")


x=0  
y=0 
z=0
for linea in secuencias:

    dato=linea.description.split(" ")
    seqs=str(linea.seq)

    if "XXXXXXXXXXXXXXXXXX" in seqs:

        archivo1.write(">" + str(dato[0]) + " " + str(dato[1]) + "\n" + str(linea.seq) + "\n")
        x=x+1 
        y=0 
        z=0
    archivo1.write(str(x))


    if "YYYYYYYYYYYYYYYYYY" in seqs:

        archivo2.write(">" + str(dato[0]) + " " + str(dato[1]) + "\n" + str(linea.seq) + "\n")
        x=0
        y=y+1
        z=0
    archivo2.write(str(y))


    if "ZZZZZZZZZZZZZZZZZZ" in seqs:

        archivo3.write(">" + str(dato[0]) + " " + str(dato[1]) + "\n" + str(linea.seq) + "\n")

        x=0
        y=0
        z=z+1
    archivo3.write(str(z))

archivo1.close()
archivo2.close()
archivo3.close()

Sequences in the script are for a test file.

Thanks a lot

python biopython sequence split • 2.7k views
ADD COMMENT
2
Entering edit mode

Why are you doing this:

archivo1.write(">" + str(dato[0]) + " " + str(dato[1]) + "\n" + str(linea.seq) + "\n")

Surely this would be simpler (and almost the same - this will preserve the full FASTA description):

SeqIO.write(linea, archivo1, "fasta")

Note that linea is a SeqRecord object.

ADD REPLY
0
Entering edit mode

Can you edit your post to include all code in the code block?

ADD REPLY
0
Entering edit mode

if you only want to look at the EXACT matching, you can use string find function e.g

 def find_all(src, s):
    c=0; beg=0; slen=len(s)
    while1: 
        start=src.find(s, beg)
        if beg==-1: break
        c+=1; start+=slen
    return c

ADD REPLY
2
Entering edit mode
11.9 years ago
Asaf 10k

Why do you set the counters to zero on each iteration? e.g.

   x=0
   y=0
   z=z+1

That's the reason your counter don't work. Other than that 'XXX' in seqs should work

ADD COMMENT

Login before adding your answer.

Traffic: 2039 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6