Question: Problems With Sequence Parts In Biopython
1
gravatar for elmagodelabahia
6.8 years ago by
elmagodelabahia60 wrote:

Hi there!

I'm starting my work with 454 pyrosequencing amplicons. I am using Biophyton making my own scripts.

I have some questions: how I can search for a determinate sequence within a longer sequence? (e.g. find 'AGCT' within 'AAAGGGagctCCCTTT') I have split bigger sequence into each nucleotide with str(sequence.seq) command, but I am stuck at this point.

The other question is how I can make a counter properly. I have written down one, but instead of getting a proper count, I get a non-sense result. This is a bigger problem because I want to write down 3 files with only one program.

Here is my script:

from Bio import SeqIO

secuencias=SeqIO.parse("/Users/imac/Desktop/Programas_Python_glaciares/documento_prueba.fasta", "fasta")

archivo1=open("/Users/imac/Desktop/Programas_Python_glaciares/archivo_nuevo1.fasta", "w")
archivo2=open("/Users/imac/Desktop/Programas_Python_glaciares/archivo_nuevo2.fasta", "w")
archivo3=open("/Users/imac/Desktop/Programas_Python_glaciares/archivo_nuevo3.fasta", "w")


x=0  
y=0 
z=0
for linea in secuencias:

    dato=linea.description.split(" ")
    seqs=str(linea.seq)

    if "XXXXXXXXXXXXXXXXXX" in seqs:

        archivo1.write(">" + str(dato[0]) + " " + str(dato[1]) + "\n" + str(linea.seq) + "\n")
        x=x+1 
        y=0 
        z=0
    archivo1.write(str(x))


    if "YYYYYYYYYYYYYYYYYY" in seqs:

        archivo2.write(">" + str(dato[0]) + " " + str(dato[1]) + "\n" + str(linea.seq) + "\n")
        x=0
        y=y+1
        z=0
    archivo2.write(str(y))


    if "ZZZZZZZZZZZZZZZZZZ" in seqs:

        archivo3.write(">" + str(dato[0]) + " " + str(dato[1]) + "\n" + str(linea.seq) + "\n")

        x=0
        y=0
        z=z+1
    archivo3.write(str(z))

archivo1.close()
archivo2.close()
archivo3.close()

Sequences in the script are for a test file.

Thanks a lot

python sequence biopython split • 1.7k views
ADD COMMENTlink modified 3.9 years ago by Biostar ♦♦ 20 • written 6.8 years ago by elmagodelabahia60
2

Why are you doing this:

archivo1.write(">" + str(dato[0]) + " " + str(dato[1]) + "\n" + str(linea.seq) + "\n")

Surely this would be simpler (and almost the same - this will preserve the full FASTA description):

SeqIO.write(linea, archivo1, "fasta")

Note that linea is a SeqRecord object.

ADD REPLYlink written 6.8 years ago by Peter5.8k

Can you edit your post to include all code in the code block?

ADD REPLYlink written 6.8 years ago by Niek De Klein2.4k

if you only want to look at the EXACT matching, you can use string find function e.g

 def find_all(src, s):
    c=0; beg=0; slen=len(s)
    while1: 
        start=src.find(s, beg)
        if beg==-1: break
        c+=1; start+=slen
    return c

ADD REPLYlink modified 6.8 years ago • written 6.8 years ago by jingtao09110
2
gravatar for Asaf
6.8 years ago by
Asaf5.3k
Israel
Asaf5.3k wrote:

Why do you set the counters to zero on each iteration? e.g.

   x=0
   y=0
   z=z+1

That's the reason your counter don't work. Other than that 'XXX' in seqs should work

ADD COMMENTlink written 6.8 years ago by Asaf5.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1822 users visited in the last hour