Question

BioPerl script to extract sequences shorter than 30 amino acids, with their IDs, from a FASTA file (getorf output)

0

5.6 years ago

yaminivadapally • 0

I want a script that extracts the sequences shorter than 30 amino acids from a FASTA file produced by Jemboss (getorf).

RNA-Seq bioperl python • 1.5k views

ADD COMMENT • link updated 5.6 years ago by Nitin Narwade ★ 1.6k • written 5.6 years ago by yaminivadapally • 0

1

This smells like a homework assignment! Although Nitin Narwade has answered your question, you are supposed to write your own code and ask for help when you hit errors or problems. You cannot ask for a complete, ready-made solution. That is not how this forum is supposed to be used.

For trivial tasks like this, I would recommend using a tool like seqkit. However, if this is your programming assignment, please make an effort and show us what you have tried before posting.

ADD REPLY • link 5.5 years ago by lakhujanivijay 5.8k

score 4 · Accepted Answer · 2018-10-03

A plain Python solution:

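# Keep only the records whose sequence is shorter than MAX_LEN residues.
MAX_LEN = 30

# "with" closes both files even if an error occurs.
with open("inputFileName.fasta", "r") as fread, open("output.fasta", "w") as fwrite:
    header = ""
    seq = ""

    for line in fread:
        line = line.strip()
        if not line:
            continue  # skip blank lines; line[0] would raise IndexError on them
        if line.startswith(">"):
            # A new record starts: write the previous one if it is short enough.
            if header and len(seq) < MAX_LEN:
                fwrite.write(header + "\n" + seq + "\n")
            header = line
            seq = ""
        else:
            seq += line  # a sequence may span several lines

    # The last record has no following header to trigger the write.
    if header and len(seq) < MAX_LEN:
        fwrite.write(header + "\n" + seq + "\n")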
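
If you want to reuse the filter or try it on a small example without touching files, the loop above can be split into a parser and a filter. This is only a sketch using the standard library: the function names `read_fasta` and `short_records` and the two demo records are made up for illustration and are not part of getorf or any library.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined per record
    if header is not None:
        yield header, "".join(chunks)  # last record in the file


def short_records(handle, max_len=30):
    """Yield records whose sequence is strictly shorter than max_len."""
    for header, seq in read_fasta(handle):
        if len(seq) < max_len:
            yield header, seq


# Hypothetical input: one 6-residue ORF and one 40-residue ORF.
demo = io.StringIO(">orf_1\nMKT\nAAG\n>orf_2\n" + "M" * 40 + "\n")
result = list(short_records(demo))
print(result)  # [('>orf_1', 'MKTAAG')]
```

For a real file, pass an open file object instead of the `io.StringIO` demo and write each returned pair as `header + "\n" + seq + "\n"`.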