BioPerl script for extracting sequences shorter than 30 amino acids from a FASTA file (getorf output)
5.6 years ago

I want a script that extracts, from a FASTA file produced by Jemboss (getorf), the sequences that are shorter than 30 amino acids.

RNA-Seq bioperl python • 1.5k views

Smells like a homework assignment! Although Nitin Narwade has answered your question, you are supposed to write your own code and ask for help when you hit errors or problems. You cannot ask for a complete ready-made solution; that is not how this forum is supposed to be used.

For trivial tasks like this, I would recommend using a tool such as seqkit. However, if this is your programming assignment, please make an effort and show us what you have tried before posting.

5.6 years ago
Nitin Narwade ★ 1.6k

A plain Python script:

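# Write every record whose sequence is shorter than 30 residues.
MAX_LEN = 30

def write_if_short(out, header, seq):
    # Skip the empty state before the first header is seen.
    if header and len(seq) < MAX_LEN:
        out.write(header + "\n" + seq + "\n")

with open("inputFileName.fasta") as fread, open("output.fasta", "w") as fwrite:
    header = ""
    seq = ""
    for line in fread:
        line = line.strip()
        if not line:
            continue  # a blank line would make line[0] raise IndexError
        if line[0] == ">":
            # A new header closes the previous record.
            write_if_short(fwrite, header, seq)
            header = line
            seq = ""
        else:
            # Sequences may be wrapped over several lines.
            seq += line
    # Flush the last record, which has no following header to trigger the write.
    write_if_short(fwrite, header, seq)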
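
The same idea can also be wrapped in functions, which makes it easy to check on a small in-memory example before running it on real getorf output. This is only a minimal sketch: the names `read_fasta` and `filter_short` and the sample records are made up for illustration, and it assumes the length cutoff is strict (fewer than 30 residues).

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)  # last record

def filter_short(handle, max_len=30):
    """Return the records whose sequence is shorter than max_len."""
    return [(h, s) for h, s in read_fasta(handle) if len(s) < max_len]

# Hypothetical sample: lengths 6, 30 and 29.
sample = io.StringIO(
    ">orf_1\nMKT\nAAL\n\n>orf_2\n" + "A" * 30 + "\n>orf_3\n" + "G" * 29 + "\n"
)
short = filter_short(sample)
for header, seq in short:
    print(header, len(seq))
```

With the sample above, only `orf_1` and `orf_3` are kept; `orf_2` has exactly 30 residues and is dropped. To run it on a file, pass an open file handle instead of the `io.StringIO` object.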