Creating a fasta filter by gene length
5 months ago

Hi, I am working on a project and want to write a script that calculates the length of each gene in a FASTA file and prints the genes that are shorter than a cut-off given on the command line.

I am writing my code in nano and running it with Python from the command line.

This is my code:

import sys
from Bio import SeqIO
for seq_record in SeqIO.parse(sys.argv[1], "fasta"):
    if str(len(seq_record)) < (sys.argv[2]):
        print(seq_record.description)

However, I don’t seem to be getting the desired output.

Thanks

gene sequencing python • 219 views

I think you should use len(str(seq_record)), since you're interested in the length of a string.

5 months ago
Renesh ★ 2.0k

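Simply use len(seq_record) < int(sys.argv[2]) or len(seq_record.seq) < int(sys.argv[2]). Command-line arguments are always strings, so the cut-off has to be converted to an integer before it is compared with a length.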
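To see why the comparison in the question gives unexpected results, here is a minimal sketch in plain Python. The values are made up for illustration: cutoff_arg stands in for sys.argv[2] and gene_length stands in for len(seq_record).

```python
# Command-line arguments always arrive as strings, e.g. sys.argv[2] == "1000".
cutoff_arg = "1000"   # hypothetical stand-in for sys.argv[2]
gene_length = 250     # hypothetical stand-in for len(seq_record)

# Comparing two strings is lexicographic (character by character):
# "250" < "1000" is False because "2" sorts after "1".
string_result = str(gene_length) < cutoff_arg

# Comparing two integers is numeric, which is what the filter needs:
# 250 < 1000 is True.
int_result = gene_length < int(cutoff_arg)

print(string_result, int_result)  # False True
```

So a 250 bp gene would be skipped by the string comparison even though it is shorter than the cut-off. Also note that len(seq_record) already returns the sequence length in Biopython, while str(seq_record) returns a multi-line summary of the record, so len(str(seq_record)) is not the gene length.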

Alternatively, you can try bioinfokit in Python

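import sys
from bioinfokit.analys import fasta

# gene length cut-off, taken from the command line as in the question
desire_gene_length = int(sys.argv[2])
fasta_iter = fasta.fasta_reader(file=sys.argv[1])
for record in fasta_iter:
    header, sequence = record
    # print only the genes shorter than the cut-off
    if len(sequence) < desire_gene_length:
        print(header, sequence)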

See more here https://reneshbedre.github.io/blog/filereaders.html#fasta-reader
