Question

Creating a fasta filter by gene length

3.5 years ago

adampepper313 • 0

Hi, I am working on a project and want to write a script that calculates the length of each gene in a FASTA file and outputs the genes that are shorter than a cut-off given on the command line.

I am writing my code in nano and running it with Python from the command line.

This is my code:

import sys
from Bio import SeqIO

for seq_record in SeqIO.parse(sys.argv[1], "fasta"):
    if str(len(seq_record)) < (sys.argv[2]):
        print(seq_record.description)

However, I don’t seem to be getting the desired output.

Thanks

gene sequencing python • 968 views

ADD COMMENT • link updated 3.5 years ago by Renesh ★ 2.2k • written 3.5 years ago by adampepper313 • 0

I think you should use len(str(seq_record)), since you're interested in the length of a string.

ADD REPLY • link 3.5 years ago by Fatima ▴ 1000

score 0 · Answer 1 · 2020-11-15

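Simply use len(seq_record) < int(sys.argv[2]) or len(seq_record.seq) < int(sys.argv[2]). Command-line arguments are always strings, so the cut-off has to be converted to an integer before the comparison. Comparing both sides as strings, as in your code, orders them character by character, so "1000" counts as smaller than "200".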
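To make the difference concrete, here is a small sketch of the two kinds of comparison. The helper name is_shorter is only for illustration and is not part of Biopython; the lengths are made-up values.

```python
def is_shorter(seq_length, threshold_arg):
    """Return True if seq_length is below a cut-off passed as a command-line string."""
    # threshold_arg plays the role of sys.argv[2], which is always a str
    return seq_length < int(threshold_arg)


# String comparison is lexicographic (character by character):
print(str(1000) < "200")        # True, because "1" sorts before "2"

# Integer comparison gives the intended result:
print(is_shorter(1000, "200"))  # False
print(is_shorter(150, "200"))   # True
```

This is why the original script can print long genes and skip short ones: the result depends on the leading digits, not on the numeric values.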

Alternatively, you can try bioinfokit in Python

from bioinfokit.analys import fasta
fasta_iter = fasta.fasta_reader(file='fasta_file')
for record in fasta_iter:
    header, sequence = record
    # gene length cut-off; desire_gene_length must be an integer, e.g. int(sys.argv[2])
    if len(sequence) < desire_gene_length:
        print(header, sequence)

See more here https://reneshbedre.github.io/blog/filereaders.html#fasta-reader
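If you would rather not depend on any package, the same filter can be sketched with the standard library only. This is a minimal stand-in for a real FASTA parser, not the Biopython or bioinfokit implementation; read_fasta and short_records are hypothetical names, and the sketch assumes a plain FASTA file where every record starts with a ">" line.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # a new record starts; emit the previous one, if any
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def short_records(handle, max_length):
    """Yield headers of records whose sequence is shorter than max_length."""
    for header, sequence in read_fasta(handle):
        if len(sequence) < max_length:
            yield header


# Small in-memory example instead of a real file
demo = io.StringIO(">geneA first\nATGC\nATGC\n>geneB second\nATG\n")
print(list(short_records(demo, 5)))  # ['geneB second']
```

To run it from the command line as in the question, open sys.argv[1] and pass the handle together with int(sys.argv[2]) to short_records.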