Question: How to count FASTA sequences efficiently using (or not) Biopython
0
8 months ago by
juan.crescente (20) wrote:

This is not a very memory-friendly way of counting the sequences in a multi-FASTA file. Any ideas on how to improve it?

from Bio import SeqIO

# Builds a list with one length per record, so memory grows with the number of sequences
sizes = [len(rec) for rec in SeqIO.parse("test_fasta.fasta", "fasta")]

I'm avoiding tools like grep because I want this to be more portable.

biopython python fasta • 450 views
modified 8 months ago by yhoogstrate (50) • written 8 months ago by juan.crescente (20)

How to cont fasta

Count/concatenate/check length?

modified 8 months ago • written 8 months ago by genomax (56k)

Count, as in the description; it could also have been guessed from the example. PS: title updated.

written 8 months ago by juan.crescente (20)

bioawk? github.com/lh3/bioawk

written 8 months ago by RamRS (17k)
4
8 months ago by
a.zielezinski (8.4k) wrote:

Standard Python will be faster than Biopython:

n = 0
# Each FASTA record starts with a header line beginning with ">"
with open("test_fasta.fasta") as fh:
    for line in fh:
        if line.startswith(">"):
            n += 1

or shorter, without building an intermediate list:

with open("test_fasta.fasta") as fh:
    num = sum(1 for line in fh if line.startswith(">"))
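
If the count is needed in several places, the same idea can be wrapped in a small function. This is only a sketch: `count_fasta_records` is a hypothetical name, and the gzip handling assumes that compressed files carry a `.gz` extension.

```python
import gzip


def count_fasta_records(path):
    """Count FASTA records by counting the header lines that start with '>'."""
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if str(path).endswith(".gz") else open
    # Text mode ("rt") so that lines are str in both cases.
    with opener(path, "rt") as handle:
        # The generator is consumed line by line, so memory use stays constant.
        return sum(1 for line in handle if line.startswith(">"))
```

It only uses the standard library, so it should run anywhere Python does, e.g. `count_fasta_records("test_fasta.fasta")`.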
modified 8 months ago • written 8 months ago by a.zielezinski (8.4k)
1
8 months ago by
tiago211287 (1.0k, USA) wrote:

Count the number of sequences in one FASTA file:

grep -c "^>" file.fasta

Count the number of sequences in several fasta files:

find /home/folder/to/file/ -name "*.fasta" | parallel grep -Hc ">" {}

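Get the length of each sequence in a FASTA file (this assumes every sequence sits on a single line; if the sequences are wrapped, it prints the length of each line instead):

grep -v ">" 180110.fasta | awk '{print length}'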
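
Since the question asks for something portable, here is a rough Python equivalent of the last command that also copes with wrapped sequences. It is a sketch rather than a tested tool, and `fasta_lengths` is a made-up name.

```python
def fasta_lengths(path):
    """Yield (header, length) for each FASTA record, one record at a time."""
    header, length = None, 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield header, length
                header, length = line[1:], 0
            elif header is not None:
                # Sequence lines of a wrapped record are summed up.
                length += len(line)
    # Emit the last record in the file.
    if header is not None:
        yield header, length
```

Because it is a generator, only one record's running total is held in memory at any time: `for name, size in fasta_lengths("180110.fasta"): print(name, size)`.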
modified 8 months ago • written 8 months ago by tiago211287 (1.0k)
0
8 months ago by
yhoogstrate (50, Netherlands) wrote:

You could take a look at this Python library: https://github.com/mdshw5/pyfaidx. It uses .fai index files and can generate one if none exists. It is also compatible with gzipped FASTA files.
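
Once a .fai index exists, counting does not even require reading the FASTA file, because the index holds one tab-separated line per sequence with the name in the first column and the length in the second. A minimal sketch in plain Python, assuming the index has already been created (for example by pyfaidx or `samtools faidx`); `count_from_fai` is a hypothetical helper, not part of pyfaidx:

```python
def count_from_fai(fai_path):
    """Return (number of sequences, total bases) read from a .fai index."""
    n_sequences = 0
    total_bases = 0
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Skip blank or malformed lines instead of failing on them.
            if len(fields) < 2:
                continue
            n_sequences += 1
            total_bases += int(fields[1])  # column 2 is the sequence length
    return n_sequences, total_bases
```

The index is tiny compared with the FASTA file itself, so this stays fast even for large genomes.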

written 8 months ago by yhoogstrate (50)