Question: How to count FASTA sequences efficiently using (or not) Biopython
0
10 months ago, juan.crescente (20) wrote:

This is not a very memory-friendly way of counting the sequences in a multi-FASTA file. Any ideas on how to improve it?

from Bio import SeqIO

# Builds a full list of record lengths in memory; the sequence count is len(sizes)
sizes = [len(rec) for rec in SeqIO.parse("test_fasta.fasta", "fasta")]

I'm avoiding tools like grep because I want this to be more portable.

biopython python fasta • 732 views
modified 10 months ago by yhoogstrate (50) • written 10 months ago by juan.crescente (20)

How to cont fasta

Count/concatenate/check length?

modified 10 months ago • written 10 months ago by genomax (59k)

Count, as in the description; it could also have been guessed from the example. PS: title updated.

written 10 months ago by juan.crescente (20)

bioawk? github.com/lh3/bioawk

written 10 months ago by RamRS (19k)
4
10 months ago, a.zielezinski (8.5k) wrote:

Plain Python will be faster than Biopython, because it only checks each line for a ">" header and never builds record objects:

# Count header lines (those starting with ">"), reading one line at a time
n = 0
with open("test_fasta.fasta") as fh:
    for line in fh:
        if line.startswith(">"):
            n += 1

Or, shorter and possibly faster (note that this builds a temporary list with one entry per sequence):

num = len([1 for line in open("test_fasta.fasta") if line.startswith(">")])
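If memory is the main concern, a generator expression can replace the list comprehension above so that no temporary list is built. What follows is only a minimal standard-library sketch: the helper name count_fasta_records and the demo file contents are made up for illustration, and it assumes every record begins with ">" at the start of a line.

```python
import os
import tempfile


def count_fasta_records(path):
    """Count FASTA records by counting lines that start with '>'.

    Hypothetical helper for illustration. The file is read in binary
    mode, so no text decoding is done, and only one line is held in
    memory at a time.
    """
    with open(path, "rb") as handle:
        return sum(1 for line in handle if line.startswith(b">"))


# Small self-contained demonstration on a temporary file (made-up data).
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">seq1\nACGT\nACGT\n>seq2\nGG\n>seq3\nTTTAAA\n")
    demo_path = tmp.name

record_count = count_fasta_records(demo_path)
os.remove(demo_path)
print(record_count)
```

Because the file is read line by line, memory use depends on the longest line rather than on the number of sequences, so an unwrapped chromosome-sized sequence on a single line would still be loaded as one line.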
1
10 months ago, tiago211287 (1.0k, USA) wrote:

Count the number of sequences in one FASTA file:

grep -c ">" file.fasta

Count the number of sequences in several FASTA files:

find /home/folder/to/file/ -name "*.fasta" | parallel grep -Hc ">" {}

Get the length of each sequence line in a FASTA file (this equals the sequence length only when every sequence is written on a single line):

grep -v ">" 180110.fasta | awk '{print length}'
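For FASTA files whose sequences are wrapped over several lines, the per-line lengths have to be summed per record. A portable way to do that without grep or awk might look like the sketch below. It uses only the standard library, the helper name fasta_lengths and the demo data are hypothetical, and it assumes a well-formed file in which the record name is the first word of the header.

```python
import os
import tempfile


def fasta_lengths(path):
    """Yield (name, length) per record, summing wrapped sequence lines.

    Hypothetical helper for illustration; assumes a well-formed FASTA file.
    """
    name, length = None, 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if name is not None:
                    yield name, length
                # Take the first word of the header as the record name.
                fields = line[1:].split()
                name, length = (fields[0] if fields else ""), 0
            elif name is not None:
                length += len(line)
    # Emit the last record once the end of the file is reached.
    if name is not None:
        yield name, length


# Small self-contained demonstration on a temporary file (made-up data).
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">seq1 first record\nACGT\nACGT\n>seq2\nGG\n")
    demo_path = tmp.name

lengths = list(fasta_lengths(demo_path))
os.remove(demo_path)
print(lengths)
```

Since this is a generator, the number of sequences can also be obtained without storing the lengths, for example with sum(1 for _ in fasta_lengths(path)).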
0
10 months ago, yhoogstrate (50, Netherlands) wrote:

You could take a look at this Python library: https://github.com/mdshw5/pyfaidx. It uses an existing .fai index file or generates one if none is present. It is also compatible with gzipped FASTA files.
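Once a .fai index exists, the sequence count and lengths can be read from the index alone, without scanning the sequences again. In the samtools faidx format, a .fai file is tab-separated with one line per sequence: the first column is the name and the second is the length. The following is a minimal standard-library sketch, not pyfaidx's own API; read_fai is a hypothetical helper and the index contents are made up for the demonstration.

```python
import os
import tempfile


def read_fai(fai_path):
    """Return a dict mapping sequence name to length from a .fai index.

    Hypothetical helper for illustration. Only the first two tab-separated
    columns (name, length) are used; offsets and line widths are ignored.
    """
    lengths = {}
    with open(fai_path) as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            name, length = line.rstrip("\n").split("\t")[:2]
            lengths[name] = int(length)
    return lengths


# Made-up index for a file containing ">seq1\nACGT\nACGT\n>seq2\nGG\n".
with tempfile.NamedTemporaryFile("w", suffix=".fai", delete=False) as tmp:
    tmp.write("seq1\t8\t6\t4\t5\nseq2\t2\t22\t2\t3\n")
    demo_fai = tmp.name

index = read_fai(demo_fai)
os.remove(demo_fai)
print(len(index), index)
```

The index has one line per sequence, so counting its lines is enough when only the number of sequences is needed.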
