Possible bug in seqkit?
7.8 years ago

Tool used: seqkit

Dummy fasta file (fasta.fa):

>test1
GCATCGATCAGCTACGATCATCACTA
GNNNNNNTACATCAGCACTACATCACTNNNNN
>test2
GTACGCTACGANNNGCTACGACTACGATATATATATATATATATATATATATATATATATATAT
GCTACGATCACNTACATCGACTA
>test3
GTGTGCTACATCATCACTACGTACTACAT
>test4
AA

Command:

./seqkit stat fasta.fa

Output:

file      format  type  num_seqs  sum_len  min_len  avg_len  max_len
fasta.fa  FASTA   DNA          4      176        0       44       87

Problem: min_len = 0, but the minimum length should be 2 (sequence ID "test4").

Validation using seqkit:

Command:

./seqkit fx2tab -l fasta.fa

Output:

test1   GCATCGATCAGCTACGATCATCACTAGNNNNNNTACATCAGCACTACATCACTNNNNN      58
test2   GTACGCTACGANNNGCTACGACTACGATATATATATATATATATATATATATATATATATATATGCTACGATCACNTACATCGACTA     87
test3   GTGTGCTACATCATCACTACGTACTACAT       29
test4   AA      2

Notice: length of sequence test4 is "2"
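
To rule out a problem with the file itself, the lengths can also be recomputed without seqkit. The following is a minimal Python sketch, not seqkit's implementation; the function names fasta_lengths and summarize are made up for illustration, and it assumes a plain FASTA file in which every record starts with a ">" header line.

```python
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence_id: length}, summing all lines of a multi-line record."""
    lengths: Dict[str, int] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current = (line[1:].split() or [""])[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def summarize(lengths: Dict[str, int]) -> Dict[str, float]:
    """Compute the same summary columns that appear in the stat output above."""
    values = list(lengths.values())
    if not values:
        raise ValueError("no sequences found")
    return {
        "num_seqs": len(values),
        "sum_len": sum(values),
        "min_len": min(values),
        "avg_len": sum(values) / len(values),
        "max_len": max(values),
    }
```

Calling summarize(fasta_lengths(open("fasta.fa"))) on the dummy file should agree with the per-sequence lengths reported by fx2tab; if min_len comes out as 2 here but 0 in seqkit stat, the discrepancy is in the tool rather than in the data.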

Is this a bug, or have I misunderstood something?

PS: I am loving this tool (all thanks to Wei Shen) and am trying to use its utilities to build a new tool!

seqkit fasta stats • 2.9k views

You might have better luck posting a bug report on the GitHub repo.

Oh my dear friend, it's shenwei, or Wei Shen. In Chinese, the last name (Shen) is in front of the first name (Wei), so my social media ID is shenwei*

Oh my dearest friend, thanks for the information, but I just wanted to highlight your username.

Many thanks for your prompt attention!!

PS: I just edited my post :)

7.8 years ago
John 13k

For the data you posted, I get the correct result using version 0.3.4.1:

file  format  type  num_seqs  sum_len  min_len  avg_len  max_len
demo  FASTA   DNA          4      176        2       44       87

Sorry for that naive bug. It's fixed in the latest version (v0.4.3), so please update.

Affected versions: v0.4.0, v0.4.1, v0.4.2

Please run seqkit version to check which version you have, and download the latest release from the GitHub page or the homepage. Do not install or update using conda (latest there: v0.3.4.1), since that package is not maintained by me.
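
For anyone checking several machines, the affected range above can be encoded in a few lines. This is only a sketch: the helper names parse_version and is_affected are hypothetical, and it expects a bare version string such as "v0.4.1" rather than parsing the exact text printed by seqkit version, whose format is not shown in this thread.

```python
from typing import Tuple

# Releases reported in this thread as giving min_len = 0 in `seqkit stat`.
AFFECTED = {(0, 4, 0), (0, 4, 1), (0, 4, 2)}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn 'v0.4.1' or '0.3.4.1' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().lstrip("vV").split("."))


def is_affected(text: str) -> bool:
    """True if the given version is one of the releases listed as affected."""
    return parse_version(text) in AFFECTED
```

With these definitions, is_affected("v0.4.1") is true, while both the older conda build "0.3.4.1" and the fixed release "v0.4.3" are reported as unaffected.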

I'll update the bioconda recipe for seqkit to use the latest version.

Thanks, I'll learn to use bioconda :)

You'd be surprised how many people prefer to install stuff via bioconda. Anyway, the recipe has been updated and the new binaries should be available within the next hour or so (there's a queue on TravisCI at the moment).

Yup, I am using version v0.4.3. Results are fine for the same data. OP, which version are you using?

I moved to the latest version, seqkit v0.4.3! Thanks venu :)
