Segmentation fault for cd-hit
9.0 years ago
Crystal ▴ 70

Hello All,

I tried to compare two databases and remove redundant sequences from one of them using cd-hit.

I found that someone on this forum had the same problem, but I didn't see a resolution for it.

The command I used is:

tools/cd-hit-v4.6.1-2012-08-27/cd-hit-est-2d -i V3_VFs.fas -i2 R1.ffn -G 0 -c 1.0 -AS 0 -AL 0 -aL 1.0 -aS 1.0 -o res2_R1_cdhitG

The output is:

================================================================
                            Output                              
----------------------------------------------------------------
total seq in db1: 5927
total seq in db2: 2457
longest and shortest : 31497 and 78
Total letters: 7764226
Sequences have been sorted
longest and shortest : 16680 and 99
Total letters: 2818134
Approximated minimal memory consumption:
Sequence        : 10M
Buffer          : 1 X 15M = 15M
Table           : 1 X 16M = 16M
Miscellaneous   : 4M
Total           : 47M
Table limit with the given memory limit:
Max number of representatives: 0
Max number of word counting entries: 94035159
Segmentation fault

My colleague ran the same command on her Mac to compare two other databases, and it worked.

So how can I solve my problem?

Thanks
Crystal

5 weeks ago
weidonglu • 0

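I tested the command you supplied (on my own input files and without -G 0), and it works for me. My computer is not a Mac. Your path shows cd-hit v4.6.1 (2012-08-27), while I ran V4.8.1, so maybe you can update your cd-hit-est software and try again.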

cd-hit-est-2d -i Galaxy117_transcripts.fasta -i2 Galaxy349_trinity_transcripts.fasta -c 1.0 -AS 0 -AL 0 -aL 1.0 -aS 1.0 -o output_cdhit
================================================================
Program: CD-HIT, V4.8.1 (+OpenMP), Jul 25 2023, 19:20:28
Command: cd-hit-est-2d -i Galaxy117_transcripts.fasta -i2
         Galaxy349_trinity_transcripts.fasta -c 1.0 -AS 0 -AL 0
         -aL 1.0 -aS 1.0 -o output_cdhit

Started: Sun Mar 10 09:02:55 2024
================================================================
                            Output                              
----------------------------------------------------------------
total seq in db1: 98196
total seq in db2: 48332
longest and shortest : 32042 and 140
Total letters: 96639203
Sequences have been sorted
longest and shortest : 29794 and 299
Total letters: 139620363

Approximated minimal memory consumption:
Sequence        : 244M
Buffer          : 1 X 19M = 19M
Table           : 1 X 18M = 18M
Miscellaneous   : 4M
Total           : 287M

Table limit with the given memory limit:
Max number of representatives: 40000
Max number of word counting entries: 64073727

..........    10000  finished
..........        0  compared          0  clusters
........    20000  finished
..........    30000  finished
..........    40000  finished
..........    50000  finished
..........    12906  compared          3  clusters
........    60000  finished
..........    70000  finished
..........    80000  finished
..........    90000  finished
..........    52906  compared         52  clusters
..........    92906  compared         52  clusters

48332 compared  52 clustered
writing non-redundant sequences from db2
writing clustering information
program completed !
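
One visible difference between the two logs is the "Max number of representatives" line: the failing v4.6.1 run reports 0, while this V4.8.1 run reports 40000. Whether that is what causes the crash is not established here, but it is an easy value to check after updating. Below is a minimal sketch for pulling that value out of a saved log. The function name `max_reps` and the log file names are hypothetical, and it assumes the program's output was redirected to a file.

```shell
# Hypothetical helper (not part of cd-hit): print the value from the
# "Max number of representatives" line of a saved cd-hit log file.
max_reps() {
    awk -F': *' '/^Max number of representatives/ { print $2; exit }' "$1"
}

# Assumed usage: old.log and new.log are hypothetical files holding the
# saved output of the failing run and of a rerun with a newer cd-hit.
# max_reps old.log
# max_reps new.log
```

If the value is still 0 after updating, that would be worth including in a follow-up post together with the full log.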