4.0 years ago
Bioinfo
Hello. I have a problem: I want to create a database for a specific bacterial genus. For that, I downloaded all the sequences published on NCBI for this genus and merged all the .fna files into one FASTA file. I then used CD-HIT to eliminate redundancy, with the following command:
cd-hit -i Library.fasta -o Library_No_Redundance.fasta -c 0.95 -n 5 -M 0 -T 8
But I got this error message: Fatal Error: in diag_test_aapn, MAX_DIAG reached Program halted !!
Please tell me how I can solve this error. Thank you!
Bioinfo : Don't forget to follow up on your past questions/threads.
If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.
Ah, I didn't know that. Thank you very much for telling me. And yes, because I am trying an analysis I have never done before, I feel a little confused. Sorry.
My first guess is that you are trying to align too much against too much, and CD-HIT can't handle that.
Try with a smaller subset first to see whether that technically works, and then you can start scaling up. I don't think it makes much sense to align all bacteria against all bacteria, though.
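One way to build such a test subset is sketched below. It assumes a plain multi-FASTA file in which every record starts with a ">" header line; the function names `read_fasta`, `first_records` and `write_subset` are made up for this illustration and are not part of CD-HIT.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines of one record are joined
    if header is not None:
        yield header, "".join(chunks)


def first_records(lines, n):
    """Return the first n records as a list of (header, sequence) pairs."""
    return list(islice(read_fasta(lines), n))


def write_subset(src_path, dst_path, n):
    """Copy the first n records of src_path into dst_path (hypothetical helper)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for header, seq in islice(read_fasta(src), n):
            dst.write(f">{header}\n{seq}\n")
```

For example, `write_subset("Library.fasta", "Library_subset.fasta", 100)` would write the first 100 records to a smaller file (the output name is only an example) that can be tried before the full library.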
Hello, thank you for your reply. Actually, my aim is to eliminate the duplicated sequences in my library file so that I can determine the coverage of my reads against this library. If I don't remove the duplicates, that will affect the coverage result.
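If the goal is only to drop sequences that are exactly identical, a small script may be enough. The sketch below is an assumption about what "duplicated" means here: it removes records whose sequences match character for character (ignoring case), which is not the same as CD-HIT's clustering at 95% identity. The function names `iter_fasta` and `drop_exact_duplicates` are hypothetical.

```python
import hashlib


def iter_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_exact_duplicates(records):
    """Yield only the first record seen for each distinct sequence.

    Sequences are compared case-insensitively. Only a digest of each
    sequence is stored, so memory grows with the number of distinct
    sequences rather than with their total length.
    """
    seen = set()
    for header, seq in records:
        digest = hashlib.sha1(seq.upper().encode("ascii")).digest()
        if digest not in seen:
            seen.add(digest)
            yield header, seq
```

Reverse-complement duplicates and near-identical genomes are not detected by this approach; it only keeps the first copy of each identical sequence.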
CD-HIT is NOT designed to work on entire genomes.