parsing cd-hit result
I have clustered sequences using cd-hit-est and now I want to pull the parent (representative) sequence out of each cluster. Any suggestions?
In the .clstr file, the line for each cluster's representative sequence ends with a * symbol, so you can use that to pull out the representative sequences.
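As a rough illustration of the * approach, here is a minimal Python sketch. It assumes the usual .clstr layout, where member lines look like `0	1500nt, >seq1... *`; the function name `representative_ids` and the example IDs are made up for this post.

```python
import re

# A representative line ends with "... *", e.g. "0\t1500nt, >seq1... *".
REP_LINE = re.compile(r">(.+?)\.\.\. \*$")

def representative_ids(clstr_lines):
    """Return the IDs from lines that carry the trailing '*' marker."""
    ids = []
    for line in clstr_lines:
        match = REP_LINE.search(line.rstrip())
        if match:
            ids.append(match.group(1))
    return ids

# Hypothetical .clstr content, for illustration only.
example = [
    ">Cluster 0",
    "0\t1500nt, >seq1... *",
    "1\t1400nt, >seq2... at +/98.00%",
    ">Cluster 1",
    "0\t900nt, >seq3... *",
]
print(representative_ids(example))
```

To run it on a real file, pass the open file handle instead of the example list, e.g. `representative_ids(open("INFILE.clstr"))`.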
I think this script I wrote will pull out the representative sequences:
sort-cdhit.pl -i INFILE.fa -o OUTFILE_rep.fa -clstr INFILE.clstr -rep
You will need to make sure you use the option -d 0 when you run cd-hit, to be sure you get the complete identifier in the .clstr output file.
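If you would rather not depend on my script, the same idea can be sketched in Python. This is only an illustration, not what sort-cdhit.pl does internally: it assumes cd-hit was run with -d 0, so that each ID in the .clstr file equals the FASTA header up to the first whitespace, and the function name `filter_fasta` is hypothetical.

```python
def filter_fasta(fasta_lines, keep_ids):
    """Keep only the FASTA records whose ID is in keep_ids."""
    keep = set(keep_ids)
    kept = []
    writing = False
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Assumption: the ID is the header text up to the first whitespace.
            parts = line[1:].split()
            writing = bool(parts) and parts[0] in keep
        if writing:
            kept.append(line)
    return kept

# Hypothetical input, for illustration only.
fasta = [">seq1 some description", "ACGT", ">seq2", "GGCC", ">seq3", "TTAA", "TT"]
for line in filter_fasta(fasta, ["seq1", "seq3"]):
    print(line)
```

The list of IDs to keep would come from the lines in the .clstr file that end with *.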
Read the user guide - it mentions a pattern you can use to isolate the representative sequences.