parsing cd-hit result
4.8 years ago

I have clustered sequences using cd-hit-est and now I want to pull the parent (representative) sequence out of each cluster. Any suggestions?

cd-hit parsing • 3.4k views
4.8 years ago
Joseph Hughes ★ 2.9k

In the .clstr file, the line for the representative sequence of each cluster ends with the * symbol, so you can use that to pull it out. I think this script I wrote will pull out the representative sequences: https://github.com/josephhughes/TCRclust/blob/master/sort_cdhit.pl

using:

sort_cdhit.pl -i INFILE.fa -o OUTFILE_rep.fa -clstr INFILE.clstr -rep

Make sure you use the option -d 0 when you run cd-hit, so that the complete identifier (rather than a truncated one) is written to the .clstr output file.
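If you would rather not depend on the Perl script, the same idea can be sketched in Python. This is only a rough sketch, not the script above: it assumes member lines in the .clstr file look like `0	300nt, >seq1... *`, with the representative line ending in `*`, and that cd-hit was run with -d 0 so the IDs match the FASTA headers up to the first whitespace. The function names `representative_ids` and `filter_fasta` are made up for this example.

```python
import re

# Assumed .clstr member-line layout:
#   0	300nt, >seq1... *              (representative)
#   1	250nt, >seq2... at +/98.00%    (other cluster member)
# Capture the ID between ">" and "..." on lines that end with "*".
REP_LINE = re.compile(r">(\S+?)\.\.\. \*$")


def representative_ids(clstr_lines):
    """Return the IDs of the representative sequences, in file order."""
    reps = []
    for line in clstr_lines:
        match = REP_LINE.search(line.rstrip())
        if match:
            reps.append(match.group(1))
    return reps


def filter_fasta(fasta_lines, keep_ids):
    """Yield the FASTA lines of records whose ID is in keep_ids."""
    keep = set(keep_ids)
    writing = False
    for line in fasta_lines:
        if line.startswith(">"):
            # The record ID is the header text up to the first whitespace.
            fields = line[1:].split()
            writing = bool(fields) and fields[0] in keep
        if writing:
            yield line


if __name__ == "__main__":
    # Tiny in-memory example; the sequence names are illustrative only.
    clstr = [
        ">Cluster 0\n",
        "0\t300nt, >seq1... *\n",
        "1\t250nt, >seq2... at +/98.00%\n",
    ]
    fasta = [">seq1\n", "ACGT\n", ">seq2\n", "GGGG\n"]
    reps = representative_ids(clstr)
    print("".join(filter_fasta(fasta, reps)), end="")
```

For real data you would pass open file handles instead of the lists, for example `representative_ids(open("INFILE.clstr"))` and `filter_fasta(open("INFILE.fa"), reps)`, and write the yielded lines to OUTFILE_rep.fa.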

4.8 years ago
Ram 34k

Yes. grep.

Read the user guide - it mentions a pattern you can use to isolate the representative sequences.


There's also the included clstr2txt script, which converts the .clstr output into a more parsing-friendly format.
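As a rough illustration of why the tabular format is easier to work with, here is a small Python sketch that reads such a table. It assumes the converted output is tab-delimited with a header row containing at least an `id` column and a `clstr_rep` column that is 1 for the representative of each cluster. That layout is my assumption, so check the header of your own file first. The function name and the sample rows are hypothetical.

```python
import csv
import io


def representatives_from_table(handle):
    """Return IDs of rows flagged as cluster representatives.

    Assumes a tab-delimited table with a header row that includes
    'id' and 'clstr_rep' columns (an assumption about the converted
    output, not something confirmed in this thread).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["id"] for row in reader if row["clstr_rep"] == "1"]


if __name__ == "__main__":
    # Illustrative rows only; real output may carry more columns.
    sample = io.StringIO(
        "id\tclstr\tclstr_rep\n"
        "seq1\t0\t1\n"
        "seq2\t0\t0\n"
        "seq4\t1\t1\n"
    )
    print(representatives_from_table(sample))
```

With a real file you would pass `open("INFILE.clstr.txt", newline="")` instead of the `StringIO` object; that file name is just a placeholder.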


@Ram, there is no such pattern mentioned there. Sorry if I am missing it.
