Question: parsing cd-hit results
####190 wrote (3.2 years ago):

I have clustered sequences using cd-hit-est, and now I want to extract the parent (representative) sequence of each cluster. Any suggestions?

Joseph Hughes (Scotland, UK) wrote (3.2 years ago):

You can use the * symbol at the end of a line in the .clstr file to pull out the representative sequence. I think this script I wrote will pull out the representative sequences: https://github.com/josephhughes/TCRclust/blob/master/sort_cdhit.pl

Usage:

sort_cdhit.pl -i INFILE.fa -o OUTFILE_rep.fa -clstr INFILE.clstr -rep

Make sure you run cd-hit with the option -d 0 so that the complete sequence identifier is written to the .clstr output file.
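For anyone who prefers not to use Perl, here is a minimal Python sketch of the same idea. It is not the logic of sort_cdhit.pl itself: it simply finds the line ending in * within each cluster of the .clstr file and keeps the matching records from the input FASTA. It assumes the cd-hit-est .clstr line layout `0<TAB>1234nt, >seq_id... *` and that cd-hit was run with -d 0, so the identifier in the .clstr file equals the first word of the FASTA header. All function names are hypothetical. Also note that the FASTA written by cd-hit-est's own -o option normally already contains just the representative sequences, so this is mainly useful when you only have the .clstr file and the original input.

```python
"""Minimal sketch: pull representative sequences out of cd-hit-est output.

Assumes cd-hit-est was run with -d 0, so each .clstr entry carries the FASTA
identifier up to the first space. All function names here are hypothetical.
"""
import re
from typing import Dict, Iterable, Iterator, Set, Tuple

# Assumed .clstr member-line layout: "0<TAB>1234nt, >seq_id... *"
# Group 1 is the identifier; group 2 is "*" for the representative,
# or something like "at +/95.00%" for the other members.
MEMBER_RE = re.compile(r"^\d+\t\d+(?:nt|aa), >(.+?)\.\.\. (.*)$")


def representative_ids(clstr_lines: Iterable[str]) -> Dict[int, str]:
    """Map each cluster number to the identifier of its representative."""
    reps: Dict[int, str] = {}
    cluster = -1
    for line in clstr_lines:
        line = line.rstrip("\r\n")
        if line.startswith(">Cluster "):
            cluster = int(line.split()[1])
            continue
        match = MEMBER_RE.match(line)
        if match and match.group(2) == "*":
            reps[cluster] = match.group(1)
    return reps


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', joined sequence) pairs from FASTA text."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def keep_records(records: Iterable[Tuple[str, str]], wanted: Set[str]) -> Iterator[Tuple[str, str]]:
    """Keep records whose first header word is one of the wanted identifiers."""
    for header, seq in records:
        first_word = (header.split() or [""])[0]
        if first_word in wanted:
            yield header, seq


def write_representatives(clstr_path: str, fasta_path: str, out_path: str) -> None:
    """Write the representative records of fasta_path to out_path."""
    with open(clstr_path) as clstr:
        wanted = set(representative_ids(clstr).values())
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for header, seq in keep_records(read_fasta(fasta), wanted):
            out.write(f">{header}\n{seq}\n")
```

With the file names from the command above, the call would be write_representatives("INFILE.clstr", "INFILE.fa", "OUTFILE_rep.fa"). Matching on the first header word is what makes the -d 0 advice matter: with the default description length, the .clstr identifier may be truncated and no longer match.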

RamRS (Houston, TX) wrote (3.2 years ago):

Yes. grep.

Read the user guide; it mentions a pattern you can use to isolate the representative sequences.


There's also the clstr2txt script included with cd-hit, which converts the .clstr output into a more parsing-friendly format.

(reply by 5heikki, 3.2 years ago)
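To show what "parsing-friendly" can mean here, the following is a minimal Python sketch that flattens a .clstr file into one tab-separated row per sequence. The column layout (cluster, member, length, id, is_rep, identity) is a hypothetical choice for illustration and is not claimed to match clstr2txt's actual output. It assumes the same line layout as cd-hit-est writes, with * marking the representative and "at ..." giving the identity of other members.

```python
import re
from typing import Iterable, Iterator, List

# Hypothetical row layout (NOT necessarily clstr2txt's columns):
# cluster, member index, length, id, is_rep ("1"/"0"), identity ("" for the rep)
LINE_RE = re.compile(r"^(\d+)\t(\d+)(?:nt|aa), >(.+?)\.\.\. (?:(\*)|at (.*))$")


def clstr_rows(clstr_lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one flat row per sequence listed in a .clstr file."""
    cluster = None
    for line in clstr_lines:
        line = line.rstrip("\r\n")
        if line.startswith(">Cluster "):
            cluster = line.split()[1]
            continue
        match = LINE_RE.match(line)
        if match is None or cluster is None:
            continue  # skip blank or unexpected lines
        index, length, seq_id, star, identity = match.groups()
        yield [cluster, index, length, seq_id, "1" if star else "0", identity or ""]


def clstr_to_tsv(clstr_lines: Iterable[str]) -> str:
    """Render the rows as TSV text with a header line."""
    header = ["cluster", "member", "length", "id", "is_rep", "identity"]
    rows = [header, *clstr_rows(clstr_lines)]
    return "\n".join("\t".join(row) for row in rows) + "\n"
```

Once the data is in this shape, selecting the rows with is_rep equal to "1" gives the representative of each cluster, and the same table can be loaded into any tabular tool.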

@Ram, there is no such pattern mentioned there... sorry if I am missing it.

(reply by ####190, 3.2 years ago)