Question: CD-Hit running problem
12 months ago, bushrasiraj52 wrote:

How can we tell whether paralogous sequences were removed by running CD-HIT? How can we identify paralogous sequences from the output text files that list the clusters?

12 months ago, Sej Modha (Glasgow, UK) wrote:

CD-HIT is a sequence clustering tool: it simply clusters sequences according to the sequence identity threshold specified with -c. If paralogous sequences fall within that threshold, they are clustered together, and the longest sequence is chosen as the representative of the cluster.

The CD-HIT GitHub page provides a number of scripts for parsing the standard clustering output.
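If you would rather inspect the clusters yourself, here is a minimal Python sketch that reads the cluster listing. It assumes the usual .clstr layout, where each cluster starts with a ">Cluster N" line and the representative member line ends with "*"; the sample data and the names parse_clstr and multi_member are made up for illustration and are not part of CD-HIT. Note that sequence names in the .clstr file may be truncated, and that landing in the same cluster only shows two sequences are within the -c threshold, not that they are paralogs.

```python
import re

# Matches a member line such as "1\t2214aa, >seqB... at 80.00%"
# (assumed layout: index, length in aa or nt, name, then "*" or "at ...%").
MEMBER = re.compile(r"^\d+\s+(\d+)(?:aa|nt),\s+>(.+?)\.\.\.\s+(\*|at\s+.+)$")


def parse_clstr(lines):
    """Return {cluster name: {"representative": name, "members": [(name, length)]}}."""
    clusters = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]
            clusters[current] = {"representative": None, "members": []}
            continue
        match = MEMBER.match(line)
        if match is None or current is None:
            continue  # skip anything that does not look like a member line
        length, name, status = int(match.group(1)), match.group(2), match.group(3)
        if status == "*":
            clusters[current]["representative"] = name
        clusters[current]["members"].append((name, length))
    return clusters


def multi_member(clusters):
    """Names of clusters holding more than one sequence."""
    return [name for name, c in clusters.items() if len(c["members"]) > 1]


# Hypothetical sample in the assumed .clstr layout.
SAMPLE = "\n".join([
    ">Cluster 0",
    "0\t2799aa, >seqA... *",
    "1\t2214aa, >seqB... at 80.00%",
    ">Cluster 1",
    "0\t1500aa, >seqC... *",
])

if __name__ == "__main__":
    parsed = parse_clstr(SAMPLE.splitlines())
    for name in multi_member(parsed):
        info = parsed[name]
        print(name, "representative:", info["representative"],
              "members:", [m for m, _ in info["members"]])
```

For a real run, pass an open file handle for the .clstr file to parse_clstr instead of the sample lines. Clusters with more than one member are the ones where sequences were collapsed onto a representative, so those are the places to look for candidate paralogs.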


Alright, I had given the -c argument to specify the sequence identity threshold. Can you tell me what the next step is? My next step is to run blastp against the human genome to get non-homologous sequences. How do I relate the cd-hit output to the blastp step?

(reply written 12 months ago by bushrasiraj52)

Could you clarify what you are trying to do? If it is unrelated to the CD-HIT question posted above, please create a new post explaining the aim.

(reply written 12 months ago by Sej Modha)