Question: CD-Hit running problem
9 months ago, bushrasiraj52 wrote:

How can we tell whether paralogous sequences were removed by running CD-HIT? How can we identify paralogous sequences from the output text files that list the clusters?

sequencing • 353 views
9 months ago, Sej Modha (Glasgow, UK) wrote:

CD-HIT is a sequence clustering tool: it simply clusters sequences according to the sequence identity threshold specified with -c. If paralogous sequences fall within that threshold, they are clustered together, with the longest sequence chosen as the representative of the cluster.
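
To make the "longest sequence becomes the representative" behaviour concrete, here is a toy sketch of greedy, longest-first clustering in Python. It is only an illustration of the idea: the `identity` function uses `difflib` as a stand-in and is not how CD-HIT computes sequence identity, and the function names and example sequences are made up for this post.

```python
from difflib import SequenceMatcher


def identity(a, b):
    # Stand-in similarity score between 0 and 1.
    # Assumption: this is NOT CD-HIT's alignment-based identity.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs, threshold):
    """Cluster sequences longest-first against existing representatives.

    seqs: dict mapping sequence name -> sequence string
    threshold: minimum similarity needed to join a cluster (like -c)
    Returns a list of (representative_name, [member_names]) tuples.
    """
    # Process the longest sequences first so they become representatives.
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters = []
    for name in order:
        for rep, members in clusters:
            if identity(seqs[rep], seqs[name]) >= threshold:
                members.append(name)  # joins the first cluster it matches
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters.append((name, [name]))
    return clusters


if __name__ == "__main__":
    example = {
        "protA": "MKTAYIAKQRQISFVKSHFSRQ",
        "protA_copy": "MKTAYIAKQRQISFVKSHFSR",
        "protB": "GGGGGGGGGGPPPPPPPPPP",
    }
    for rep, members in greedy_cluster(example, 0.9):
        print(rep, members)
```

With a threshold of 0.9 the two near-identical sequences end up in one cluster represented by the longer one, while the unrelated sequence forms its own cluster. Raising the threshold splits them apart, which is why the choice of -c decides whether paralogs are collapsed.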

The CD-HIT GitHub page provides a number of scripts for parsing the standard clustering output.
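
If you would rather inspect the cluster file yourself, a minimal Python sketch along these lines may help. It assumes the usual .clstr layout (a `>Cluster N` header followed by member lines such as `0	350aa, >seqA... *`, where `*` marks the representative); the function names and sample IDs are hypothetical and not part of CD-HIT.

```python
import re

# Member line, e.g. "1<TAB>340aa, >seqB... at 95.20%" (assumed .clstr layout).
MEMBER_RE = re.compile(
    r"^\d+\s+(\d+)(?:aa|nt),\s+>(.+?)\.\.\.\s+(\*|at\s+.+)$"
)


def parse_clstr(lines):
    """Return {cluster_name: {"representative": id, "members": [ids]}}."""
    clusters = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]  # e.g. "Cluster 0"
            clusters[current] = {"representative": None, "members": []}
            continue
        match = MEMBER_RE.match(line)
        if match is None or current is None:
            continue  # skip lines that do not fit the assumed layout
        _length, seq_id, status = match.groups()
        clusters[current]["members"].append(seq_id)
        if status == "*":
            clusters[current]["representative"] = seq_id
    return clusters


def multi_member_clusters(clusters):
    """Keep only clusters in which sequences were merged together."""
    return {
        name: info
        for name, info in clusters.items()
        if len(info["members"]) > 1
    }


if __name__ == "__main__":
    sample = (
        ">Cluster 0\n"
        "0\t350aa, >seqA... *\n"
        "1\t340aa, >seqB... at 95.20%\n"
        ">Cluster 1\n"
        "0\t200aa, >seqC... *\n"
    )
    merged = multi_member_clusters(parse_clstr(sample.splitlines()))
    for name, info in merged.items():
        print(name, info["representative"], info["members"])
```

For a real run you would pass the lines of your own .clstr file instead of the sample string. Clusters with more than one member show which sequences were collapsed onto a representative; whether those members are actually paralogs (rather than, say, isoforms or duplicate entries) still needs to be checked separately.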


Alright, I had given the -c argument specifying the sequence identity threshold. Can you tell me what the next step is? My next step is to run blastp against the human genome to get non-homologous sequences. How do I relate the CD-HIT output to the blastp run?

Reply written 9 months ago by bushrasiraj52

Could you clarify what you are trying to do? If it is unrelated to the CD-HIT question posted above, please create a new post explaining the aim.

Reply written 9 months ago by Sej Modha