Question: CD-HIT running problem
bushrasiraj520 wrote (4 days ago):

How can I tell whether CD-HIT has removed paralogous sequences? How can I identify paralogous sequences from the output text file that lists the clusters?

Sej Modha (Glasgow, UK) wrote (4 days ago):

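CD-HIT is a sequence clustering tool: it simply clusters sequences according to the sequence identity threshold set with -c. If paralogous sequences share an identity at or above that threshold, they are clustered together and the longest sequence is chosen as the cluster's representative; paralogs below the threshold end up in separate clusters. Only the representatives are written to the output FASTA file, so clustered paralogs are "removed" in that sense, and the .clstr file records which sequences each representative stands for.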
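To make the threshold behaviour concrete, here is a toy Python sketch of greedy incremental clustering in the spirit of CD-HIT: sequences are processed longest first, and each one joins the first existing representative it matches at or above the threshold (CD-HIT's default behaviour with -g 0), otherwise it founds a new cluster. This is an illustration, not CD-HIT's implementation. The `identity` function is a crude stand-in built on `difflib.SequenceMatcher` (matched residues divided by the length of the shorter sequence), not CD-HIT's word filtering and alignment, and the names `identity` and `greedy_cluster` are hypothetical.

```python
from difflib import SequenceMatcher


def identity(shorter: str, longer: str) -> float:
    """Crude identity proxy (NOT CD-HIT's method): matched residues / len(shorter)."""
    matcher = SequenceMatcher(None, shorter, longer, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(shorter)


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Toy greedy incremental clustering.

    Sequences are visited longest first, so every representative is at least
    as long as the sequences compared against it. A sequence joins the first
    representative it matches at >= threshold; otherwise it starts a new
    cluster and becomes that cluster's representative.
    """
    clusters: dict[str, list[str]] = {}
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters
```

For example, `greedy_cluster({"a": "MKTAYIAKQR", "b": "MKTAYIAKQ", "c": "GGGGGWWWW"}, 0.9)` puts `b` in the cluster represented by the longer `a`, while the unrelated `c` forms its own cluster.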

The CD-HIT GitHub page provides a number of scripts for parsing the standard clustering output (the .clstr file).
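If you would rather inspect the clusters yourself, the sketch below parses a .clstr file and keeps only the clusters with more than one member, i.e. the places where sequences (possibly paralogs) were collapsed into a representative. It assumes the usual member-line layout, for example `1\t118aa, >seqB... at 95.76%`, with `*` marking the representative and `nt` instead of `aa` for nucleotide runs; the function names are hypothetical. Note that sharing a cluster only means the sequences passed the -c identity threshold; it does not by itself prove that they are paralogs.

```python
import re
from collections import defaultdict

# One member line of a CD-HIT .clstr file, e.g.
#   "0\t120aa, >protA... *"                 (representative)
#   "1\t118aa, >protA_paralog... at 95.76%" (member, identity to representative)
MEMBER_RE = re.compile(r"^\d+\t(\d+)(?:aa|nt), >(.+?)\.\.\. (\*|at .+)$")


def parse_clstr(text: str) -> dict[str, list[dict]]:
    """Map each cluster name (e.g. 'Cluster 0') to a list of member records."""
    clusters: dict[str, list[dict]] = defaultdict(list)
    current = None
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]  # drop the leading '>'
            continue
        match = MEMBER_RE.match(line)
        if match is None or current is None:
            raise ValueError(f"unrecognised .clstr line: {line!r}")
        length, seq_id, tail = match.groups()
        clusters[current].append({
            "id": seq_id,
            "length": int(length),
            "representative": tail == "*",
            # identity string as printed by CD-HIT, e.g. "95.76%"
            "identity": None if tail == "*" else tail[3:],
        })
    return dict(clusters)


def multi_member(clusters: dict[str, list[dict]]) -> dict[str, list[dict]]:
    """Clusters whose non-representative members were collapsed away."""
    return {name: members for name, members in clusters.items() if len(members) > 1}
```

Reading the file with `open(path).read()` and passing the text to `parse_clstr`, then `multi_member`, lists every representative together with the sequences that were dropped in its favour.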


All right, I have already set the sequence identity threshold with -c. Can you tell me what the next step is? My next step is to run blastp against the human genome to get the non-homologous sequences. How do I relate the CD-HIT output to the blastp results?

Reply by bushrasiraj520, 2 days ago.
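One possible way to connect the two steps, offered as a hedged sketch rather than anything stated in the thread: blastp searches a protein database, so this assumes the CD-HIT representative FASTA is used as the blastp query against a human protein set, with tabular output (-outfmt 6, whose default columns put the query ID first and the e-value eleventh). Representatives with no hit at or below an e-value cutoff are treated as non-homologous, and a cluster mapping (for example, built from the .clstr file) expands each one to the sequences it represents. The function names, the `clusters` mapping of representative ID to member IDs, and the 1e-5 cutoff are hypothetical choices. IDs must match between the two files, and CD-HIT may truncate long identifiers in the .clstr file (see its -d option).

```python
def hit_queries(blast_tab: str, max_evalue: float) -> set[str]:
    """Query IDs with at least one hit at or below max_evalue (BLAST -outfmt 6 text)."""
    hits: set[str] = set()
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # default outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore -> evalue is index 10
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])  # qseqid
    return hits


def non_homologous(
    representatives: list[str],
    clusters: dict[str, list[str]],
    blast_tab: str,
    max_evalue: float = 1e-5,  # hypothetical cutoff; choose your own
) -> dict[str, list[str]]:
    """Representatives with no qualifying human hit, mapped to their cluster members.

    A representative missing from `clusters` is assumed to be a singleton.
    """
    hits = hit_queries(blast_tab, max_evalue)
    return {rep: clusters.get(rep, [rep]) for rep in representatives if rep not in hits}
```

Searching only the representatives keeps the BLAST run small. Whether every cluster member should inherit its representative's verdict is itself an assumption worth checking, because members may share only the -c threshold identity with the representative.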

Could you clarify what you are trying to do? If it is unrelated to the CD-HIT question posted above, please create a new post explaining the aim.

Reply by Sej Modha, 2 days ago.