CD-Hit running problem
6.2 years ago

How can we tell which paralogous sequences were removed by running CD-HIT? How can we identify paralogous sequences from the output text file that lists the clusters?

sequencing • 1.7k views
6.2 years ago
Sej Modha 5.3k

CD-HIT is a sequence clustering tool: it clusters sequences according to the sequence identity threshold specified with -c. If paralogous sequences fall within that threshold, they are clustered together, and the longest sequence is chosen as the representative of the cluster.

The CD-HIT GitHub page provides a number of scripts for parsing the standard clustering output.
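If you would rather inspect the clusters yourself, here is a minimal Python sketch that reads the .clstr file and lists the clusters holding more than one sequence. It assumes the usual .clstr layout (a ">Cluster N" header, then one line per sequence, with "*" marking the representative and "at XX%" on the other members). The function names and the sample records are made up for illustration; they are not part of CD-HIT. Keep in mind that sharing a cluster only means the sequences passed the identity threshold; it does not by itself show that they are paralogs.

```python
import re

# One member line of a .clstr file, for example:
#   "0\t300aa, >seqA... *"          (representative)
#   "1\t280aa, >seqB... at 97.50%"  (clustered member)
MEMBER_RE = re.compile(r"^\d+\s+(\d+)(?:aa|nt),\s+>(.+?)\.\.\.\s+(\*|at\s+.+)$")


def parse_clstr(lines):
    """Return {cluster_name: {"representative": id, "members": [ids]}}."""
    clusters = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:].strip()
            clusters[current] = {"representative": None, "members": []}
            continue
        match = MEMBER_RE.match(line)
        if match is None or current is None:
            continue  # skip anything that does not look like a member line
        _length, seq_id, status = match.groups()
        if status == "*":
            clusters[current]["representative"] = seq_id
        else:
            clusters[current]["members"].append(seq_id)
    return clusters


def multi_member_clusters(clusters):
    """Keep only clusters where at least one sequence was merged away."""
    return {name: c for name, c in clusters.items() if c["members"]}


if __name__ == "__main__":
    # Hypothetical records; for a real run, pass an open file instead, e.g.
    # with open("proteins_nr.clstr") as handle: parse_clstr(handle)
    sample = [
        ">Cluster 0",
        "0\t300aa, >seqA... *",
        "1\t280aa, >seqB... at 97.50%",
        ">Cluster 1",
        "0\t150aa, >seqC... *",
    ]
    for name, c in multi_member_clusters(parse_clstr(sample)).items():
        print(name, "representative:", c["representative"],
              "removed:", ", ".join(c["members"]))
```

The sequences listed as removed are the ones absent from the representative FASTA output, so they are the candidates to check if you want to know what the clustering dropped.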


Alright, I had given the -c argument to specify the sequence identity threshold. Can you tell me what the next step is? My next step is to run blastp against the human genome to get non-homologous sequences. How do I relate the CD-HIT output to the blastp run?


Could you clarify what you are trying to do? If it is unrelated to the CD-HIT question posted above, please create a new post explaining the aim.
