Question: clustering and assembly
5.0 years ago by
mina20
Malaysia
mina20 wrote:

Hi everyone

I am confused about the difference between clustering and assembly of transcriptome sequences.

As I understand it, assembly means joining overlapping reads to form a full or partial mRNA sequence. Am I right?

Clustering means grouping homologous genes (the expressed mRNAs in transcriptome data). Am I right?

Clustering should be done after assembly, but why do we cluster transcriptome data at all?

It might seem silly, but what is the difference between clustering and multiple sequence alignment? Both show sequence similarity.

English is not my first language, so please excuse any mistakes.

Thanks in advance.

Regards

clustering assembly • 2.4k views
modified 5.0 years ago by Janake160 • written 5.0 years ago by mina20
5.0 years ago by
sbdk8260
United States
sbdk8260 wrote:

Aligning more than two sequences is called multiple sequence alignment. Clustering means grouping sequences based on their similarity. Here is a toy example (underscores mark gaps):

A C C T A C _ _

A C T T A C _ _

A _ _ T A C G T

A _ _ A A C G T

Suppose the four sequences align like this. The first and second are almost the same sequence, differing by one substitution, so they can be grouped together. Likewise, the third and fourth differ by one substitution and can be grouped together. The alignment shows where the sequences match; the clustering is the grouping you derive from that similarity.
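To make the grouping concrete, here is a minimal Python sketch using the toy alignment above. The sequence names, the `identity` and `cluster` helpers, and the 0.8 threshold are all assumptions made for illustration; real tools use far more efficient methods.

```python
from itertools import combinations

# The four aligned toy sequences from the example ("_" marks a gap).
# The names seq1..seq4 are hypothetical labels.
aligned = {
    "seq1": "ACCTAC__",
    "seq2": "ACTTAC__",
    "seq3": "A__TACGT",
    "seq4": "A__AACGT",
}

def identity(a, b):
    """Fraction of alignment columns with the same base in both sequences.
    Columns where both sequences have a gap are ignored."""
    columns = [(x, y) for x, y in zip(a, b) if not (x == "_" and y == "_")]
    matches = sum(1 for x, y in columns if x == y)
    return matches / len(columns)

def cluster(seqs, threshold):
    """Single-linkage grouping: sequences whose identity reaches the
    threshold end up in the same cluster (simple union-find)."""
    parent = {name: name for name in seqs}

    def find(name):
        while parent[name] != name:
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())

# Threshold chosen arbitrarily for this toy case.
clusters = cluster(aligned, threshold=0.8)
print(clusters)
```

With this threshold the first two sequences fall into one group and the last two into another, which matches the grouping described above.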

 

written 5.0 years ago by sbdk8260
5.0 years ago by
Janake160
United States
Janake160 wrote:

Why we do clustering after assembly:

Assembling short reads gives you transcripts. One gene can have more than one transcript because of alternative splicing. As mentioned in the previous answer, clustering puts these similar sequences together, which helps build the set of transcripts that belong to one gene. The assembly process can also create transcripts that are not real (for example, sequences that are more than 95% identical to another sequence in the cluster), and clustering helps identify them.

written 5.0 years ago by Janake160

Thank you, sbdk82 and Janake.

That helps, but how does clustering help recognize the fake transcripts that share 95% identity with a real one?

written 5.0 years ago by mina20

Perhaps you can take a look at the following program:

http://weizhongli-lab.org/cd-hit/
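As a rough illustration of the idea (not the program's actual implementation), here is a small Python sketch of greedy, longest-first clustering: each transcript is compared with the representatives found so far and joins the first one it matches at 95% identity or more; otherwise it becomes a new representative. The transcript names and sequences are made up, and `difflib.SequenceMatcher` is only a simple stand-in for a real alignment.

```python
from difflib import SequenceMatcher

def identity(a, b):
    """Rough identity: characters in matching blocks divided by the length
    of the shorter sequence. Assumption: SequenceMatcher replaces a proper
    alignment here purely to keep the sketch short."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))

def greedy_cluster(transcripts, threshold=0.95):
    """Process transcripts from longest to shortest. Each one joins the
    first representative it matches at >= threshold, or starts a cluster."""
    clusters = {}  # representative name -> member names (representative first)
    by_length = sorted(transcripts, key=lambda n: len(transcripts[n]), reverse=True)
    for name in by_length:
        for rep in clusters:
            if identity(transcripts[name], transcripts[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters

# Hypothetical assembled transcripts, invented for this example.
tx1 = "ATGGCGTACGTTAGCCTAGGATCCGATTACGGCTAAGCTT"
transcripts = {
    "tx1": tx1,
    # Near-duplicate of tx1: one substitution and two bases shorter.
    "tx2": tx1[:20] + "C" + tx1[21:38],
    # Unrelated sequence.
    "tx3": "CCCCAAAATTTTGGGGCCCCAAAATTTTGG",
}

clusters = greedy_cluster(transcripts)
# Members other than the representative are candidates for removal.
redundant = [m for members in clusters.values() for m in members[1:]]
print(clusters)
print(redundant)
```

Here the near-duplicate joins the cluster of the longer transcript and is flagged as redundant, while the unrelated sequence forms its own cluster. Whether a flagged sequence is truly an artifact or a genuine variant still has to be judged separately.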

modified 5.0 years ago • written 5.0 years ago by Janake160