clustering and assembly
8.3 years ago
mina

Hi everyone

I am confused about the difference between clustering transcriptome sequences and assembling transcriptome sequences.

As I understand it, assembly means joining overlapping reads to each other to reconstruct the full or partial sequence of an mRNA. Am I right?
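To illustrate the idea of overlapping reads being joined, here is a minimal greedy overlap-merge sketch in Python. It is only a toy: the function names `overlap` and `greedy_assemble`, the minimum overlap of 3 and the example reads are made up for illustration, and real short-read transcriptome assemblers (Trinity, for example) typically use de Bruijn graphs rather than greedy merging.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no such overlap is at least `min_len` long.
    """
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Check that the rest of a from that point is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            # No overlaps left: whatever remains are the contigs.
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGGCGTA", "GCGTACCT", "TACCTTAG"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTTAG']
```

The three overlapping reads collapse into one contig, which is the "full or partial mRNA sequence" the question describes.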

Clustering means grouping sets of homologous genes (the expressed mRNAs, in the case of transcriptome data). Am I right?

Clustering is done after assembly, but why do we cluster transcriptome data at all?

This might seem silly, but what is the difference between clustering and multiple sequence alignment? Both show sequence similarity.

English is not my first language, so please excuse any mistakes.

Thanks in advance.

Regards

8.3 years ago
BDK_compbio

Aligning more than two sequences at once is called multiple sequence alignment. Clustering, by contrast, groups sequences based on their similarity. Here is a toy example:

A C C T A C _ _
A C T T A C _ _
A _ _ T A C G T
A _ _ A A C G T


Suppose you align these four sequences as shown. The first and second sequences are nearly identical, differing by a single substitution (C vs T in column 3). Likewise, the third and fourth differ by a single substitution (T vs A in column 4). So the 1st and 2nd can be grouped into one cluster, and the 3rd and 4th into another.
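A small Python sketch of that distinction, using the four aligned rows above with `_` as the gap character: the alignment is the input, and clustering is a separate step that scores each pair and groups those above a cut-off. The identity measure (a base aligned to a gap counts as a mismatch, gap-vs-gap columns are skipped), the 0.8 threshold and the single-linkage grouping are illustrative assumptions, not a standard definition.

```python
from itertools import combinations

GAP = "_"  # gap symbol used in the toy alignment above


def identity(a: str, b: str) -> float:
    """Fraction of identical columns, skipping columns that are gaps in both rows."""
    cols = [(x, y) for x, y in zip(a, b) if not (x == GAP and y == GAP)]
    if not cols:
        return 0.0
    # A base aligned to a gap counts as a mismatch.
    matches = sum(1 for x, y in cols if x == y)
    return matches / len(cols)


def single_linkage(rows: list[str], threshold: float) -> list[list[int]]:
    """Group row indices whose pairwise identity reaches `threshold` (union-find)."""
    parent = list(range(len(rows)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(rows)), 2):
        if identity(rows[i], rows[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(rows)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


rows = ["ACCTAC__", "ACTTAC__", "A__TACGT", "A__AACGT"]
print(single_linkage(rows, threshold=0.8))  # [[0, 1], [2, 3]]
```

Rows 1 and 2 score 5/6 (about 0.83), as do rows 3 and 4, while every cross pair scores 0.5 or less, so two clusters come out. With a different threshold the same alignment could give a different grouping: the alignment describes similarity, and clustering is a decision made on top of it.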

8.3 years ago
Janake

The reason we do clustering after assembly:

After assembling the short reads, you get transcripts. One gene can have more than one transcript because of alternative splicing. As mentioned in the previous answer, clustering puts these similar sequences together and helps build a set of transcripts for each gene. The assembly process can also create transcripts that are not real (for example, sequences that are more than 95% identical to another sequence in the cluster), and clustering helps identify them.
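One hypothetical way to picture how isoforms of the same gene end up together: alternative splicing reuses exons, so isoforms share many short substrings (k-mers), while transcripts from unrelated genes share few. The Python sketch below links transcripts that share at least `min_shared` k-mers and reports connected components as "genes". The names, k = 5, `min_shared = 3` and the toy sequences (iso_B skips the middle exon of iso_A) are assumptions for illustration; this is not the algorithm of any particular tool.

```python
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def group_transcripts(transcripts: dict[str, str], k: int = 5,
                      min_shared: int = 3) -> list[list[str]]:
    """Link transcripts sharing >= min_shared k-mers; return connected components."""
    names = list(transcripts)
    kmer_sets = {n: kmers(transcripts[n], k) for n in names}
    neighbours: dict[str, set[str]] = {n: set() for n in names}
    for a, b in combinations(names, 2):
        if len(kmer_sets[a] & kmer_sets[b]) >= min_shared:
            neighbours[a].add(b)
            neighbours[b].add(a)

    groups: list[list[str]] = []
    seen: set[str] = set()
    for start in names:
        if start in seen:
            continue
        component: set[str] = set()
        stack = [start]
        while stack:  # depth-first walk over linked transcripts
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            component.add(cur)
            stack.extend(neighbours[cur] - seen)
        groups.append(sorted(component))
    return groups


exon1, exon2, exon3 = "ATGCCGTA", "GGCTTAAC", "TTGACCGA"
transcripts = {
    "iso_A": exon1 + exon2 + exon3,  # all three exons
    "iso_B": exon1 + exon3,          # exon 2 skipped
    "other": "CCCCGGGGAAAATTTT",     # unrelated sequence
}
print(group_transcripts(transcripts))  # [['iso_A', 'iso_B'], ['other']]
```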


Thank you, sbdk82 and Janak.

That helps, but how does clustering recognize the fake transcripts that are 95% identical to a real one?


Perhaps you can take a look at the following program:

http://weizhongli-lab.org/cd-hit/
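On the question of how a 95% cut-off is applied: CD-HIT's documentation describes it as a greedy incremental clusterer. Sequences are sorted by decreasing length, the longest becomes the first representative, and each following sequence either joins an existing representative it matches at or above the identity threshold or starts a new cluster. The Python below is a heavily simplified sketch of that idea, not CD-HIT itself: it uses an ungapped sliding comparison instead of CD-HIT's word filter and alignment, and the function names and toy sequences are hypothetical.

```python
def ungapped_identity(short: str, long: str) -> float:
    """Best ungapped placement of `short` inside `long`, as a fraction of len(short)."""
    if not short:
        return 0.0
    best = 0
    for offset in range(len(long) - len(short) + 1):
        matches = sum(1 for x, y in zip(short, long[offset:]) if x == y)
        best = max(best, matches)
    return best / len(short)


def greedy_incremental(seqs: dict[str, str],
                       threshold: float = 0.95) -> dict[str, list[str]]:
    """Map each representative to the redundant sequences assigned to it."""
    # Longest first, so a representative is never shorter than what it absorbs.
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters: dict[str, list[str]] = {}
    for name in order:
        for rep in clusters:
            if ungapped_identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)  # redundant copy of rep
                break
        else:
            clusters[name] = []  # no match: becomes a new representative
    return clusters


tx1 = "ATGGCGTACC" "TTAGGCTAAC" "GTTGCAAGTC" "CGATTAGCAT"
seqs = {
    "tx1": tx1,
    "tx2": tx1[:14] + "C" + tx1[15:],  # one substitution in 40 bases (97.5%)
    "tx3": tx1[10:30],                 # exact 20-base fragment of tx1
    "tx4": "GGGGCCCCAAAATTTT" * 2 + "GGGGCCCC",  # unrelated
}
print(greedy_incremental(seqs))  # {'tx1': ['tx2', 'tx3'], 'tx4': []}
```

tx2 and tx3 land in tx1's cluster because they are at least 95% identical to it, so keeping only the representatives (tx1 and tx4) removes the redundant copies. Whether a redundant copy is an assembly artifact or a genuine near-identical isoform is still a biological judgement that the threshold alone does not make.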