How to group some nucleotide sequences based on similarity.
4.3 years ago
arriyaz.nstu ▴ 30

I have 20 nucleotide sequences of a particular gene, collected from 20 different strains of a virus. My goal is to separate them into groups based on similarity and then derive a consensus sequence for each group. How can I do this?

sequence alignment • 1.8k views

Do a multiple sequence alignment (MSA). You can use a local program such as MEGA or Clustal, or an online web interface such as Clustal Omega.


As far as I know, an MSA will give me only one consensus sequence for all 20 nucleotide sequences. I want to group the sequences first (maybe into 3 or 4 groups), so that the most similar sequences are put together, and then get one consensus sequence for each group. I'm very new to bioinformatics, so I may be wrong.


If you already know which sequences are most similar to each other, you could separate them before doing individual MSAs.

If you don't, try an initial MSA with all the sequences (since they all come from one gene, they should be similar enough to align). Examine the results of the alignment (plot a distance tree), then decide on the groups you want to break the sequences into before doing individual alignments to get a consensus.
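To make that workflow concrete, here is a minimal sketch in plain Python. It assumes the sequences have already been aligned by an MSA tool (so every row has the same length and `-` marks a gap), and it swaps in a simple p-distance with average-linkage clustering in place of inspecting a tree by eye. The toy sequences, the function names and the choice of two groups are all made up for illustration.

```python
from collections import Counter
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing columns, skipping columns where either row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def cluster(aligned, n_groups):
    """Average-linkage agglomerative clustering, stopped at n_groups clusters."""
    names = list(aligned)
    dist = {
        frozenset(pair): p_distance(aligned[pair[0]], aligned[pair[1]])
        for pair in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_groups:
        # Merge the two closest clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def consensus(rows):
    """Most common character in each alignment column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*rows))


# Hypothetical, already-aligned toy sequences (not real viral data).
aligned = {
    "s1": "ATGCATGCAT",
    "s2": "ATGCATGCAA",
    "s3": "ATGCTTGCAT",
    "s4": "TTACGGACCA",
    "s5": "TTACGGACCT",
    "s6": "TTACGGTCCA",
}

groups = cluster(aligned, n_groups=2)
group_consensus = [consensus([aligned[n] for n in g]) for g in groups]
for members, cons in zip(groups, group_consensus):
    print(sorted(members), cons)
```

With real data you would probably re-align each group on its own before taking the consensus, as suggested above, and pick the number of groups after looking at the tree rather than fixing it in advance.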

4.3 years ago
yairgatt ▴ 10

I am not certain what question you are hoping to answer, but you could cluster the sequences using a program like CD-HIT. If you are trying to assess the evolutionary relationships between the strains, it might be best to construct a dendrogram or a phylogenetic tree for these sequences.


Thank you for your suggestion. My main target is clustering the sequences based on similarity. I think CD-HIT will be a solution.


Keep in mind that CD-HIT is going to cluster solely based on sequence similarity. It will not take evolutionary relationships between sequences into consideration (or introduce gaps where needed).
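For intuition about what clustering purely on sequence looks like, here is a rough Python sketch of greedy, threshold-based clustering: sequences are visited longest first, and each one either joins the first existing cluster whose representative it matches well enough or starts a new cluster. This is only an illustration of the general idea, not CD-HIT's actual implementation. The similarity score is a stand-in from Python's `difflib`, and the sequences, names and the 0.9 threshold are invented.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in score in [0, 1] from difflib; an assumption for this sketch,
    # not the identity measure that CD-HIT itself computes.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs, threshold):
    """seqs maps name -> sequence. Returns a list of (representative, members)."""
    # Visit the longest sequences first so they become the representatives.
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters = []
    for name in order:
        for rep, members in clusters:
            if similarity(seqs[rep], seqs[name]) >= threshold:
                members.append(name)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters.append((name, [name]))
    return clusters


# Hypothetical toy sequences (not real data).
seqs = {
    "a": "ATGCATGCATGC",
    "b": "ATGCATGCATG",
    "c": "TTTTGGGGCCCCAA",
    "d": "TTTTGGGGCCCCA",
}

clusters = greedy_cluster(seqs, threshold=0.9)
for rep, members in clusters:
    print(rep, members)
```

Note that each sequence is compared only against cluster representatives, never against a tree or a full alignment, so the grouping depends on the threshold and on the order in which sequences are visited.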


Luckily, I only need to group them based on similarity. I am not going to do any evolutionary analysis. Thank you for your help.
