How to handle gaps in consensus sequence construction from multiple sequence alignment
6.3 years ago
galen.seilis ▴ 10

I am writing a function that takes aligned sequences and outputs a consensus sequence using IUPAC ambiguity codes for nucleotides. I am unsure how to assign the consensus character at a position where the gap is the most frequent (modal) character. Here is an example:

Sequence 0    G-ATGT
Sequence 1    G-ATGT
Sequence 2    G-ATGT
Sequence 3    GCATGT

What would the appropriate consensus sequence in this case be?
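For the question as asked, here is a minimal Python sketch. It assumes the input contains only A, C, G, T and '-', and that all sequences have the same aligned length. The `gap_policy` parameter and its three values are hypothetical names for the three obvious choices: keep the gap, drop the column, or ignore gaps and build the IUPAC code from the remaining bases. Which one is "appropriate" depends on what the consensus will be used for.

```python
from collections import Counter

# Standard IUPAC nucleotide codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

def consensus(seqs, gap_policy="gap"):
    """Build an IUPAC consensus from equal-length aligned DNA sequences.

    gap_policy (hypothetical parameter) controls columns where '-' is the
    most frequent character:
      "gap"    -> emit '-'
      "drop"   -> emit nothing for that column
      "ignore" -> build the code from the non-gap bases only
    """
    if gap_policy not in ("gap", "drop", "ignore"):
        raise ValueError(f"unknown gap_policy: {gap_policy!r}")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must all have the same aligned length")
    out = []
    for column in zip(*seqs):
        counts = Counter(c.upper() for c in column)
        gaps = counts.pop("-", 0)
        # Gap counts as the mode if no base occurs more often than '-'.
        gap_is_mode = gaps > 0 and gaps >= max(counts.values(), default=0)
        if gap_is_mode and gap_policy == "gap":
            out.append("-")
        elif gap_is_mode and gap_policy == "drop":
            continue
        elif counts:
            # Ambiguity code for the set of bases seen in this column.
            out.append(IUPAC[frozenset(counts)])
        else:
            # All-gap column under "ignore": nothing to summarise.
            out.append("-")
    return "".join(out)
```

With the alignment from the question, "gap" gives G-ATGT, "drop" gives GATGT and "ignore" gives GCATGT. A tie between the gap and a base is treated here as a gap majority, which is itself a judgment call.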

6.3 years ago

Look at how sequence logos are built. Some advice can be found here.
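Sequence logos start from a per-column frequency profile. Below is a rough sketch of that step, assuming DNA input (A, C, G, T, '-') and the basic information-content formula without any small-sample correction. Gaps are left out of the base frequencies and reported separately. The hypothetical `weighted_bits` field scales the column height by the non-gap fraction so that a mostly-gap column is not drawn as fully conserved; that scaling is an assumption for illustration, not necessarily what any particular logo program does.

```python
import math
from collections import Counter

def column_profile(seqs):
    """Per-column base frequencies, gap fraction and information content.

    Assumes DNA (A, C, G, T, '-') and applies no small-sample correction.
    """
    profile = []
    for column in zip(*seqs):
        counts = Counter(c.upper() for c in column)
        gaps = counts.pop("-", 0)
        n_bases = sum(counts.values())
        # Base frequencies are computed over non-gap characters only.
        freqs = {b: counts[b] / n_bases for b in "ACGT"} if n_bases else {}
        entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
        # log2(4) = 2 bits is the maximum for a four-letter alphabet.
        bits = 2.0 - entropy if n_bases else 0.0
        gap_fraction = gaps / len(column)
        profile.append({
            "freqs": freqs,
            "gap_fraction": gap_fraction,
            "bits": bits,
            # Hypothetical gap-aware height: scale by the non-gap fraction.
            "weighted_bits": bits * (1 - gap_fraction),
        })
    return profile
```

For the question's alignment, column 2 holds a single C and three gaps, so its plain `bits` value is 2.0 (looks perfectly conserved) while `weighted_bits` drops to 0.5.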

6.3 years ago

I think the two best options for this are:

- PROSITE pattern notation
- regular expressions

PROSITE pattern notation may be better known by biologists, whereas regular expressions are more useful for bioinformaticians, in my opinion.
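The regular-expression option can be sketched as follows, assuming DNA input and Python's `re` module. Each column becomes a literal base or a character class, and any column containing a gap is made optional with `?`. Generating the equivalent PROSITE pattern is not shown here.

```python
import re

def alignment_to_regex(seqs):
    """Turn aligned sequences into a regex; gapped columns become optional."""
    parts = []
    for column in zip(*seqs):
        residues = sorted({c.upper() for c in column if c != "-"})
        has_gap = "-" in column
        if not residues:
            continue  # all-gap column contributes nothing
        # Single base -> literal; several bases -> character class like [AG].
        token = residues[0] if len(residues) == 1 else "[" + "".join(residues) + "]"
        parts.append(token + ("?" if has_gap else ""))
    return "".join(parts)

pattern = alignment_to_regex(["G-ATGT", "G-ATGT", "G-ATGT", "GCATGT"])
print(pattern, bool(re.fullmatch(pattern, "GATGT")))
```

For the question's alignment this produces GC?ATGT, which matches both GATGT and GCATGT but not, for example, GGATGT.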
