How can we make this plot? (sequence logo)
8.6 years ago
zx8754 11k

The plot is from this publication: see the bottom of the figure on page 2, which shows the letters A, C, T and G in different colours and sizes.

1. Could anyone clarify in layman's terms what it represents?
2. More importantly, how do we plot it using R or any other tools?

Here is the plot, if you can't access the paper:

8.6 years ago

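That type of plot is called a sequence logo. The X axis represents the nucleotide positions in a sequence, and the Y axis the information content (in bits) at each position, which is derived from the entropy of the nucleotide distribution there. The taller the stack, the more conserved the position. Within a stack, the size of each letter indicates how frequently it is found at that position: for example, position 8 is almost always an A, while position 5 can be a T or C.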
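
The relationship between entropy, information content and letter height in a sequence logo can be sketched in a few lines of Python. This is only an illustration: the function name `logo_heights` and the example sequences are made up, and the small-sample correction that tools such as WebLogo apply is left out.

```python
import math
from collections import Counter


def logo_heights(seqs, alphabet="ACGT"):
    """Return one dict per position mapping each letter to its height in bits.

    Assumes all sequences are aligned and of equal length.
    No small-sample correction is applied (a simplification).
    """
    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    heights = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        total = sum(counts[a] for a in alphabet)
        freqs = {a: counts[a] / total for a in alphabet}
        # Shannon entropy of the letter distribution at this position
        entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
        # Information content = total height of the letter stack
        info = max_bits - entropy
        # Each letter takes a share of the stack proportional to its frequency
        heights.append({a: freqs[a] * info for a in alphabet})
    return heights


# Hypothetical aligned sequences, for illustration only
seqs = ["ACGTA", "ACGTC", "ACGAA", "ACGTC"]
heights = logo_heights(seqs)
for pos, h in enumerate(heights, start=1):
    print(pos, {a: round(v, 2) for a, v in h.items()})
```

In this toy input, position 1 is always A, so A gets the full 2 bits; position 5 is split evenly between A and C, so the stack is only 1 bit tall and each letter gets 0.5 bits.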

You can create sequence logos from a web interface using a popular application called WebLogo, or using R packages such as seqLogo (Bioconductor).

Thank you, "sequence logo" was the term I was missing :)

8.6 years ago
Chirag Nepal ★ 2.4k

The plot compares the human reference sequence with N samples (patients). The samples tend to vary in sequence in that region. From these samples, you can estimate how often a given nucleotide is mismatched/mutated relative to the reference. If a letter is big, it means that position is nearly invariant across the samples.
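
As a rough sketch of that comparison, the per-position mismatch frequency against a reference could be computed as below. The function name `mismatch_rates` and the sequences are hypothetical, and the samples are assumed to be already aligned to the reference with equal length.

```python
def mismatch_rates(reference, samples):
    """Fraction of samples that differ from the reference at each position.

    Assumes every sample is aligned to the reference and has the same length.
    """
    rates = []
    for i, ref_base in enumerate(reference):
        mismatches = sum(1 for s in samples if s[i] != ref_base)
        rates.append(mismatches / len(samples))
    return rates


# Hypothetical data, for illustration only
reference = "ACGTA"
samples = ["ACGTA", "ACGTC", "ACGAA", "ACGTC"]
rates = mismatch_rates(reference, samples)
print(rates)
```

A rate of 0 corresponds to an invariant position (one tall letter in the logo), while higher rates correspond to shorter, mixed stacks.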

You can use WebLogo, either on its web page or from R, to make that figure.

Or, if you have ChIP-seq peaks, you can do motif analysis and make those plots using the HOMER tools.

The X axis runs from 1 to 10: is that the SNP position? Why 1 to 10 and not genomic positions?

8.6 years ago
Zhilong Jia ★ 2.2k

Here is motiflogo, which can make a SNP-specific motif logo representation. See figure, please.

8.6 years ago
benaneely ▴ 70

Maybe not as helpful for simply plotting, but in proteomics we often use iceLogo.