How can extract nucleotide frequency for position on aligned sequences in clustal omega?
3
0
Entering edit mode
7.1 years ago
algas2006 • 0

hi

I am begginer in Bioinformatics and I have a laptop with OS Ubuntu.

The question is:

How can extract nucleotide frequency for position on aligned sequences in clustal omega?

Thanks

Cèsar

sequencing frecuency of nucleotides clustal omega • 4.3k views
ADD COMMENT
1
Entering edit mode
7.1 years ago

You could use the seqinr package for R. First read in your alignment with the read.alignment() function then use the consensus() function with method="profile" to get a matrix of counts of each nucleotide at each position.

ADD COMMENT
0
Entering edit mode

Hi Jean-Karim Heriche

Thanks for your answer. I had installed R and seqinr, but I got a error message.

The file with the alignement is in: home/Documents/A/B/file1.clustal

c@c-HP-245-G4-Notebook-PC:~$

c@c-HP-245-G4-Notebook-PC:~$cd Documents/

c@c-HP-245-G4-Notebook-PC:~/Documents$

c@c-HP-245-G4-Notebook-PC:~/Documents$ cd A/

c@c-HP-245-G4-Notebook-PC:~/Documents/A$

c@c-HP-245-G4-Notebook-PC:~/Documents/A$ cd B/

c@c-HP-245-G4-Notebook-PC:~/Documents/A/B$

c@c-HP-245-G4-Notebook-PC:~/Documents/A/B$ R

R version 3.0.2 (2013-09-25) -- "Frisbee Sailing" Copyright (C) 2013 The R Foundation for Statistical Computing Platform: x86_64-pc-linux-gnu (64-bit)

library("seqinr")

The manual say we have to that write:

clustal.res <- read.alignment(file = system.file("home/Documents/A/B/file1.clustal", package = "seqinr"), format="clustal") Error en read.alignment File is not readable

I tried save the file in fasta format in the site of clustal omega and change the command > fasta.res <- read.alignment but the result is the same.

can you help me to localizate the error?

Thanks

Cesar

ADD REPLY
0
Entering edit mode

Don't use system.file() here. This is used for files that come with packages. Also note that you may be missing the first / in the file path. In any case, make sure the path is correct. You just need

read.alignment( file = "/correct/path/to/file.clustal", format="clustal")

If you still get an error, check also the file permissions.

ADD REPLY
0
Entering edit mode

Jean-Karim Heriche Thanks for your answer!

I ran the correct command and I can see the alignment in the ubuntu terminal.

read.alignment( file = "/home/Documents/A/B/clustalomega1.clustal", format="clustal")

In the final lines I can read this.

$com

[1] NA

attr(,"class")

[1] "alignment"

For use the function “consensus”, I do not know the use of “matali”

I Ran

consensus(alignment, method = "profile")

Error en inherits(matali, "alignment") : objeto 'alignment' no encontrado

Could you help me to localizate the error?

Thanks Cesar

ADD REPLY
0
Entering edit mode

In general, when programming, one assigns the result of a function to a variable so that it can be reused, e.g. in your case:

msa <- read.alignment( file = "/home/Documents/A/B/clustalomega1.clustal", format="clustal")
prof <- consensus(msa, method = "profile")

Although we do occasionally answer such questions here, this forum is not for programming questions. I would encourage you to first learn how to program and then to direct your programming questions to the Stack Overflow forum.

ADD REPLY
0
Entering edit mode

Thanks for your attention!

ADD REPLY
1
Entering edit mode
7.1 years ago
Joe 21k

Another similar option would be with BioPython, to extract a PSSM (Position Specific Score Matrix):

from Bio import SeqIO
from Bio.Align import AlignInfo

align = AlignIO.read('~/path/to/alignment.aln', 'format') # Whatever format your mutliple sequence alignment is in
summary_align = AlignInfo.SummaryInfo(align)
consensus = summary_align.dumb_consensus()
my_pssm = summary_align.pos_specific_score_matrix(consensus,chars_to_ignore = ['N'])
print(my_pssm)

Should end up looking something like:

  G A T C
G 1 1 0 1
T 0 0 3 0
A 1 1 0 0
T 0 0 2 0
C 0 0 0 3

Instructions from: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc301

ADD COMMENT
0
Entering edit mode

Thanks for your answer!

Cesar

ADD REPLY
0
Entering edit mode
6.6 years ago
nhaituan ▴ 10

Following method suggested by Healey,

I really like to output the PSSM ouput with frequency in each cell as you mentioned. However, I follow your method above, the method gives me something like: 0 7.0 0 7.0 0 0 0 7.0 in the cells ect ... within the output. It seems to be threshold 70%. But I really want actual frequencies.

Could anyone share how to output the actual frequencies of alleles within each cell, assuming that the left side is position of an alignment?

Highly appreciate you suggestions. Thank you.

ADD COMMENT
0
Entering edit mode

This would probably be better as a new question so you can demonstrate your code and input data.

The dumb consensus has a threshold of 0.7 by default (I think), but to my knowledge that doesn’t affect the PSSM; in my example code above the consensus sequence is purely used as a ‘marker sequence’ down the left hand side of the PSSM. There is nothing in the documentation, at a glance, about a default threshold but I may well be wrong (and can’t currently check).

ADD REPLY

Login before adding your answer.

Traffic: 4071 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6