Question: How can extract nucleotide frequency for position on aligned sequences in clustal omega?
0
gravatar for algas2006
16 months ago by
algas20060
algas20060 wrote:

hi

I am begginer in Bioinformatics and I have a laptop with OS Ubuntu.

The question is:

How can extract nucleotide frequency for position on aligned sequences in clustal omega?

Thanks

Cèsar

ADD COMMENTlink modified 10 months ago by nhaituan0 • written 16 months ago by algas20060
1
gravatar for Jean-Karim Heriche
16 months ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche16k wrote:

You could use the seqinr package for R. First read in your alignment with the read.alignment() function then use the consensus() function with method="profile" to get a matrix of counts of each nucleotide at each position.

ADD COMMENTlink written 16 months ago by Jean-Karim Heriche16k

Hi Jean-Karim Heriche

Thanks for your answer. I had installed R and seqinr, but I got a error message.

The file with the alignement is in: home/Documents/A/B/file1.clustal

c@c-HP-245-G4-Notebook-PC:~$

c@c-HP-245-G4-Notebook-PC:~$cd Documents/

c@c-HP-245-G4-Notebook-PC:~/Documents$

c@c-HP-245-G4-Notebook-PC:~/Documents$ cd A/

c@c-HP-245-G4-Notebook-PC:~/Documents/A$

c@c-HP-245-G4-Notebook-PC:~/Documents/A$ cd B/

c@c-HP-245-G4-Notebook-PC:~/Documents/A/B$

c@c-HP-245-G4-Notebook-PC:~/Documents/A/B$ R

R version 3.0.2 (2013-09-25) -- "Frisbee Sailing" Copyright (C) 2013 The R Foundation for Statistical Computing Platform: x86_64-pc-linux-gnu (64-bit)

library("seqinr")

The manual say we have to that write:

clustal.res <- read.alignment(file = system.file("home/Documents/A/B/file1.clustal", package = "seqinr"), format="clustal") Error en read.alignment File is not readable

I tried save the file in fasta format in the site of clustal omega and change the command > fasta.res <- read.alignment but the result is the same.

can you help me to localizate the error?

Thanks

Cesar

ADD REPLYlink modified 16 months ago • written 16 months ago by algas20060

Don't use system.file() here. This is used for files that come with packages. Also note that you may be missing the first / in the file path. In any case, make sure the path is correct. You just need

read.alignment( file = "/correct/path/to/file.clustal", format="clustal")

If you still get an error, check also the file permissions.

ADD REPLYlink written 16 months ago by Jean-Karim Heriche16k

Jean-Karim Heriche Thanks for your answer!

I ran the correct command and I can see the alignment in the ubuntu terminal.

read.alignment( file = "/home/Documents/A/B/clustalomega1.clustal", format="clustal")

In the final lines I can read this.

$com

[1] NA

attr(,"class")

[1] "alignment"

For use the function “consensus”, I do not know the use of “matali”

I Ran

consensus(alignment, method = "profile")

Error en inherits(matali, "alignment") : objeto 'alignment' no encontrado

Could you help me to localizate the error?

Thanks Cesar

ADD REPLYlink written 16 months ago by algas20060

In general, when programming, one assigns the result of a function to a variable so that it can be reused, e.g. in your case:

msa <- read.alignment( file = "/home/Documents/A/B/clustalomega1.clustal", format="clustal")
prof <- consensus(msa, method = "profile")

Although we do occasionally answer such questions here, this forum is not for programming questions. I would encourage you to first learn how to program and then to direct your programming questions to the Stack Overflow forum.

ADD REPLYlink written 16 months ago by Jean-Karim Heriche16k

Thanks for your attention!

ADD REPLYlink written 16 months ago by algas20060
1
gravatar for jrj.healey
16 months ago by
jrj.healey7.7k
United Kingdom
jrj.healey7.7k wrote:

Another similar option would be with BioPython, to extract a PSSM (Position Specific Score Matrix):

from Bio import SeqIO
from Bio.Align import AlignInfo

align = AlignIO.read('~/path/to/alignment.aln', 'format') # Whatever format your mutliple sequence alignment is in
summary_align = AlignInfo.SummaryInfo(align)
consensus = summary_align.dumb_consensus()
my_pssm = summary_align.pos_specific_score_matrix(consensus,chars_to_ignore = ['N'])
print(my_pssm)

Should end up looking something like:

  G A T C
G 1 1 0 1
T 0 0 3 0
A 1 1 0 0
T 0 0 2 0
C 0 0 0 3

Instructions from: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc301

ADD COMMENTlink written 16 months ago by jrj.healey7.7k

Thanks for your answer!

Cesar

ADD REPLYlink written 16 months ago by algas20060
0
gravatar for nhaituan
10 months ago by
nhaituan0
nhaituan0 wrote:

Following method suggested by Healey,

I really like to output the PSSM ouput with frequency in each cell as you mentioned. However, I follow your method above, the method gives me something like: 0 7.0 0 7.0 0 0 0 7.0 in the cells ect ... within the output. It seems to be threshold 70%. But I really want actual frequencies.

Could anyone share how to output the actual frequencies of alleles within each cell, assuming that the left side is position of an alignment?

Highly appreciate you suggestions. Thank you.

ADD COMMENTlink modified 10 months ago • written 10 months ago by nhaituan0

This would probably be better as a new question so you can demonstrate your code and input data.

The dumb consensus has a threshold of 0.7 by default (I think), but to my knowledge that doesn’t affect the PSSM; in my example code above the consensus sequence is purely used as a ‘marker sequence’ down the left hand side of the PSSM. There is nothing in the documentation, at a glance, about a default threshold but I may well be wrong (and can’t currently check).

ADD REPLYlink written 10 months ago by jrj.healey7.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 814 users visited in the last hour