How Can I Find A Position-Specific Weight Matrix From A Multi-FASTA File
11.4 years ago
k.nirmalraman ★ 1.1k

hello,

How can I compute a position-specific probability matrix (PSPM) from a multi-FASTA file in R?

I know that Biostrings in R provides consensusMatrix. What I would like to do is read all the sequences from a single file and generate a PSPM as output.

Any suggestions on how to achieve this?

11.4 years ago

Assuming the sequences are aligned (and therefore all the same length), all you need to do is loop through the sequences and, at each position, count how many times each base occurs.

e.g.

12345678
GATAGACC
GTTAGACG
GAAAGACG

  | 1 2 3 4 5 6 7 8
--+----------------
A | 0 2 1 3 0 3 0 0
T | 0 1 2 0 0 0 0 0
C | 0 0 0 0 0 0 3 1
G | 3 0 0 0 3 0 0 2
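The counting step above can be sketched in a few lines. This is a minimal illustration in Python rather than R, and the helper names `read_fasta` and `count_matrix` are made up for this example. It assumes plain DNA sequences that are already aligned to the same length; characters outside A/T/C/G (gaps, N) are simply skipped. In R, Biostrings' `readDNAStringSet()` followed by `consensusMatrix()` should cover the same ground.

```python
def read_fasta(lines):
    """Yield one sequence string per FASTA record (header lines start with '>')."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line.upper())
    if chunks:
        yield "".join(chunks)


def count_matrix(seqs, alphabet="ATCG"):
    """Count each base at each position of equal-length, aligned sequences."""
    seqs = list(seqs)
    if not seqs:
        raise ValueError("no sequences given")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    counts = {base: [0] * length for base in alphabet}
    for s in seqs:
        for i, base in enumerate(s):
            if base in counts:  # skip gaps and ambiguity codes
                counts[base][i] += 1
    return counts


# The three example sequences from above, written as a multi-FASTA string.
fasta_text = """>seq1
GATAGACC
>seq2
GTTAGACG
>seq3
GAAAGACG
"""

counts = count_matrix(read_fasta(fasta_text.splitlines()))
for base, row in counts.items():
    print(base, "|", *row)
```

To read from disk instead, pass an open file handle to `read_fasta`, since it only needs an iterable of lines.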

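You could also apply a Laplace correction (http://cis.poly.edu/~mleung/FRE7851/f07/naiveBayesianClassifier.pdf) by adding a constant pseudocount to each cell, so that no base ends up with a probability of zero. You may then normalize each column so that it sums to 1, which turns the counts into a position-specific probability matrix.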
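As a rough sketch of the correction and normalization, the Python snippet below starts from the count table in the example. The function name `to_probabilities` and the default pseudocount of 1 are assumptions for illustration, not a fixed convention. Each cell becomes (count + pseudocount) / (column total + pseudocount × number of bases).

```python
def to_probabilities(counts, pseudocount=1.0):
    """Add a pseudocount to every cell, then normalize each column to sum to 1."""
    bases = list(counts)
    length = len(counts[bases[0]])
    probs = {base: [0.0] * length for base in bases}
    for i in range(length):
        total = sum(counts[base][i] for base in bases) + pseudocount * len(bases)
        if total == 0:
            raise ValueError(f"column {i + 1} is empty and pseudocount is 0")
        for base in bases:
            probs[base][i] = (counts[base][i] + pseudocount) / total
    return probs


# Count matrix from the worked example (rows are bases, columns are positions).
counts = {
    "A": [0, 2, 1, 3, 0, 3, 0, 0],
    "T": [0, 1, 2, 0, 0, 0, 0, 0],
    "C": [0, 0, 0, 0, 0, 0, 3, 1],
    "G": [3, 0, 0, 0, 3, 0, 0, 2],
}

pspm = to_probabilities(counts, pseudocount=1.0)
for base, row in pspm.items():
    print(base, "|", " ".join(f"{p:.3f}" for p in row))
```

With three sequences and a pseudocount of 1, position 1 gives G a probability of 4/7 and each other base 1/7. Passing `pseudocount=0` yields plain column frequencies.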
