How To Generate Multi-Nucleotide Occupancy Counts For Each Coordinate Of My Reads?
14.6 years ago
Biostar User ★ 1.0k

I need to generate nucleotide occupancy counts for each position of a given sequence, then sum them over all of the input sequences. An example of the desired output (for the dinucleotide AT):

dinucleotide occupancy

python nucleotide-frequency • 2.8k views
14.6 years ago

The code snippet below populates the store dictionary, keyed by nucleotide pattern, with lists that hold the occupancy count at each index. (The updated answer now handles nucleotide patterns of arbitrary length)::

from itertools import count

def pattern_update(sequence, width=2, store=None):
    """
    Accumulates nucleotide patterns of a certain width with
    position counts at each index.
    """
    # a mutable default argument would be shared between calls,
    # so create a fresh dictionary when none is passed in
    if store is None:
        store = {}

    # half-open slices need a padding at end for proper slicing
    size = len(sequence) + 1

    def zeroes():
        "Generates a zero-filled list that holds the counts per position"
        return [0] * (size - width)

    # these are the end indices of each window
    ends = range(width, size)

    for lo, hi in zip(count(), ends):
        # upon encountering a missing key, initialize the value
        # for that key to the return value of the zeroes() function
        key = sequence[lo:hi]
        store.setdefault(key, zeroes())[lo] += 1

    return store

The code at multipatt.py demonstrates its use in a full program. Set the size to the maximal possible sequence size so that the count lists are long enough for every input sequence. A typical use case::

store = {}
seq1 = 'ATGCT'
pattern_update(seq1, width=2, store=store)    

seq2 = 'ATCGC'
pattern_update(seq2, width=2, store=store)    

print(store)

will print (on Python 3.7+, where dictionaries keep insertion order)::

{'AT': [2, 0, 0, 0], 'TG': [0, 1, 0, 0], 'GC': [0, 0, 1, 1],
'CT': [0, 0, 0, 1], 'TC': [0, 1, 0, 0], 'CG': [0, 0, 1, 0]}
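
If the reads differ in length, one way to follow the "maximal size" advice is to size every list from the longest read up front. What follows is only a minimal sketch, assuming all reads fit in memory; `occupancy_counts` is a hypothetical helper name chosen for illustration and is not taken from multipatt.py:

```python
from collections import defaultdict

def occupancy_counts(sequences, width=2):
    """
    Count how often each pattern of the given width starts at each
    position, summed over all sequences. Every list is sized for the
    longest sequence, so shorter reads simply leave trailing zeros.
    """
    sequences = list(sequences)
    # number of window start positions in the longest sequence
    longest = max((len(s) for s in sequences), default=0)
    slots = max(longest - width + 1, 0)

    # a missing pattern gets a zero-filled list on first access
    store = defaultdict(lambda: [0] * slots)
    for seq in sequences:
        # reads shorter than width produce an empty range and are skipped
        for lo in range(len(seq) - width + 1):
            store[seq[lo:lo + width]][lo] += 1
    return dict(store)

counts = occupancy_counts(['ATGCT', 'ATCGCAT'], width=2)
print(counts['AT'])  # [2, 0, 0, 0, 0, 1]
```

Looking up a single key, such as `counts['AT']`, gives the per-position occupancy for that one dinucleotide, which is the shape of output the question asks for.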