Question: Method for Randomly Selecting Subsequences in an Alignment?
Spyros0 wrote (7.5 years ago):

Hello BioStar Community,

I am working with a database of protein motif sequences. The motifs (subsequences of the primary sequences of GPCRs) were excised by an iterative database-scanning algorithm that identifies the most conserved subsequences in a multiple sequence alignment, subject to certain criteria (such as motif length and whether motifs are allowed to overlap). Because I will be doing extensive work with these motifs, I need a way of demonstrating that they are truly non-random. I would like a method that selects motifs at random under the same constraints and criteria as the original ones. That way, given a multiple sequence alignment, I could compare the profile of the original motifs with that of the random ones to test whether or not the two superimpose (they should). Can anyone suggest a way of approaching this?
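One way to draw random motifs "with similar constraints and criteria" is to keep the number and lengths of the original motifs and randomize only where they sit. The sketch below assumes the only constraints are motif length and, optionally, no overlap; `sample_motif_windows` and its parameters are hypothetical names, and rejection sampling is just one simple way to enforce the overlap rule, not the method used by the original scanning algorithm.

```python
import random


def sample_motif_windows(aln_length, motif_lengths, allow_overlap=False,
                         rng=None, max_tries=10_000):
    """Place one random window per motif length along an alignment that is
    aln_length columns wide. Returns sorted (start, end) pairs, end exclusive.

    When allow_overlap is False, the whole set is redrawn until no two
    windows overlap (simple rejection sampling)."""
    if rng is None:
        rng = random.Random()
    if any(length > aln_length for length in motif_lengths):
        raise ValueError("a motif is longer than the alignment")
    if not allow_overlap and sum(motif_lengths) > aln_length:
        raise ValueError("non-overlapping motifs cannot fit in the alignment")
    for _ in range(max_tries):
        windows = []
        for length in motif_lengths:
            start = rng.randint(0, aln_length - length)  # inclusive bounds
            windows.append((start, start + length))
        windows.sort()
        # Sorted by start, so checking neighbours is enough to rule out overlap.
        if allow_overlap or all(prev_end <= next_start
                                for (_, prev_end), (next_start, _)
                                in zip(windows, windows[1:])):
            return windows
    raise RuntimeError("gave up placing non-overlapping windows; relax constraints")


# Example: three motifs of lengths 12, 9 and 15 in a 300-column alignment.
print(sample_motif_windows(300, [12, 9, 15], rng=random.Random(42)))
```

Drawing many such sets and computing the same profile for each gives a null distribution to set against the profile of the real motifs.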


What do you mean by randomly selecting motifs? Do you mean randomly selecting subsequences from your genome?

(comment by Istvan Albert, 7.5 years ago)

@I Albert: Yes. To show that the motifs were selected non-randomly, I would like a way of randomly selecting subsequences from my sequence alignments (protein sequences) and repeating this process many times.

(reply by Spyros0, 7.5 years ago)
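Repeating the random selection many times yields an empirical null distribution, and the fraction of random draws that score at least as well as the real motif is an empirical p-value. The sketch below assumes the "profile" is per-column conservation (the fraction of rows sharing the most common residue) and that the alignment is already a list of equal-length gapped strings; `column_conservation`, `window_score` and `empirical_p_value` are hypothetical helpers, and conservation is only a stand-in for whatever score the original scanning algorithm uses.

```python
import random
from collections import Counter


def column_conservation(alignment):
    """Per-column conservation: fraction of rows carrying the most common
    non-gap residue. Assumes equal-length rows with '-' as the gap symbol."""
    n_rows = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / n_rows)
    return scores


def window_score(cons, start, length):
    """Mean conservation over the columns start .. start + length - 1."""
    return sum(cons[start:start + length]) / length


def empirical_p_value(alignment, motif_start, motif_length, n_iter=10_000, seed=None):
    """Fraction of randomly placed windows of the same length that score at
    least as high as the observed motif (with a +1 correction so the
    estimate is never exactly zero)."""
    cons = column_conservation(alignment)
    last_start = len(cons) - motif_length
    if not 0 <= motif_start <= last_start:
        raise ValueError("motif does not fit inside the alignment")
    rng = random.Random(seed)
    observed = window_score(cons, motif_start, motif_length)
    hits = sum(
        window_score(cons, rng.randint(0, last_start), motif_length) >= observed
        for _ in range(n_iter)
    )
    return (hits + 1) / (n_iter + 1)


# Toy alignment (3 rows x 13 columns), purely for illustration.
aln = ["MKV-LLAGHWCDE", "MKVALLAGHWCQE", "MRV-LLAGHWCKD"]
print(empirical_p_value(aln, motif_start=4, motif_length=6, n_iter=1000, seed=1))
```

On a real alignment the number of possible window positions is much larger, so a small p-value becomes meaningful; with several motifs, the same idea applies to a combined score over a whole randomly placed set.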
Istvan Albert (University Park, USA) wrote (7.5 years ago):

The simplest approach may be to write a short script that loops through your sequences and cuts out substrings at random positions. Something like this, which will need to be adapted to your needs:

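from random import randint

# motif size
size = 5

# Assumes a FASTA file in which every record is exactly two lines:
# a '>' header line followed by the whole sequence on a single line.
with open('f1.fasta') as stream:
    for header in stream:
        seq = next(stream).strip()
        if len(seq) < size:
            continue  # too short to hold a motif of this size
        lo = randint(0, len(seq) - size)
        print(seq[lo:lo + size])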
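The loop above draws an independent position in each sequence. For an alignment, it may make more sense to cut the same column window from every row, so that each random "motif" is an alignment block comparable to the original motifs. A minimal sketch, assuming the rows have already been read into equal-length gapped strings; `random_alignment_block` is a hypothetical name.

```python
import random


def random_alignment_block(alignment, size, rng=None):
    """Return (start, block): one random column window of `size` columns,
    cut at the same position from every aligned row."""
    if rng is None:
        rng = random.Random()
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    if size > width:
        raise ValueError("window is wider than the alignment")
    lo = rng.randint(0, width - size)  # inclusive, so lo + size <= width
    return lo, [row[lo:lo + size] for row in alignment]


aln = ["MKV-LLAGHWCDE", "MKVALLAGHWCQE", "MRV-LLAGHWCKD"]
start, block = random_alignment_block(aln, size=5, rng=random.Random(7))
print(start, block)
```

Keeping the columns aligned preserves the gap structure of the original block, which independent per-sequence positions would not.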

@I Albert: Many thanks for that. A loop like that might work if I strip the header and annotation lines from the ASCII file of the multiple sequence alignment!

(reply by Spyros0, 7.5 years ago)
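The header and annotation lines can also be skipped while reading the file, rather than edited out by hand. A minimal sketch, assuming a Clustal-like layout (a `CLUSTAL` header line, blocks of `name  residues  [count]` lines, and consensus lines that start with whitespace); `read_clustal` is a hypothetical helper, and other formats such as aligned FASTA would need a different reader. If adding a dependency is acceptable, Biopython's `Bio.AlignIO` can also read Clustal files.

```python
def read_clustal(lines):
    """Minimal reader for Clustal-style alignment text. Skips the 'CLUSTAL'
    header, blank lines and consensus/annotation lines (which start with
    whitespace), and joins the blocks belonging to each sequence name.
    Returns {name: aligned_sequence} in the order names first appear."""
    seqs = {}
    for line in lines:
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # not a 'name residues' line
        name, chunk = fields[0], fields[1]
        seqs[name] = seqs.get(name, "") + chunk
    return seqs


example = """CLUSTAL W (1.83) multiple sequence alignment

seq1      MKV-LLAGHW 9
seq2      MKVALLAGHW 10
          *** ******

seq1      CDE 12
seq2      CQE 13
          * *
"""
aln = read_clustal(example.splitlines())
print(aln)
```

For a real file, passing an open file handle (for example `read_clustal(open('alignment.aln'))`, with a hypothetical filename) works the same way, since the function only iterates over lines.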