Random sequence cleavage - average length constraint
7.7 years ago
Nifaste ▴ 20

Dear all,

I would like to randomly cleave sequences of length Ln until the average length of the resulting fragments is 50 nt (+/- 2 nt).

I started with Perl, but I have some problems with the average-length constraint ...

I wanted to select a random position in [0 ... sequence length] and calculate the lengths of the resulting segments. But I don't think that is the right way to do it.

Any suggestions?

perl next-gen sequence • 1.8k views
7.7 years ago

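One Perl solution, if the fragments do not overlap:

my $Ln = 1000;
# Start at a random offset in 1..50, then step by 50 and jitter each cut point by -1..+1.
for (my $i = int(rand(50)) + 1; $i <= $Ln; $i += 50) {
    printf "%d\n", $i + 1 - int(rand(3));
}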

7.7 years ago
Ram 37k

1. Allowable length = 48 .. 52
2. Iterate through each sequence. For each sequence,
• pick a random number (call it point) between 0 and len(seq)-1
• if len(seq)-1 - point >=52, pick substring 3' of point, with length randomly picked between 48 and 52
• add to a new list "pool_1" the sequence 5' of point and 3' of point+length picked above (these are the flanking fragments)
3. Repeat above operation on pool_1, this time picking substrings 5' of chosen point and adding the fragments to "pool_2".
4. Repeat until both pool_1 and pool_2 are filled with fragments less than 52 in total length
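
The steps above could be sketched roughly as follows in Perl. This is a simplified reading, not necessarily what was intended: it uses a single work pool instead of alternating pool_1 and pool_2, and it picks the start point so that the 48..52 nt piece always fits rather than testing the remaining room afterwards. The name carve and the variables $MIN and $MAX are made up for illustration.

```perl
use strict;
use warnings;

my ($MIN, $MAX) = (48, 52);    # allowable fragment lengths

# Hypothetical helper: returns (\@kept, \@leftover) for one sequence.
sub carve {
    my ($seq) = @_;
    my (@kept, @leftover);
    my @pool = ($seq);
    while (defined(my $s = shift @pool)) {
        if (length($s) < $MAX) {
            # Too short to carve again; keep it aside if non-empty.
            push @leftover, $s if length $s;
            next;
        }
        my $len   = $MIN + int rand($MAX - $MIN + 1);   # 48..52
        my $point = int rand(length($s) - $len + 1);    # start so the piece fits
        push @kept, substr($s, $point, $len);
        # The flanking fragments go back into the pool.
        push @pool, substr($s, 0, $point), substr($s, $point + $len);
    }
    return (\@kept, \@leftover);
}

my $seq = join '', map { (qw(A C G T))[int rand 4] } 1 .. 1000;
my ($kept, $left) = carve($seq);
printf "%d fragments of %d..%d nt, %d leftover pieces\n",
    scalar @$kept, $MIN, $MAX, scalar @$left;
```

Every kept fragment is 48..52 nt long by construction; what to do with the shorter leftover pieces is left open here.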

I can't put a constraint on individual fragment lengths. They could be 1 nt or 2 nt ...
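
One observation that may help when only the average is constrained: cutting a sequence of length Ln into n fragments always gives a mean of Ln/n, wherever the cuts fall, so the constraint only fixes the number of cuts. Below is a minimal Perl sketch under that assumption. The name random_cleave and its arguments are hypothetical, and it assumes a target of at least 1 nt.

```perl
use strict;
use warnings;
use List::Util qw(shuffle sum);

# Hypothetical helper: cut $seq at random positions so that the mean
# fragment length is as close as possible to $target.
# Individual fragment lengths are left unconstrained.
sub random_cleave {
    my ($seq, $target) = @_;
    my $len = length $seq;
    # n fragments always average len/n, so only the number of cuts matters.
    my $n = int($len / $target + 0.5);
    $n = 1 if $n < 1;
    # Draw n-1 distinct cut positions among the len-1 internal positions.
    my @pool = shuffle 1 .. $len - 1;
    my @cuts = sort { $a <=> $b } @pool[0 .. $n - 2];
    my @frags;
    my $prev = 0;
    for my $cut (@cuts, $len) {
        push @frags, substr($seq, $prev, $cut - $prev);
        $prev = $cut;
    }
    return @frags;
}

my $seq   = join '', map { (qw(A C G T))[int rand 4] } 1 .. 1000;
my @frags = random_cleave($seq, 50);
printf "%d fragments, mean length %.1f nt\n",
    scalar @frags, sum(map { length } @frags) / @frags;
```

For a single sequence the mean lands inside 48..52 only when Ln/n does, which fails for some short inputs (Ln = 75 gives two fragments with a mean of 37.5). Pooling the fragments of several sequences before averaging might be one way around that.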

7.7 years ago
dylan.storey ▴ 60

If you're concerned about performance, use unpack instead of substr.
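
A minimal sketch of what that might look like, assuming the fragment lengths are already known. The name split_by_lengths and the toy data are made up for illustration, and whether this is actually faster is worth benchmarking on real data.

```perl
use strict;
use warnings;

# Hypothetical helper: split $seq into consecutive pieces of the given lengths
# with a single unpack call.
sub split_by_lengths {
    my ($seq, @lengths) = @_;
    # Build a template such as "a3 a7 a4 a6"; "a" copies characters verbatim.
    my $template = join ' ', map { "a$_" } @lengths;
    return unpack $template, $seq;
}

my $seq     = 'ACGTACGTACGTACGTACGT';   # toy 20-nt sequence
my @lengths = (3, 7, 4, 6);             # toy fragment lengths summing to 20
print join("\n", split_by_lengths($seq, @lengths)), "\n";
```

The lengths here would typically come from the differences between consecutive sorted cut positions.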