Log 2 Ratio For A Tandem Repeat Enrichment


11.7 years ago

gregphil ▴ 30

Dear Biostar community, I have a statistical bioinformatics question.
I need to compare enrichment for tandem repeats between two sets of sequences.

Example data looks like this:

Tandem Repeat   Set 1 %   Set 2 %
AATGACAT        0.3       0.1
ATATGC          6         0  
...

I want to find out whether Set 1 is enriched or depleted for a given tandem repeat compared to Set 2. My idea was to compute the log2 ratio of the two percentages for each repeat. This is my awk solution:

awk '{print log($2/$3)/log(2)}' file
1.58496
inf

By this measure Set 1 is enriched for both tandem repeats (the second ratio is infinite because that repeat is absent from Set 2). However, I don't know whether this is the right way to solve the problem.
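One common way to keep the ratio finite when a repeat is absent from one set is to add a small pseudocount to both percentages before dividing. The following is only a minimal sketch of that idea, not an established method for this data: the function name log2_ratio and the default pseudocount of 0.01 (in percentage points) are illustrative assumptions, and the choice of pseudocount changes the result noticeably for low-abundance repeats. The sketch also skips the header and any line that does not have exactly three fields with numeric percentages.

```shell
# Hypothetical helper: print the repeat name and log2((set1 + p) / (set2 + p)).
# p is a pseudocount; the default of 0.01 is an arbitrary assumption.
# With p = 0 a zero in the third column still causes a division by zero.
log2_ratio() {
    awk -v p="${1:-0.01}" '
        # keep only data rows: three fields, both percentages numeric
        NF == 3 && $2 ~ /^[0-9]+(\.[0-9]+)?$/ && $3 ~ /^[0-9]+(\.[0-9]+)?$/ {
            printf "%s\t%.3f\n", $1, log(($2 + p) / ($3 + p)) / log(2)
        }'
}

# Usage: log2_ratio < file        (default pseudocount)
#        log2_ratio 0.1 < file    (custom pseudocount)
```

Positive values indicate enrichment in Set 1, negative values depletion. Because the inputs are percentages, the ratio by itself says nothing about statistical significance.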

My question is:

Given two sets of sequences that differ in total length, what is the best way to measure enrichment for a tandem repeat (that is, the fraction of sequence covered by a specific repeat)? Is the log2 ratio appropriate?
