Log 2 Ratio For A Tandem Repeat Enrichment
10.5 years ago
gregphil ▴ 30

Dear Biostar community, I have a statistical bioinformatics question.
I need to compare enrichment for tandem repeats between two sets of sequences.

Example data looks like this:

Tandem Repeat   Set 1 %   Set 2 %
AATGACAT        0.3       0.1
ATATGC          6         0  
...

I want to find out whether Set 1 is enriched or depleted for a given tandem repeat compared to Set 2. My idea was to compare the log2 ratio of the two percentages. This is my awk solution:

awk '{print log($2/$3)/log(2)}' file
1.58496
inf

Set 1 appears enriched for both tandem repeats (the second ratio is infinite because Set 2 has 0%). However, I don't know if this is the right way to solve this problem.
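
One common way to keep the ratio finite when one set has 0% is to add a small pseudocount to both values before dividing. The snippet below is only a sketch of that idea, not a recommendation of a specific value: the function name `log2_ratio` and the default pseudocount of 0.01 are assumptions made for illustration, and the result for rare repeats depends strongly on which pseudocount is chosen.

```python
import math


def log2_ratio(set1_pct, set2_pct, pseudocount=0.01):
    """Return log2((set1 + pseudocount) / (set2 + pseudocount)).

    The pseudocount (an arbitrary, illustrative value here) keeps the
    ratio finite when either percentage is zero.
    """
    return math.log2((set1_pct + pseudocount) / (set2_pct + pseudocount))


# The two example rows from the table above: (repeat, Set 1 %, Set 2 %).
rows = [("AATGACAT", 0.3, 0.1), ("ATATGC", 6.0, 0.0)]

for repeat, set1_pct, set2_pct in rows:
    # Positive values suggest enrichment in Set 1, negative values depletion.
    print(repeat, round(log2_ratio(set1_pct, set2_pct), 3))
```

With `pseudocount=0` the first row reproduces the awk result, while the second row would fail on a division by zero, which is the case the pseudocount is meant to handle.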

My question is:

Given two sets of sequences that differ in total length, what is the best way to calculate enrichment for a tandem repeat (i.e. the fraction of sequence covered by that repeat)? Is it the log2 ratio?
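
To make the length issue concrete: if the percentages are computed as bases covered by the repeat divided by the total bases in each set, the difference in set length is already normalized away before the ratio is taken. The sketch below assumes that definition of coverage. The function name `percent_coverage` and all base counts are hypothetical, chosen only so that the results match the 0.3% and 0.1% in the first example row.

```python
def percent_coverage(repeat_bases, total_bases):
    """Percentage of a sequence set covered by one tandem repeat."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return 100.0 * repeat_bases / total_bases


# Hypothetical counts: Set 1 is twice as long as Set 2.
set1_pct = percent_coverage(repeat_bases=3_000, total_bases=1_000_000)
set2_pct = percent_coverage(repeat_bases=500, total_bases=500_000)

# Comparing raw base counts would give a 6-fold difference;
# comparing coverage gives the length-normalized 3-fold difference.
print(set1_pct, set2_pct, set1_pct / set2_pct)
```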
