Question: Statistics on BED files
abedkurdi1020 wrote:

Hello everybody, I have two BED files, and I would like to know which of them has more closely spaced intervals. First, I calculated the distances between consecutive intervals using bedtools spacing, and then I calculated the standard deviation of those distances. Since the two files have different numbers of records, I don't know how to compare the standard deviations in a meaningful way.

Does anybody know how to make the comparison? Are there any other ways?

I appreciate your help. Thank you.

modified 2.7 years ago by Alex Reynolds29k • written 2.7 years ago by abedkurdi1020
Alex Reynolds wrote:

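I think one way is to compute and compare the coefficient of variation (the standard deviation divided by the mean) of the distances between intervals in each of your two files. This value is dimensionless, so the two results can be compared directly.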
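For reference, here is the quantity written out, assuming the sample standard deviation (dividing by n - 1), which is what the awk commands in this answer use. The symbols d_i (the i-th distance) and n (the number of distances) are notation introduced here for illustration:

```latex
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2},
\qquad
\mathrm{CV} = \frac{s}{\bar{d}}
```

For set A, this can be computed as follows: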

$ cut -f1-4 A.bed | sort-bed - > A.bed4
$ closest-features --closest --dist --no-overlaps --delim '\t' A.bed4 A.bed4 > A.dist.bed5
$ awk '$NF != "NA" {d=($NF<0 ? -$NF : $NF); a+=d; n[++k]=d;} END {m=a/k; s=0; for(i=1;i<=k;i++) {s+=((n[i]-m)*(n[i]-m));} sd=sqrt(s/(k-1)); cv=sd/m; print cv;}' A.dist.bed5 > A.cv.txt

The awk command reads the distance from the last field of each line, skips any NA entries, and takes absolute values so that upstream and downstream distances are treated alike.

Repeat for set B:

$ cut -f1-4 B.bed | sort-bed - > B.bed4
$ closest-features --closest --dist --no-overlaps --delim '\t' B.bed4 B.bed4 > B.dist.bed5
$ awk '$NF != "NA" {d=($NF<0 ? -$NF : $NF); a+=d; n[++k]=d;} END {m=a/k; s=0; for(i=1;i<=k;i++) {s+=((n[i]-m)*(n[i]-m));} sd=sqrt(s/(k-1)); cv=sd/m; print cv;}' B.dist.bed5 > B.cv.txt

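Then compare the values in A.cv.txt and B.cv.txt to get the relative variability: the set with the larger coefficient of variation has more variable spacing relative to its mean distance.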
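If it helps, the awk step can be wrapped in a small shell function so the same calculation is applied to both files. This is only a sketch: the function name cv_of_last_column is made up for illustration, and it assumes the distance is the last tab-delimited field of each line, that missing distances are written as NA, and that distances may be signed.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration): print the coefficient
# of variation of the last column of a delimited file.
# Assumptions: distance is the last field; "NA" marks a missing distance;
# distances may be signed, so absolute values are used.
cv_of_last_column() {
    awk '$NF != "NA" {
             d = ($NF < 0 ? -$NF : $NF)      # absolute distance
             sum += d; x[++n] = d            # keep values for the second pass
         }
         END {
             if (n < 2) { print "NA"; exit } # sample sd needs two or more values
             mean = sum / n
             if (mean == 0) { print "NA"; exit }
             for (i = 1; i <= n; i++) ss += (x[i] - mean) ^ 2
             printf "%.4f\n", sqrt(ss / (n - 1)) / mean
         }' "$1"
}

# Example usage, once the distance files exist:
#   cv_of_last_column A.dist.bed5
#   cv_of_last_column B.dist.bed5
```

Printing NA instead of dividing by zero is a design choice made here so that an empty or single-record file does not stop a script partway through.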

modified 2.7 years ago • written 2.7 years ago by Alex Reynolds

I will give it a try. Thank you.