Question: Statistics on BED files
abedkurdi1030 wrote:

Hello everybody, I have two BED files and I would like to know which one has more closely spaced intervals. I already calculated the standard deviation of the distances between consecutive intervals: first I computed those distances with bedtools spacing. Since the two files have different numbers of records, I don't know how to compare the standard deviations to make a fair comparison.

Does anybody know how to make this comparison? Are there any other ways?

I appreciate your help. Thank you.

Tags: statistics, intervals, bed
modified 3.4 years ago by Alex Reynolds • written 3.4 years ago by abedkurdi1030
Alex Reynolds wrote:

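I think one way is to compute and compare the coefficient of variation (CV, the standard deviation divided by the mean) of the nearest-neighbour distances in each file. The CV is unitless (dimensionless), so it can be compared between two sets even when they have different numbers of records and different typical spacings.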
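For reference, this is the quantity the awk one-liners compute: the sample coefficient of variation of the $n$ distances $d_1, \dots, d_n$ (the symbols $n$, $d_i$, $\bar{d}$ and $s$ are notation introduced here for illustration).

```latex
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}, \qquad
\mathrm{CV} = \frac{s}{\bar{d}}
```

As a worked example, distances of 100, 200 and 300 bp give $\bar{d} = 200$ and $s = \sqrt{(100^2 + 0 + 100^2)/2} = 100$, so CV = 0.5. Multiplying every distance by 10 (1000, 2000, 3000 bp) also multiplies $s$ by 10, but the CV stays at 0.5, which is why the CV rather than the raw standard deviation is the comparable number.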

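```
$ cut -f1-4 A.bed | sort-bed - > A.bed4
$ closest-features --closest --dist --no-overlaps --delim '\t' A.bed4 A.bed4 > A.dist.bed5
# The distance is the last field and may be negative (upstream neighbour), so use its absolute value; lines without a numeric distance are skipped.
$ awk '$NF ~ /^-?[0-9]+$/ { d = ($NF < 0) ? -$NF : $NF; a += d; n[++k] = d; } END { m = a/k; s = 0; for (i = 1; i <= k; i++) { s += (n[i]-m)*(n[i]-m); } sd = sqrt(s/(k-1)); cv = sd/m; print cv; }' A.dist.bed5 > A.cv.txt
```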

Repeat for set B:

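```
$ cut -f1-4 B.bed | sort-bed - > B.bed4
$ closest-features --closest --dist --no-overlaps --delim '\t' B.bed4 B.bed4 > B.dist.bed5
# The distance is the last field and may be negative (upstream neighbour), so use its absolute value; lines without a numeric distance are skipped.
$ awk '$NF ~ /^-?[0-9]+$/ { d = ($NF < 0) ? -$NF : $NF; a += d; n[++k] = d; } END { m = a/k; s = 0; for (i = 1; i <= k; i++) { s += (n[i]-m)*(n[i]-m); } sd = sqrt(s/(k-1)); cv = sd/m; print cv; }' B.dist.bed5 > B.cv.txt
```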

Then compare the values in `A.cv.txt` and `B.cv.txt`: the set with the smaller CV has less variability in its spacing relative to its mean spacing.
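If you want to avoid repeating the awk program and make the final comparison explicit, here is a minimal sketch that wraps both steps in shell functions. The names `cv` and `compare_cv` are hypothetical helpers (not part of BEDOPS or bedtools), and the sketch assumes that the distance is the last whitespace-separated field on each line, as in the closest-features output above.

```shell
# cv [FILE...]: print the sample coefficient of variation (SD / mean)
# of the absolute values in the last field of each line.
# Lines whose last field is not an integer (e.g. "NA") are skipped.
cv() {
  awk '$NF ~ /^-?[0-9]+$/ {
         d = ($NF < 0) ? -$NF : $NF     # absolute distance
         sum += d; x[++n] = d
       }
       END {
         if (n < 2 || sum == 0) exit 1  # CV undefined: too few values or zero mean
         m = sum / n
         for (i = 1; i <= n; i++) ss += (x[i] - m) ^ 2
         print sqrt(ss / (n - 1)) / m   # sample SD divided by mean
       }' "$@"
}

# compare_cv CV_A CV_B: print "A" or "B" for the smaller value, or "tie".
compare_cv() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    if (a + 0 < b + 0) print "A"
    else if (b + 0 < a + 0) print "B"
    else print "tie"
  }'
}
```

Usage with the files produced above would be `compare_cv "$(cv A.dist.bed5)" "$(cv B.dist.bed5)"`. Filtering on a numeric last field, rather than a fixed column number, keeps the helper working whether or not the reference element is printed in front of the distance.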