Question: average distance between genes
Asked 7 months ago by Ric (Australia):

Hi, Is there a way to calculate the average distance between genes and exons from a GFF3 file?

Thank you in advance,

annotation assembly gene • 263 views

Sure. Make sure the GFF is sorted by position. Grep out the gene lines, subtract the end of the previous gene from the start of the current gene, collect the distances, and calculate the mean or median distance.

Comment written 6 months ago by Carambakaracho
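
The steps described in the comment above can be sketched with plain awk and sort. This is only a minimal sketch under stated assumptions: the function names `feature_gaps` and `summarize` are made up for illustration, the input is assumed to be tab-separated GFF3 with the feature type in column 3, and overlapping features are simply skipped rather than counted as zero distance.

```shell
# feature_gaps FILE [TYPE]: print the distance between consecutive,
# non-overlapping features of TYPE (default "gene") on each sequence.
# Hypothetical helper for illustration, not part of any toolkit.
feature_gaps() {
    # GFF3 columns are tab-separated: $1 seqid, $3 type, $4 start, $5 end
    awk -F'\t' -v type="${2:-gene}" '$3 == type { print $1 "\t" $4 "\t" $5 }' "$1" |
        LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n |
        awk -F'\t' '
            $1 != seq { seq = $1; prev_end = $3 + 0; next }   # first feature on a new sequence
            {
                if ($2 + 0 > prev_end) print $2 - prev_end     # start of current minus end of previous
                if ($3 + 0 > prev_end) prev_end = $3 + 0       # remember the furthest end seen so far
            }'
}

# summarize: read one number per line on stdin; print count, mean and median.
summarize() {
    sort -n | awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) { print "no values"; exit }
            median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
            printf "n=%d mean=%.1f median=%.1f\n", NR, sum / NR, median
        }'
}
```

Usage would look like `feature_gaps annotations.gff | summarize` for genes, or `feature_gaps annotations.gff exon | summarize` for exons. Note that for exons the gaps mix intron lengths with intergenic distances, and exons shared between transcripts overlap and are skipped, so interpret that number with care.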
Answered 5 months ago by Alex Reynolds (Seattle, WA USA):

Here's a one-liner that uses BEDOPS closest-features on a UCSC-derived refGene list of genes:

$ closest-features --closest --no-ref --no-overlaps --dist refGene.hg38.bed refGene.hg38.bed | cut -d'|' -f2 | grep -v NA | awk '{ if($1<=0){ $1*= -1;} print $1;}' | Rscript -e 'summary(as.numeric(read.table(file("stdin"))[,1]))'
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1    1185    6411   25654   22165 1687452

The median distance from a gene to its nearest non-overlapping neighbor is 6411 nt. The mean is about 25.7 kb (25654 nt), and so on for the other quantiles.

The file refGene.hg38.bed is sorted with BEDOPS sort-bed.

If you're starting from GFF3, you can use BEDOPS gff2bed:

$ awk '($3 == "gene")' annotations.gff | gff2bed - > annotations.bed

To incorporate into the above one-liner, using bash process substitutions:

$ closest-features --closest --no-ref --no-overlaps --dist <(awk '($3 == "gene")' annotations.gff | gff2bed -) <(awk '($3 == "gene")' annotations.gff | gff2bed -) | cut -d'|' -f2 | grep -v NA | awk '{ if($1<=0){ $1*= -1;} print $1;}' | Rscript -e 'summary(as.numeric(read.table(file("stdin"))[,1]))'
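
If the one-liner gets hard to read, the same pipeline could be split into small functions so the GFF3 file is converted only once. This is a rough sketch, not a tested BEDOPS recipe: the function names `abs_distances` and `nearest_gene_distances` are hypothetical, it assumes `gff2bed` and `closest-features` are on your PATH, and it uses only the options already shown above.

```shell
# abs_distances: read "<closest element>|<signed distance>" lines on stdin
# (as produced with --no-ref --dist) and print absolute distances,
# skipping rows where no neighbor was found (reported as NA).
abs_distances() {
    awk -F'|' '$NF != "NA" { d = $NF + 0; print (d < 0 ? -d : d) }'
}

# nearest_gene_distances FILE: distance from each gene in a GFF3 file
# to its nearest non-overlapping gene. Hypothetical wrapper; requires BEDOPS.
nearest_gene_distances() {
    local gff=$1 bed
    bed=$(mktemp) || return 1
    # Keep gene records only, then convert to sorted BED once
    awk '($3 == "gene")' "$gff" | gff2bed - > "$bed"
    closest-features --closest --no-ref --no-overlaps --dist "$bed" "$bed" | abs_distances
    rm -f "$bed"
}
```

The output of `nearest_gene_distances annotations.gff` is one distance per line, so it can be piped into the same Rscript `summary()` call as before.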