Question: Calculating Intron relative distribution
gravatar for florent.risacher
4.8 years ago by
florent.risacher20 wrote:


I would like to be able to calculate the relative distribution of introns in genes.

A paper from Jeffares et al, 2006 "The biology of intron gain and loss" shows graph of the proportion of introns and their distribution to show the 5' bias in Eukaryote.

It is explained that : relative position of the intron = bp upstream of the intron in coding sequence / total number of bp in coding sequence. Then, they pool it into ten classes.


I would like to do something similar for at least the human genome. I am interested in doing this for each chromosome to be able to compare the distribution.

I found that the UCSC has a pretty good database with exon position and I was wondering if it was possible to use this data. I know it would probably require to write a program but I am not a programmer  and the only code I know is R.

Any help is appreciated,

Thank you



density distribution intron ucsc • 1.5k views
ADD COMMENTlink modified 4.8 years ago by Martin Morgan1.6k • written 4.8 years ago by florent.risacher20
gravatar for Martin Morgan
4.8 years ago by
Martin Morgan1.6k
United States
Martin Morgan1.6k wrote:

For a model organism, you could load the TxDb annotation package (see, e.g., TxDb.Hsapiens.UCSC.hg19.knownGene), extract the introns and transcripts from the data, and then calculate the relative offset (of the intron start from the transcript start on the plus strand, for instance)

introns <- unlist(intronsByTranscript(
    TxDb.Hsapiens.UCSC.hg19.knownGene, use.names=TRUE))
tx <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene,
                  columns=c("tx_name", "TXCHROM"))

idx <- match(names(introns), tx$tx_name)
offset <- ifelse(strand(introns) == "+",
                 start(introns) - start(tx)[idx],
                 end(tx)[idx] - end(introns)) / width(tx)[idx]

The commands are from the GenomicFeatures and GenomicRanges packages. Plot the result


For less model organisms, a 'TxDb' database can be created with the vignette on the GenomicFeatures landing page.

You could coordinate the offset information with chromosome (with a little bit of trickery to deal with the appropriate factor 'levels' and to focus only on the autosome and sex chromosomes) with

df <- data.frame(offset=offset, chrom=tx$TXCHROM[idx])
lvls <- paste0("chr", c(1:22, "X", "Y"))
df <- df[df$chrom %in% lvls,]

After subsetting factors, it is convenient to drop the unused levels

df$chrom <- factor(df$chrom, lvls)

Visualize the information with, e.g., 

densityplot(~ offset | chrom, df, plot.points=FALSE)
densityplot(~ offset, group=chrom, df,   

To aggregate several levels, one possibility is along the lines of

levels(df$chrom) <- c(rep("auto", 22), rep("sex", 2))
densityplot(~offset, group=chrom, df, plot.points=FALSE)
ADD COMMENTlink modified 4.8 years ago • written 4.8 years ago by Martin Morgan1.6k

Wow it works pretty well !

I still don't get everything, especially how works the database but I'm looking into it ! Thanks !


However I still can't figure out how to regroup all the autosomes together. I understand I have to deal with the "lvls" but I stuck.

ADD REPLYlink modified 4.8 years ago • written 4.8 years ago by florent.risacher20

I revised the final part of the question to separately drop unused factor levels after subsetting, and then to illustrate one way of changing the labels on the new levels

ADD REPLYlink written 4.8 years ago by Martin Morgan1.6k

Fantastic !

It's exactly what I was looking for ! I'm already starting to look for the other database I can use this with.

Thank you again !

ADD REPLYlink written 4.8 years ago by florent.risacher20
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1821 users visited in the last hour