Question: Looking for DNA base composition Visualisation tool
gravatar for venurah
5.4 years ago by
Sri Lanka
venurah20 wrote:

Hi Folks,

 I am looking for a tool that can be used to visualize  base compositions of different promoters (example image is given below) for a manuscript. Can you suggest a tool (preferably OSX compatible) ?

Thanks in advance.


ADD COMMENTlink modified 5.4 years ago by Alex Reynolds30k • written 5.4 years ago by venurah20
gravatar for Alex Reynolds
5.4 years ago by
Alex Reynolds30k
Seattle, WA USA
Alex Reynolds30k wrote:

As an example, say you start with human GENCODE v21 genes, which you can convert to BED with gtf2bed:

$ wget -qO- \
    | gunzip -c - \
    | gtf2bed - \
    | grep -w gene - \
    > gencode.v21.genes.bed

You might grab the start position of the forward-stranded genes and then use bedops to pad the upstream range, say by 100 bases (you can adjust padding to match the size of the promoter region you are interested in):

$ awk '$6=="+"' gencode.v21.genes.bed \
    | awk '{$3=$2+1; print $0}' - \
    | bedops --range 100:0 --everything - \
    > gencode.v21.genes.for.upstream100nt.bed

Then repeat this for the reverse strand, using the end position as the transcriptional start site of the gene, padding "downstream" (which is really upstream of the TSS, relative to strand orientation):

$ awk '$6=="-"' gencode.v21.genes.bed \
    | awk '{$2=$3-1; print $0}' - \
    | bedops --range 0:100 --everything - \
    > gencode.v21.genes.rev.upstream100nt.bed

Then union the two files into one BED file:

$ bedops --everything gencode.v21.genes.*.upstream100nt.bed > gencode.v21.genes.upstream100nt.bed

You can use tools like bed2fasta to convert this stranded BED file into a FASTA file, and then BioPython to convert FASTA to a PPM (e.g. Parsing A Fasta File By Columns ).

This is just an example - you might use that kind of approach on your set of genes of interest, or any other approach that gets you a FASTA file of your promoters and the resulting base composition.

Once you have your promoters-of-interest and their base composition in percentages, R can do the visualization step easily. RStudio is also a useful IDE for doing work in R.

Consider setting up row vectors of percentage values over five bases (which sum to 100 over all four bases at each columnar position). You could then generalize the following example code to your specific dataset:

a <- c(20, 20, 20, 40, 20)
c <- c(15, 30, 30, 10, 15)
g <- c(40, 30, 25, 15, 45)
t <- c(25, 20, 25, 35, 20)
percs <- data.frame(a, c, g, t)

x_vals <- c(seq(1, nrow(percs)), rev(seq(1, nrow(percs))))
y_a_vals <- c(rep(0, nrow(percs)), rev(percs$a + percs$c + percs$g + percs$t))
y_t_vals <- c(rep(0, nrow(percs)), rev(percs$c + percs$g + percs$t))
y_g_vals <- c(rep(0, nrow(percs)), rev(percs$c + percs$g))
y_c_vals <- c(rep(0, nrow(percs)), rev(percs$c))

plot(x=x_vals, y=y_a_vals, type='n', ylim=c(0,100), xlab="bases", ylab="percs")

polygon(x_vals, y_a_vals, col='darkgreen', border=NA)
polygon(x_vals, y_t_vals, col='red', border=NA)
polygon(x_vals, y_g_vals, col='yellow', border=NA)
polygon(x_vals, y_c_vals, col='darkblue', border=NA)

This would make a plot that takes on the following appearance:

This should be trivial to markup, once you get a little more familiar with R functions like plot(), text(), etc. Use the ? symbol to get help about functions, e.g. "?plot" or "?text".

You can also export PDF from R using the pdf() function. PDFs can be brought into Adobe Illustrator to polish for publication.

ADD COMMENTlink modified 8 months ago by RamRS27k • written 5.4 years ago by Alex Reynolds30k

Hi Alex,

Thank you so much for your very comprehensive answer.!!!

This project I am looking at Arabidopsis promoters (I already extracted promoters [-1000] from TAIR)  Link to the File .

So I think I can start from PPM.

My other project I am looking at Human MT2A so I can start from the BED file.

Right now I am following R Programing videos @coursera. I am a slow learner but hopefully will get there soon :)

Cheers ,


ADD REPLYlink written 5.4 years ago by venurah20
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1723 users visited in the last hour