Looking for DNA base composition Visualisation tool
1
0
Entering edit mode
9.2 years ago
VenGeno ▴ 100

Hi Folks,

I am looking for a tool that can be used to visualize base compositions of different promoters (example image is given below) for a manuscript. Can you suggest a tool (preferably OSX compatible) ?

< image not found >

Thanks in advance.

Venura

base-composition data-visualization DNA • 2.2k views
ADD COMMENT
5
Entering edit mode
9.2 years ago

As an example, say you start with human GENCODE v21 genes, which you can convert to BED with gtf2bed:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_21/gencode.v21.annotation.gtf.gz \
    | gunzip -c - \
    | gtf2bed - \
    | grep -w gene - \
    > gencode.v21.genes.bed

You might grab the start position of the forward-stranded genes and then use bedops to pad the upstream range, say by 100 bases (you can adjust padding to match the size of the promoter region you are interested in):

$ awk '$6=="+"' gencode.v21.genes.bed \
    | awk '{$3=$2+1; print $0}' - \
    | bedops --range 100:0 --everything - \
    > gencode.v21.genes.for.upstream100nt.bed

Then repeat this for the reverse strand, using the end position as the transcriptional start site of the gene, padding "downstream" (which is really upstream of the TSS, relative to strand orientation):

$ awk '$6=="-"' gencode.v21.genes.bed \
    | awk '{$2=$3-1; print $0}' - \
    | bedops --range 0:100 --everything - \
    > gencode.v21.genes.rev.upstream100nt.bed

Then union the two files into one BED file:

$ bedops --everything gencode.v21.genes.*.upstream100nt.bed > gencode.v21.genes.upstream100nt.bed

You can use tools like bed2fasta to convert this stranded BED file into a FASTA file, and then BioPython to convert FASTA to a PPM (e.g. Parsing A Fasta File By Columns).

This is just an example - you might use that kind of approach on your set of genes of interest, or any other approach that gets you a FASTA file of your promoters and the resulting base composition.

Once you have your promoters-of-interest and their base composition in percentages, R can do the visualization step easily. RStudio is also a useful IDE for doing work in R.

Consider setting up row vectors of percentage values over five bases (which sum to 100 over all four bases at each columnar position). You could then generalize the following example code to your specific dataset:

a <- c(20, 20, 20, 40, 20)
c <- c(15, 30, 30, 10, 15)
g <- c(40, 30, 25, 15, 45)
t <- c(25, 20, 25, 35, 20)
percs <- data.frame(a, c, g, t)

x_vals <- c(seq(1, nrow(percs)), rev(seq(1, nrow(percs))))
y_a_vals <- c(rep(0, nrow(percs)), rev(percs$a + percs$c + percs$g + percs$t))
y_t_vals <- c(rep(0, nrow(percs)), rev(percs$c + percs$g + percs$t))
y_g_vals <- c(rep(0, nrow(percs)), rev(percs$c + percs$g))
y_c_vals <- c(rep(0, nrow(percs)), rev(percs$c))

plot(x=x_vals, y=y_a_vals, type='n', ylim=c(0,100), xlab="bases", ylab="percs")

polygon(x_vals, y_a_vals, col='darkgreen', border=NA)
polygon(x_vals, y_t_vals, col='red', border=NA)
polygon(x_vals, y_g_vals, col='yellow', border=NA)
polygon(x_vals, y_c_vals, col='darkblue', border=NA)

This would make a plot that takes on the following appearance:

plot image

This should be trivial to markup, once you get a little more familiar with R functions like plot(), text(), etc. Use the ? symbol to get help about functions, e.g. "?plot" or "?text".

You can also export PDF from R using the pdf() function. PDFs can be brought into Adobe Illustrator to polish for publication.

ADD COMMENT
1
Entering edit mode

Hi Alex,

Thank you so much for your very comprehensive answer.!!!

This project I am looking at Arabidopsis promoters (I already extracted promoters [-1000] from TAIR) Link to the File.

So I think I can start from PPM.

My other project I am looking at Human MT2A so I can start from the BED file.

Right now I am following R Programing videos @coursera. I am a slow learner but hopefully will get there soon :)

Cheers,
Venura

ADD REPLY

Login before adding your answer.

Traffic: 2419 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6