Question: Making a box/density plot for G/C content at millions of genomic locations?
mmmmcandrew wrote (3.8 years ago):

Hi all-

I have 4 populations, each containing millions of BED intervals and each with a different average G/C content. I would like to make something similar to a box or density plot. On the x-axis, I would simply have 4 categories (my four populations). On the y-axis, I would have G/C content. For any given category/population, I would like to show the G/C content of each of those millions of intervals as separate points or as a density cloud, along with some kind of marker showing the average G/C content (normalized to base pair content). Can anyone recommend a simple program that I could use to accomplish this? I would prefer not to use R if possible, as I'm very clumsy with it.

Alex Reynolds (Seattle, WA, USA) wrote (3.8 years ago):

Maybe there is one program that will do all of this, but the approach below should work, I'd think, and it might be a good way to learn a few useful skills:

  1. Convert each of your four BED files to a FASTA file (e.g. with a bed2faidx script or similar that queries a samtools faidx-indexed FASTA file).
  2. Run each FASTA file through a GC content script (e.g. the second awk script here), which gives you a GC content value for each sequence. You could pipe this to awk again to print the population name in one column and the fractional GC content value in the second.
  3. Use cat to merge all the population GC values into one file. Import this file into R. Make the population name column into a factor so you can use it as a grouping variable (or "category"). Use the ggplot2 library to make a box or violin plot against the population variable, perhaps labeling with the median and the first and third quartile values.
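
If awk and R feel clumsy, the GC calculation in step 2 and the summary values in step 3 can also be sketched in plain Python. This is only a minimal sketch using the standard library, not the awk script referred to above. The function names (read_fasta, gc_counts, summarize), the population label "pop1" and the demo sequences are all made up for illustration. It assumes the FASTA files from step 1 already exist, and it reads "normalized to base pair content" as a length-weighted mean (total G+C bases over total A/C/G/T bases), which is an assumption about what was meant.

```python
import statistics
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_counts(seq: str) -> Tuple[int, int]:
    """Return (G+C count, A+C+G+T count); N and other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc, acgt


def summarize(population: str, lines: Iterable[str]) -> Dict[str, object]:
    """Per-interval GC fractions plus summary values for one population.

    Needs at least two usable sequences, because statistics.quantiles
    raises an error on fewer data points in most Python versions.
    """
    fractions = []
    total_gc = total_bases = 0
    for _header, seq in read_fasta(lines):
        gc, n = gc_counts(seq)
        if n == 0:  # skip intervals made up entirely of N or other symbols
            continue
        fractions.append(gc / n)
        total_gc += gc
        total_bases += n
    # "inclusive" treats the data as the full population of values
    q1, median, q3 = statistics.quantiles(fractions, n=4, method="inclusive")
    return {
        "population": population,
        "fractions": fractions,
        "q1": q1,
        "median": median,
        "q3": q3,
        # assumption: length-weighted mean = all G+C bases / all A/C/G/T bases
        "weighted_mean": total_gc / total_bases,
    }


# Hypothetical demo data standing in for one population's FASTA file
DEMO_FASTA = """\
>chr1:0-4
GGCC
>chr1:10-14
ATAT
>chr1:20-26
GCATNN
>chr1:30-38
GGGG
GGAT
"""

summary = summarize("pop1", DEMO_FASTA.splitlines())
# One row per interval: population name, then fractional GC content
for frac in summary["fractions"]:
    print(f"{summary['population']}\t{frac:.4f}")
print("median:", summary["median"], "weighted mean:", summary["weighted_mean"])
```

For a real file, an open file handle can be passed in place of DEMO_FASTA.splitlines(), since the reader only iterates over lines. The two-column rows it prints have the same shape as the merged table described in step 3, so they can go into whatever plotting tool you end up using. With millions of intervals per population, a box or violin plot is likely to be more readable than individual points.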