Hello all, I've posted this question on Stack Overflow, but I thought I might get more responses here. I was able to load my CSV file into a NumPy array:

    data = np.genfromtxt('csv_file', dtype=None, delimiter=',')

Now I would like to generate a heatmap. I have 19 categories from 11 samples, along these lines:
    COG      station1     station2     station3     station4
    COG0001  0.019393497  0.183122497  0.089911227  0.283250444  0.074110521
    COG0002  0.044632051  0.019118032  0.034625785  0.069892277  0.034073709
    COG0003  0.033066112  0            0            0            0
    COG0004  0.115086472  0.098805295  0.148167492  0.040019101  0.043982814
    COG0005  0.064613057  0.03924007   0.105262559  0.076839235  0.031070155
    COG0006  0.079920475  0.188586049  0.123607421  0.27101229   0.274806929
    COG0007  0.051727492  0.066311584  0.080655401  0.027024185  0.059156417
    COG0008  0.126254841  0.108478559  0.139106704  0.056430812  0.099823028
I wanted to use matplotlib's pcolormesh. All the examples I could find used arrays of random numbers. I can get the plot easily with random numbers, but I can't get my CSV file to plot. First, the array refuses to reshape. I have NaNs in there, so I tried masking them, but that failed too. Also, I had to delete the header and the first column; is there a way to keep them and use them as labels for the axes? I've edited the original question to include an excerpt of the CSV file. Any help and insights would be greatly appreciated.
Here's a nickel, kid, go get yourself a better plotting library
    > library(ggplot2)
    > library(reshape2)  # provides melt() on current installs
    > foo = read.table('foo.txt', header=T)
    > foomelt = melt(foo)
    Using COG as id variables
    > ggplot(foomelt, aes(x=COG, y=variable, fill=value)) + geom_tile() + scale_fill_gradient(low='white', high='steelblue')
    > ggsave('biostar.png')
    Saving 7.97" x 7.75" image
ggplot2 is plotting heaven and way better than matplotlib. Use rpy2 to run it from Python; they even have ggplot2 examples in the docs.
To be honest, I took inspiration from this answer on Stack Overflow; I just added that you can read the file with genfromtxt:
    # Note that your file, if it is as you posted it here, contains some indentation errors.
    # I would fix them with sed (GNU sed; -E is needed for + to act as a quantifier):
    $ sed -i -E 's/^\s+//' heat.csv   # warning: -i modifies your file; remove it if you want to test first
    $ sed -i -E 's/\s+/\t/g' heat.csv
    $ ipython --pylab
    # Use names=True if the first row contains column names.
    >>> data = numpy.genfromtxt("heat.csv", dtype=None, names=True, missing_values='NaN')
    >>> data['COG']
    array(['COG0001', 'COG0002', 'COG0003', 'COG0004', 'COG0005', 'COG0006', 'COG0007', 'COG0008'], dtype='|S7')
    >>> heatmap, xedges, yedges = histogram2d(data['station1'], data['station2'])
    >>> extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]  # axis limits taken from the bin edges
    >>> imshow(heatmap, extent=extent)
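If the goal is one coloured cell per COG/station pair rather than a 2D histogram of two stations, something along these lines might be closer to what was asked. It is only a sketch under a few assumptions: the file is comma-delimited, a missing value shows up as an empty cell, and the helper names `load_labeled_matrix` and `plot_heatmap` are made up for this example. The inline excerpt reuses a few values from the question, with one cell emptied to show the masking. Rows are kept in input order, and the header and first column become the axis labels.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Small excerpt shaped like the posted table (assumed comma-delimited).
# The empty cell in the COG0002 row is an illustrative missing value.
csv_text = """COG,station1,station2,station3
COG0001,0.019393497,0.183122497,0.089911227
COG0002,0.044632051,,0.034625785
COG0003,0.033066112,0,0
"""


def load_labeled_matrix(handle, delimiter=","):
    """Hypothetical helper: split a labelled table into labels and a float matrix."""
    rows = [line.rstrip("\r\n").split(delimiter) for line in handle if line.strip()]
    col_labels = rows[0][1:]              # header without the corner cell
    row_labels = [r[0] for r in rows[1:]]  # first column, in input order
    values = np.array(
        [[float(v) if v.strip() else np.nan for v in r[1:]] for r in rows[1:]]
    )
    return row_labels, col_labels, values


def plot_heatmap(row_labels, col_labels, values, out_path=None):
    """Hypothetical helper: draw the matrix with pcolormesh, leaving NaN cells blank."""
    masked = np.ma.masked_invalid(values)  # mask NaNs so they are not coloured
    fig, ax = plt.subplots()
    mesh = ax.pcolormesh(masked, cmap="Blues")
    # Cell (i, j) spans [j, j+1] x [i, i+1], so labels go at the cell centres.
    ax.set_xticks(np.arange(len(col_labels)) + 0.5)
    ax.set_xticklabels(col_labels)
    ax.set_yticks(np.arange(len(row_labels)) + 0.5)
    ax.set_yticklabels(row_labels)
    ax.invert_yaxis()  # first input row at the top
    fig.colorbar(mesh, ax=ax)
    if out_path is not None:
        fig.savefig(out_path)
    return fig, ax, masked


row_labels, col_labels, values = load_labeled_matrix(io.StringIO(csv_text))
fig, ax, masked = plot_heatmap(row_labels, col_labels, values)
```

For a real file, passing `open("heat.csv")` instead of the `io.StringIO` object and an `out_path` such as `"heat.png"` should do; for the tab-separated version produced by the sed commands above, `delimiter="\t"` would be the matching setting.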
@Giovanni: 1. Is it possible to keep the column names (COG) in the same order as in the input, instead of alphabetical order? 2. Is it possible to put the numbers inside the heatmap cells?
Thanks! Your code is amazing and simple! Hail ggplot!