I am quite familiar with microarray-based transcriptomics but have little experience with RNA-Seq.
I am interested in published transcriptomics data from the NCBI GEO database. For microarray projects, the data can be downloaded in various formats that I can work with. For RNA-Seq projects, however, GEO offers only "bedgraph" files for download. I have read about what these are, but I am not sure how to use them to analyze transcriptomics data.
I expected a table with gene names and expression values for the different conditions. Instead, I get bedgraph files (one track per condition). The organism is not human, and the first three columns of each bedgraph file are supposed to contain position information. Here is a small section of one of the files:
    track type=bedGraph name="TopHat - read coverage"
    C36799851 0 27 0
    C36799851 27 98 1
    C36800049 0 0 0
    C36800049 0 1 2
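Each data line of a bedGraph file has four columns: sequence name, 0-based start, end (exclusive), and a value; according to the track line, the value here is TopHat read coverage. A minimal Python sketch for reading such a file into records, assuming whitespace- or tab-separated columns (the names `Interval` and `parse_bedgraph` are hypothetical, chosen for illustration):

```python
from typing import Iterable, Iterator, NamedTuple


class Interval(NamedTuple):
    """One bedGraph record: half-open interval [start, end) with a value."""
    chrom: str
    start: int
    end: int
    value: float


def parse_bedgraph(lines: Iterable[str]) -> Iterator[Interval]:
    """Yield records, skipping blank, 'track', 'browser' and '#' lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        # split() without arguments handles both tabs and spaces
        chrom, start, end, value = line.split()[:4]
        yield Interval(chrom, int(start), int(end), float(value))


if __name__ == "__main__":
    example = """track type=bedGraph name="TopHat - read coverage"
C36799851 0 27 0
C36799851 27 98 1"""
    for rec in parse_bedgraph(example.splitlines()):
        print(rec)
```

For a real file, `parse_bedgraph(open("sample.bedGraph"))` streams records one line at a time, so the whole file never has to fit in memory (the file name is a placeholder).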
I understand that these files are meant to be displayed in the UCSC Genome Browser. I tried to upload one, but got an error message about insufficient memory (the bedgraph files are huge). So, my first two questions are:
- How do I find the genome assembly in the browser that matches the bedgraph file? I know the organism, but there may be different versions, releases, etc.
- Should I use the 'upload' function, and what can I do about the memory problem?
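One way to test whether a candidate assembly matches is to compare the sequence names in the bedGraph (here contig-like IDs such as `C36799851`, which may indicate a custom assembly rather than one hosted on UCSC) against that assembly's sequence lengths. A two-column `chrom.sizes` file (name, length) is assumed as input; for an assembly FASTA, the first two columns of a `samtools faidx` `.fai` index have the same layout. The same `chrom.sizes` file is also what UCSC's `bedGraphToBigWig` utility needs to convert a large, sorted bedGraph into a compact, indexed bigWig, which is the format UCSC recommends for big tracks. The function names below are hypothetical; this is a sketch, not a complete validation:

```python
from collections import defaultdict


def max_end_per_contig(records):
    """Largest end coordinate seen per contig in (chrom, start, end, value) records."""
    max_end = defaultdict(int)
    for chrom, _start, end, _value in records:
        max_end[chrom] = max(max_end[chrom], end)
    return dict(max_end)


def read_chrom_sizes(lines):
    """Parse 'name<whitespace>length' lines into a dict."""
    sizes = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])
    return sizes


def check_assembly(records, chrom_sizes):
    """Return (fraction of bedGraph contigs present in the assembly,
    sorted list of contigs whose coverage runs past the assembly's length)."""
    max_end = max_end_per_contig(records)
    if not max_end:
        return 0.0, []
    found = [c for c in max_end if c in chrom_sizes]
    too_long = sorted(c for c in found if max_end[c] > chrom_sizes[c])
    return len(found) / len(max_end), too_long
```

A matching assembly should give a fraction close to 1.0 and an empty list; a low fraction suggests a different release or naming scheme.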
My most important question, however, is more fundamental. Even if I manage to display several such tracks in the browser, how can I make sense of the data, e.g. by finding genes that show large expression changes between two conditions? There must be a solution that does not require a genome browser, perhaps by mapping the positional information in the bedgraph files to the genes.
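Mapping positions to genes can be sketched as follows: given gene coordinates on the same assembly (assumed here as a dict built from a GTF/GFF annotation, using 0-based half-open coordinates), average the per-base coverage over each gene and compare two conditions with a log2 ratio. This is only a rough approximation under stated assumptions: coverage is not a read count, the sketch does no library-size normalization, and without replicates there is no statistical test. Proper differential expression analysis usually starts from read counts per gene (GEO RNA-Seq series typically link to raw reads in SRA) and tools such as DESeq2 or edgeR. The function names and the gene `geneX` are hypothetical:

```python
import math
from collections import defaultdict


def mean_coverage(records, genes):
    """Base-weighted mean coverage per gene.

    records: iterable of (chrom, start, end, value) bedGraph intervals.
    genes:   dict gene_name -> (chrom, start, end), 0-based half-open.
    """
    # Group intervals by contig so each gene only scans its own contig
    by_chrom = defaultdict(list)
    for chrom, start, end, value in records:
        by_chrom[chrom].append((start, end, value))

    result = {}
    for name, (chrom, g_start, g_end) in genes.items():
        total = 0.0
        for start, end, value in by_chrom.get(chrom, []):
            overlap = min(end, g_end) - max(start, g_start)
            if overlap > 0:
                total += overlap * value  # coverage summed over overlapping bases
        length = g_end - g_start
        result[name] = total / length if length > 0 else 0.0
    return result


def log2_fold_change(cov_a, cov_b, pseudocount=1.0):
    """log2((B + pseudocount) / (A + pseudocount)) for genes in both dicts."""
    return {
        g: math.log2((cov_b[g] + pseudocount) / (cov_a[g] + pseudocount))
        for g in cov_a.keys() & cov_b.keys()
    }


if __name__ == "__main__":
    genes = {"geneX": ("C36799851", 0, 100)}  # hypothetical annotation
    cond_a = mean_coverage([("C36799851", 0, 100, 3.0)], genes)
    cond_b = mean_coverage([("C36799851", 0, 100, 15.0)], genes)
    print(log2_fold_change(cond_a, cond_b))  # {'geneX': 2.0}
```

The pseudocount keeps the ratio defined for genes with zero coverage in one condition. Sorting genes by the absolute log2 ratio gives a first list of candidates to inspect.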