Question

Visualising individual humans in RNA-seq

0

Entering edit mode

4.4 years ago

Lisa ▴ 330

I have a question that I think is probably simple but I just am struggling to figure out what is good practice. I have recently switched to working with Human Rna-seq data and the issue of visualising what is happening in individual patients keeps coming up. Before I used to work with microbial informatics, so this was never an issue I had to face.

I do an all versus all comparison between two conditions, in this case, disease positive(pos) and disease negative (neg). I have a total N of 43, 23 in pos and 20 in neg. I can then see what transcriptional changes are between the overall conditions. However, people have expressed interest in what is happening in individual patients. They keep showing me papers where this is possible. From Reading the supplementary files, I see that in many cases the analysis reverts to the counts' file and uses that to visualise the differences in individuals in a heatmap. Many others are in Microarrays where this isn’t such a problem

Is this good practise, to generate your genes of interest in an all vs all comparison and visualise it in the individual counts' files? Or am I missing a trick?

RNA-Seq • 891 views

ADD COMMENT • link updated 4.0 years ago by Biostar 20 • written 4.4 years ago by Lisa ▴ 330

score 1 · Answer 1 · 2019-11-25

This is certainly a tricky problem. My favourite method for exploring what's happening with individual samples is GSVA. GSVA takes your expression matrix (genes x samples) and a set of gene sets (pathways, functions, whatever you like), and calculates a score that summarises the geneset for a given sample. You end up with a matrix genesets x samples.

This approach has benefits over gene x sample heatmaps by associating functional relevance to your genesets, and possibly offering more insightful views into sample population heterogeneity.

Resources

Use Case I - Interested in a few Pathways and Genesets

Sometimes there's a critical pathway(s) that investigators are interested that yielded the generation of this experiment.

Make sure you've got your expression matrix (I'd recommend Salmon quantifications from a Gencode reference, imported using tximport, and processed using the Limma-Voom workflow)
Grab a your genesets of interest and format them as a named list object where the elements are a vector containing the same identifier schema as your expression matrix. So on the example above make sure they're using Ensembl IDs. Subset this for the genesets you're interested in.
Run the GSVA function with the above inputs
Make this into a heat map using your favourite library (ggplot2 for example) and tack on some phenotypic annotation to interpret

Use Case II - I've no idea what I'm interested in

Often you'd like to see what functional elements are statistically relevant against a large reference set

Make sure you've got your expression matrix (see above)
Prepare your geneset list object (see above, just keep all genesets)
Run the GSVA function with the above inputs
Fit a model to the GSVA matrix (see example below)
Grab statistically relevant genesets, and subset GSVA matrix
Make heat map

Coded Example

Taken from the GSVA function manual

library(limma)

p <- 10 ## number of genes
n <- 30 ## number of samples
nGrp1 <- 15 ## number of samples in group 1
nGrp2 <- n - nGrp1 ## number of samples in group 2
## consider three disjoint gene sets
geneSets <- list(set1=paste("g", 1:3, sep=""),
                 set2=paste("g", 4:6, sep=""),
                 set3=paste("g", 7:10, sep=""))
## sample data from a normal distribution with mean 0 and st.dev. 1
y <- matrix(rnorm(n*p), nrow=p, ncol=n,
            dimnames=list(paste("g", 1:p, sep="") , paste("s", 1:n, sep="")))
## genes in set1 are expressed at higher levels in the last 10 samples
y[geneSets$set1, (nGrp1+1):n] <- y[geneSets$set1, (nGrp1+1):n] + 2
## build design matrix
design <- cbind(sampleGroup1=1, sampleGroup2vs1=c(rep(0, nGrp1), rep(1, nGrp2)))
## fit linear model
fit <- lmFit(y, design)
## estimate moderated t-statistics
fit <- eBayes(fit)
## genes in set1 are differentially expressed
topTable(fit, coef="sampleGroup2vs1")
## estimate GSVA enrichment scores for the three sets
gsva_es <- gsva(y, geneSets, mx.diff=1)$es.obs
## fit the same linear model now to the GSVA enrichment scores
fit <- lmFit(gsva_es, design)
## estimate moderated t-statistics
fit <- eBayes(fit)
## set1 is differentially expressed
topTable(fit, coef="sampleGroup2vs1")

score 0 · Answer 2 · 2019-11-25

Heatmaps are often a good idea globally. Have a look at the raw counts and the normalized values output, as both contain separate information. If you use a tool like Degust http://degust.erc.monash.edu/ you can get an interactive heatmap "for free" if you upload a counts file.

For genes of interest (but not directly for your patient question), I also like the approach taken in limma/voom. At least in the Galaxy version of this we host, it generates Stripcharts which show the distribution of up and down regulated genes.