Question

Making PCA plot using variance instead of counts on Sleuth (plot_pca)

0

Entering edit mode

6 months ago

unkown • 0

Hello all,

I am in the process of moving from Deseq2 to Sleuth for all my bulk RNAseq analysis. The biggest question that I have is how do I plot a PCA plot using variance instead of counts with Sleuth results?

I started by using the plot_pca function. This one however, shows the read counts, I also am not sure how to read this data.

Method 1: `plot_pca` + sleuth

so = sleuth_fit(so, ~sampletype, fit_name = "full")
so = sleuth_fit(so, ~1, fit_name = "reduced")
so = sleuth_lrt(so, null_model = "reduced", alt_model = "full")
res = sleuth_results(so, test = "reduced:full", test_type = "lrt", show_all = TRUE)
plot_pca(so, color_by = "sampletype", text_labels = TRUE, units = "scaled_reads_per_base") +
  geom_point(size=14, pch=0.5)+
  theme_bw()+ theme(axis.title.x = element_text(face = "bold", size=20),
  axis.title.y = element_text(face = "bold", size=20),
  axis.text.x = element_text(face="bold", color="#000000", size=20),
  axis.text.y = element_text(face="bold", color="#000000", size=20),
  legend.title=element_text(face="bold", size=5),
  strip.text.x = element_text(size = 18),
  strip.text = element_text(size=10),
  strip.placement = "outside")

plot_pca results with read counts along the axis

The other alternative is to extract the read count matrix and plot it using prcomp and ggplot2.

Method 2: `prcomp` + `ggplot`

norm_counts <- sleuth_to_matrix(so, "obs_norm", "scaled_reads_per_base")
log_norm_counts <- so$transform_fun_counts(norm_counts)
pc <- prcomp(t(log_norm_counts))
plot2_pca <- data.frame(pc$x, s2c)
ggplot(plot2_pca, aes(PC1, PC2)) +
  geom_point(aes(color=sampletype),size=14, pch=0.5) +
  xlab('PC1') +  ylab('PC2') +
  geom_text_repel(aes(label=sample)) +
  theme_bw() + 
  theme(axis.title.x = element_text(face = "bold", size=20),
    axis.title.y = element_text(face = "bold", size=20),
    axis.text.x = element_text(face="bold", color="#000000", size=20),
    axis.text.y = element_text(face="bold", color="#000000", size=20),
    legend.title=element_text(face="bold", size=5),
    strip.text.x = element_text(size = 18),
    strip.text = element_text(size=10), strip.placement = "outside")

prcomp + ggplot 2 results

Questions:

What am I doing wrong with method 2? Why do my plots look so different, especially, the PGB1 samples? In method 1, the two PGB1 samples are close together, while in method 2 they show a great deal of separation?
Is there a way to plot the variance using plot_pca? I haven't come across any during all my searches.

Thank you!

Sleuth RNAseq PCA • 843 views

ADD COMMENT • link updated 6 months ago by Ram 44k • written 6 months ago by unkown • 0

0

Entering edit mode

What am I doing wrong with method 2?

Nothing, this is how PCA is commonly done. The sleuth axis limits look extremely odd. Any reason to use sleuth? It did not convince in the benchmark papers I recall, plus it is mainly for transcript-level DE as it can use the inferential replicate information from kallisto.

ADD REPLY • link updated 6 months ago by Ram 44k • written 6 months ago by ATpoint 84k

0

Entering edit mode

Hello,

Thank you for your reply. I used salmon with bootstraps and its advised to use Sleuth if I want to make use of my bootstrap information. I typically use Salmon counts + Deseq2, but wanted to switch over because sleuth is recommended.

Would you suggest I stick to Salmon + Deseq2, rather than Sleuth?

Could you also link the benchmark papers you were talking about? Most of the papers that I came across say that Sleuth is just as good as Deseq2 in most cases, with the exception of low abundance genes.

ADD REPLY • link updated 6 months ago by Ram 44k • written 6 months ago by unkown • 0

0

Entering edit mode

For gene level differential expression I would indeed stick to the most established tools, be it DESeq2, edgeR or limma. They all do not use inferential information. It is not common to use it on gene level, since the main mapping ambiguity is between transcripts of the same gene on transcript-level. Don't make it too fancy, do what everyone does unless there is a good reason not to in your particular case. After all, sleuth does not seem to be maintained, it has almost 100 open issues on GitHub https://github.com/pachterlab/sleuth with no commit in the last two years, while other mentioned tools are still under active maintenance with authors outstandingly responsive on the Bioconductor support site and here. Your choice in the end.

If you really need the inferential information on gene level I would use https://bioconductor.org/packages/release/bioc/vignettes/fishpond/inst/doc/swish.html#Differential_gene_expression. THis tool is a collab between the salmon authors and the DESeq2 people, so this is probably the best choice in terms of making full use of what salmon offers.

ADD REPLY • link 6 months ago by ATpoint 84k

0

Entering edit mode

Ah yeah, thank you for the confirmation. I wasn't sure if I should switch over. I did need to do transcript level analysis too. but i think ill do as you have suggested: Deseq2 for gene level and Swish for transcript level.

Hope you have a great day :)

ADD REPLY • link updated 6 months ago by Ram 44k • written 6 months ago by unkown • 0

0

Entering edit mode

A few things: sleuth is used for either gene-level DE or transcript-level DE, not just transcript-level DE (in fact, I used it for gene-level DE in my recent paper: https://www.nature.com/articles/s41388-022-02458-9 ). sleuth is an established tool for gene-level DE and has performed well on benchmarks, but it's true it would be harder to find support for sleuth (Edit: I just realized that I was the one who made the most recent commit for sleuth so mea culpa).

Making use of bootstraps tends to perform better, whether sleuth, swish, or edgeR. In fact, I recommend using edgeR. edgeR has a "catchKallisto" and a "catchSalmon" feature to leverage that information -- it has performed the best on the most recent benchmarks (see Gordon Smyth's most recent paper).

That said, I've used DESeq2 in the past and it works great -- the loss of leveraging the bootstrap information hasn't prevented me from making relevant biological discoveries :)

ADD REPLY • link 6 months ago by dsull ★ 6.8k

Ram · Answer 1 · 2024-02-22

0

Entering edit mode

6 months ago

dsull ★ 6.8k

In answer to your question, plot_pca works slightly differently. By default, it filters your data. Secondly, it uses obs_norm as-is.

ADD COMMENT • link updated 6 months ago by Ram 44k • written 6 months ago by dsull ★ 6.8k

Method 1: plot_pca + sleuth

Method 2: prcomp + ggplot

Method 1: `plot_pca` + sleuth

Method 2: `prcomp` + `ggplot`