Sleuth: Combined lrt and wt tests, clustered heatmap
4
3
Entering edit mode
5.4 years ago
bioinfo8 ▴ 180

Hi everyone,

I have a wildtype and 2 mutant samples. I have already run lrt (likelihood ratio test) and obtained > 40,000 transcripts but did not get any fold change in the output. After googling, I found that wald test (wt) would be a suitable option. Is it possible to run wald and lrt parrallely or in a comparative manner? What things I should keep in mind while filtering the results from both? How should I proceed further to find out the differentially expressed genes in mutant samples vs. wild?

Thank you!

RNA-Seq Sleuth kallisto R heatmap • 5.6k views
8
Entering edit mode
5.4 years ago

Hello there,

If you are doing a simple comparison between wild-type and mutant samples, then you can technically do the wald test. However, in our preprint [1], all of the analyses are done with the likelihood ratio test. This is because we found that the FDR is much more believable under the likelihood ratio test then the Wald test. My general impression is that the Wald test gives too many false positives, although, the rankings are fairly comparable between the two tests.

All that being said, I would suggest simply sticking with the likelihood ratio test.

You do not get any fold change in the output because the test is inherently different. However, you can manually compute the log fold change between the samples because you have a fairly simple design. Let me know if you have any questions about doing this. The function kallisto_table should give you a data frame that is easily manipulated with standard tools such as dplyr.

0
Entering edit mode

Finally, I went with lrt and selected transcripts with qval <0.05. This resulted to around 700 transcripts. When I try to download the heatmap, its not very clear and I am interested for a clustered heatmap for my 3 samples. I have tried various options in R using the output file (with all columns: target_id, test_stat, pval, qval, rss, sigma_sq, tech_var, mean_obs,var_obs, sigma_sq_pmax, smooth_sigma_sq, final_sigma_sq, degrees_free,ens_gene,ext_gene), the most satisfactory is the following: heatmap. If I delete other columns from the input file, there is not proper heatmap. I am interested to construct a plot like Figure 6 mentioned here with gene names at the horizontal axis. Any guidance in this regard would be appreciable.

Thanks!

3
Entering edit mode
4.5 years ago
erdiazval ▴ 80

You can use the LTR to test for differential expression and get the log2 foldchange estimates from the kallisto normalized values while you don't include the filtered out values.

SNM <- kallisto_table(so, use_filtered = TRUE, normalized = TRUE, include_covariates = TRUE

0
Entering edit mode

This table outputs both a "scaled_reads_per_base" and "tpm"

I'm interested in getting a log2FC between mutant and control, which values should I compare?

0
Entering edit mode
5.4 years ago
bioinfo8 ▴ 180

Thanks Harold for the reply. I have used the following command and obtained an extensive table (sample = condition; t1,t2,t3 are wildtype replicates; n and npt are two other samples).

write.csv(kallisto_table(so),"D:/....../kallisto_abundance_table.csv")


(1) However, in few places even est_counts and tpm are 0, then why it is showing eff_len and len? For e.g. for transcript ENS00000000042:

target_id sample est_counts tpm eff_len len condition

ENS00000000042 n1 0 0 2463 2585 n1

ENS00000000042 n2 4.051969132 0.08209267 2463 2585 n2

ENS00000000042 n3 4.517974826 0.0896161 2463 2585 n3

ENS00000000042 npt1 1.070880673 0.021605026 2463 2585 npt1

ENS00000000042 npt2 4.369908371 0.087321269 2463 2585 npt2

ENS00000000042 npt3 0 0 2463 2585 npt3

ENS00000000042 t1 1.570710628 0.031949241 2463 2585 t1

ENS00000000042 t2 0.893850062 0.018061154 2463 2585 t2

ENS00000000042 t3 0 0 2463 2585 t3

(2) I have obtained ~ 4,00,000 rows in the table and would like to find which genes are up/down-regulated; expressed or not in different samples. This is the initial analysis I am doing using kallisto and sleuth with three samples only, I have to do for many other samples too. Would you please guide how to proceed in this regard further.

Thank you!

0
Entering edit mode
5.4 years ago
bioinfo8 ▴ 180

In addition, I was going through the another preprint on zika and find out that both wald and lrt have been used (from supplementary data). But, the results section says FDR of 0.05 which indicates that finally results from lrt have been used. Hope I am right?