Question: Sleuth: Combined lrt and wt tests, clustered heatmap
3
gravatar for bioinfo8
2.3 years ago by
bioinfo8120
bioinfo8120 wrote:

Hi everyone,

I have a wildtype and 2 mutant samples. I have already run lrt (likelihood ratio test) and obtained > 40,000 transcripts but did not get any fold change in the output. After googling, I found that wald test (wt) would be a suitable option. Is it possible to run wald and lrt parrallely or in a comparative manner? What things I should keep in mind while filtering the results from both? How should I proceed further to find out the differentially expressed genes in mutant samples vs. wild?

Thank you!

heatmap rna-seq kallisto sleuth R • 3.0k views
ADD COMMENTlink modified 16 months ago by erdiazval40 • written 2.3 years ago by bioinfo8120
6
gravatar for haroldpimentel
2.3 years ago by
haroldpimentel90 wrote:

Hello there,

If you are doing a simple comparison between wild-type and mutant samples, then you can technically do the wald test. However, in our preprint [1], all of the analyses are done with the likelihood ratio test. This is because we found that the FDR is much more believable under the likelihood ratio test then the Wald test. My general impression is that the Wald test gives too many false positives, although, the rankings are fairly comparable between the two tests.

All that being said, I would suggest simply sticking with the likelihood ratio test.

You do not get any fold change in the output because the test is inherently different. However, you can manually compute the log fold change between the samples because you have a fairly simple design. Let me know if you have any questions about doing this. The function kallisto_table should give you a data frame that is easily manipulated with standard tools such as dplyr.

[1] http://biorxiv.org/content/early/2016/06/10/058164.abstract

ADD COMMENTlink written 2.3 years ago by haroldpimentel90

Finally, I went with lrt and selected transcripts with qval <0.05. This resulted to around 700 transcripts. When I try to download the heatmap, its not very clear and I am interested for a clustered heatmap for my 3 samples. I have tried various options in R using the output file (with all columns: target_id, test_stat, pval, qval, rss, sigma_sq, tech_var, mean_obs,var_obs, sigma_sq_pmax, smooth_sigma_sq, final_sigma_sq, degrees_free,ens_gene,ext_gene), the most satisfactory is the following: heatmap. If I delete other columns from the input file, there is not proper heatmap. I am interested to construct a plot like Figure 6 mentioned here with gene names at the horizontal axis. Any guidance in this regard would be appreciable.

Thanks!

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by bioinfo8120
2
gravatar for erdiazval
16 months ago by
erdiazval40
erdiazval40 wrote:

You can use the LTR to test for differential expression and get the log2 foldchange estimates from the kallisto normalized values while you don't include the filtered out values.

SNM <- kallisto_table(so, use_filtered = TRUE, normalized = TRUE, include_covariates = TRUE

ADD COMMENTlink modified 16 months ago • written 16 months ago by erdiazval40

This table outputs both a "scaled_reads_per_base" and "tpm"

I'm interested in getting a log2FC between mutant and control, which values should I compare?
Are scaled_reads or tpm in this output already log transformed? Thanks!

ADD REPLYlink written 10 months ago by samantha_jeschonek50
0
gravatar for bioinfo8
2.3 years ago by
bioinfo8120
bioinfo8120 wrote:

Thanks Harold for the reply. I have used the following command and obtained an extensive table (sample = condition; t1,t2,t3 are wildtype replicates; n and npt are two other samples).

write.csv(kallisto_table(so),"D:/....../kallisto_abundance_table.csv")

(1) However, in few places even est_counts and tpm are 0, then why it is showing eff_len and len? For e.g. for transcript ENS00000000042:

target_id sample est_counts tpm eff_len len condition

ENS00000000042 n1 0 0 2463 2585 n1

ENS00000000042 n2 4.051969132 0.08209267 2463 2585 n2

ENS00000000042 n3 4.517974826 0.0896161 2463 2585 n3

ENS00000000042 npt1 1.070880673 0.021605026 2463 2585 npt1

ENS00000000042 npt2 4.369908371 0.087321269 2463 2585 npt2

ENS00000000042 npt3 0 0 2463 2585 npt3

ENS00000000042 t1 1.570710628 0.031949241 2463 2585 t1

ENS00000000042 t2 0.893850062 0.018061154 2463 2585 t2

ENS00000000042 t3 0 0 2463 2585 t3

(2) I have obtained ~ 4,00,000 rows in the table and would like to find which genes are up/down-regulated; expressed or not in different samples. This is the initial analysis I am doing using kallisto and sleuth with three samples only, I have to do for many other samples too. Would you please guide how to proceed in this regard further.

Thank you!

ADD COMMENTlink written 2.3 years ago by bioinfo8120
0
gravatar for bioinfo8
2.3 years ago by
bioinfo8120
bioinfo8120 wrote:

In addition, I was going through the another preprint on zika and find out that both wald and lrt have been used (from supplementary data). But, the results section says FDR of 0.05 which indicates that finally results from lrt have been used. Hope I am right?

ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by bioinfo8120
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 890 users visited in the last hour