Kallisto/Sleuth - issue when using own annotations in the sleuth_prep() function
0
0
Entering edit mode
5.6 years ago
BioBing ▴ 150

Hi all,

Hoping that some of you are familiar with Kallisto/Sleuth and might be able to help me out.

I am analyzing transcripts from a non-model species with triplicates of a control condition and two different treatments.

For annotation, I have used Diamond (none of the Ensembl genomes are close to my species) and now want to lift my analysis from transcript - to gene level.

The analysis of the transcripts alone resulted in 500 (differentially expressed transcripts (DETs) for the one treatment and 832 DETs for the other in comparison with the control.

But when I try to "lift" it to gene level, I all of a sudden get 3078 differentially expressed genes (DEGs) for the one treatment and 2153 DEGs for the second. This does not seem right since I expected repeated transcripts to "collapse" and result in lower numbers.

Do any of you know what I am doing wrong?

Thank you!

Best wishes, Birgitte

The code for DEG determination:

# The dataframe containing Trinity transcript IDs (target_id), acession_numbers (annotation_id), and gene names (gene) was being loaded into R

#set base directory
base_dir <- "~/Documents/RNASeq/Tools/kallisto/data/"

# renaming the annotation df
t2g <- dplyr::rename(AnnotationID,
target_id = target_id,
ens_gene = annotation_id,
ext_gene = gene)

# running the sleuth analysis
so <- sleuth_prep(s2c, ~treat, target_mapping = t2g, aggregation_column = 'ens_gene')
so <- sleuth_fit(so)
so <- sleuth_fit(so, ~1, "reduced")
so <- sleuth_lrt(so, 'reduced', 'full')

results_table_lrt <- sleuth_results(so, 'reduced:full', test_type = 'lrt')

so <- sleuth_wt(so, 'treatTreat1')
so <- sleuth_wt(so, 'treatTreat2')
results_table_wt_treat1 <- sleuth_results(so, 'treatTreat1')
results_table_wt_treat2 <- sleuth_results(so, 'treatTreat2')

Treat1 <- subset(results_table_wt_treat1, qval < 0.05)
Treat2 <- subset(results_table_wt_treat2, qval < 0.05)

RNA-Seq R sleuth kallisto rna-seq • 2.5k views
0
Entering edit mode

This does not seem right since I expected repeated transcripts to "collapse" and result in lower numbers.

I don't think this is necessarily the case. What also happens by collapsing is boosting the read count for the genes (the sum of the transcripts) and therefore increasing also the power to detect differential expression. It's perfectly possible that for lowly expressed transcripts the counts are insufficient, but when collapsing to the gene level you reach significance.