Question: How can I analyze this data in Deseq?
0
gravatar for MAPK
4 weeks ago by
MAPK1.4k
United States
MAPK1.4k wrote:

I have my data called count.mat.

count.mat  <-   structure(list(Sample_1 = c(2968, 1701, 272, 249, 186, 1598, 
98, 9, 563, 1003, 278, 242, 137, 625, 614, 494, 681, 475, 3, 
89), Sample_2 = c(2057, 1237, 339, 688, 289, 1354, 265, 17, 402, 
793, 343, 267, 389, 477, 838, 663, 741, 586, 9, 244), Sample_3 = c(5, 
3, 5, 6, 2, 14, 2, 9, 4, 4, 3, 3, 1, 1, 2, 1, 16, 7, 3, 4), Sample_4 = c(5, 
4, 1, 4, 1, 6, 1, 10, 9, 7, 2, 2, 2, 5, 4, 3, 3, 6, 1, 4), Sample_5 = c(8175, 
4828, 2268, 3256, 1712, 4048, 1208, 10, 3166, 4315, 2536, 1610, 
1531, 2794, 3976, 3030, 2427, 4788, 9, 1708), Sample_6 = c(4170, 
2068, 506, 220, 190, 1070, 76, 10, 790, 1635, 264, 146, 67, 558, 
771, 423, 481, 864, 2, 77), Sample_7 = c(15, 6, 4, 3, 2, 4, 2, 
15, 5, 5, 2, 7, 2, 4, 7, 8, 3, 11, 22, 4), Sample_8 = c(15, 7, 
2, 7, 4, 4, 1, 13, 5, 5, 4, 3, 3, 4, 3, 4, 2, 4, 5, 4), Sample_9 = c(132, 
356, 41, 29, 24, 40, 14, 61, 40, 66, 29, 136, 18, 99, 66, 144, 
55, 115, 8, 42), Sample_10 = c(1199, 614, 491, 757, 96, 417, 
170, 47, 486, 452, 437, 189, 563, 481, 857, 1003, 266, 595, 3, 
185), Sample_11 = c(6853, 4667, 4051, 5964, 2940, 6831, 2762, 
107, 9957, 7058, 4048, 4023, 4366, 5235, 5234, 5380, 2904, 6320, 
15, 5502), Sample_12 = c(6332, 4489, 2739, 5371, 2153, 2841, 
1982, 63, 6032, 4417, 2025, 4630, 3264, 3666, 4629, 4601, 2240, 
3408, 15, 4309), Sample_13 = c(16, 14, 10, 1, 2, 22, 1, 2, 5, 
8, 3, 2, 7, 4, 6, 4, 9, 2, 2, 7), Sample_14 = c(12, 6, 5, 4, 
2, 6, 2, 13, 9, 5, 4, 3, 4, 1, 9, 5, 2, 5, 2, 2), Sample_15 = c(6058, 
3363, 1458, 2846, 1315, 3496, 1119, 2, 2028, 4566, 1097, 1135, 
811, 1743, 2310, 2023, 1121, 2420, 10, 2224), Sample_16 = c(23362, 
10663, 2875, 6581, 2264, 4756, 1907, 2, 4405, 4305, 4519, 4197, 
2786, 3201, 9012, 4905, 3179, 3493, 1, 2829), Sample_17 = c(7548, 
2659, 1018, 1363, 376, 7433, 704, 14, 1405, 2772, 1099, 788, 
511, 796, 1721, 1239, 1137, 1519, 6, 606), Sample_18 = c(19, 
6, 9, 5, 3, 18, 2, 9, 7, 12, 2, 5, 3, 5, 15, 7, 2, 8, 16, 3), 
    Sample_19 = c(25, 11, 1, 5, 1, 13, 4, 23, 13, 12, 3, 7, 2, 
    4, 6, 4, 2, 4, 8, 2), Sample_20 = c(7, 3, 2, 4, 1, 3, 2, 
    13, 3, 2, 1, 2, 2, 1, 5, 5, 1, 2, 6, 2)), row.names = c("AAAACAAGTTCTGAATCGTATA", 
"AAAACAAGTTCTGAATCGTATAA", "AAGAAAAGATCATGTCCCAAGA", "AAGAAGAGAATCACAGCCGCCA", 
"AAGAATCCTTCAAAGGGCCTAGA", "AAGCTGAAGTTTTGACATGTCA", "AATCACAGCCGCCAAGTCCTCTA", 
"AATCAGTGAACGAAAGTTAGG", "AATCTAAAGTTTGTCCCATTA", "AATCTTTGACTACATTACGTAAA", 
"AATGTCCAATGGAACGCCGTTA", "ACCATCTCCGTGTCTACAATCA", "ACTCCAGGCGAGCATGCAGTCCA", 
"ACTCTTAGAGATGAATATTCCGA", "AGGGATCCTTATGTCCTAGCCA", "AGTACAAGTCTACGAGCCCATCGA", 
"AGTCCAATGGATCTTTCACTCGA", "AGTCCTTGCCAATGAGGCTTTTA", "AGTGAACGAAAGTTAGGGGAT", 
"AGTGCTAGTTCCCAACGTCGTTGA"), class = "data.frame")

I have my conditions as:

sample.type <-
  as.factor (
    c(
      "Ago2_SsHV2L",
      "Ago2_SsHV2L",
      "Ago2_VF",
      "Ago2_VF",
      "Ago4_SsHV2L",
      "Ago4_SsHV2L",
      "Ago4_VF",
      "Ago4_VF",
      "Dcl1_SsHV2L",
      "Dcl1_SsHV2L",
      "Dcl2_SsHV2L",
      "Dcl2_SsHV2L",
      "SlaGem_2mer",
      "SlaGem_2mer",
      "WTDK3_SsHV2L",
      "WTDK3_SsHV2L",
      "WTDK3_SsHV2L",
      "WTDK3_VF",
      "WTDK3_VF",
      "WTDK3_VF"
    )
  )

infection.status <-
  as.factor(
    c(
      "SsHV2L",
      "SsHV2L",
      "VF",
      "VF",
      "SsHV2L",
      "SsHV2L",
      "VF",
      "VF",
      "SsHV2L",
      "SsHV2L",
      "SsHV2L",
      "SsHV2L",
      "SlaGem_2mer",
      "SlaGem_2mer",
      "SsHV2L",
      "SsHV2L",
      "SsHV2L",
      "VF",
      "VF",
      "VF"
    )
  )

cond <- data.frame(sample.type,infection.status, stringsAsFactors=TRUE)


cond <- as.data.frame(cond)

Deseq Analysis:

library(dplyr)
# Doing for infection.status
dds <- DESeqDataSetFromMatrix(countData=count.mat,colData=cond,design=~infection.status)
dds
dds$infection.status <- relevel(dds$infection.status, ref = "VF")  
dds = DESeq(dds, fitType = "parametric")
my.names<-resultsNames(dds)

Since I wanted to compare the DE between SsHV2L_vs_VF, I have done this:

res<- results(dds, contrast=list(c("infection.status_SsHV2L_vs_VF" )), test="Wald")
res <- as.data.frame(res)
res <- res[with(res, order(padj)), ] %>% select (colnames(res))

However, the result does not look correct. The VF samples have less than 20 counts in all samples and SsHV2L samples have more than 1000 counts. If you look at the padj, the pvalues are significant only for the reads with the fewer counts in both VF and SsHV2L samples. I was wondering why it's not significant for the reads with 1000s of counts in SsHV2L columns and fewer in VF. My data have only 130 rows and I was wondering if that would be the reason causing this problem. Can someone please help me if I am doing it right? Thanks!

deseq deseq2 R rnaseq • 153 views
ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by MAPK1.4k

Which type of data is this?

ADD REPLYlink written 4 weeks ago by Kevin Blighe41k

This is smallRNA seq data.

ADD REPLYlink written 4 weeks ago by MAPK1.4k
1

A quick search reveals a few posts on Bioconductor where the DESeq/DESeq2 developers seem to shy away from recommending DESeq for such data. I saw replies from Michael, Simon, and Wolfgang. Your data does follow a negative binomial, though (I checked via histogram).

These are obviously raw counts that you have? You could try some rudimentary median-based normalisation strategy and then test each gene independently via negative binomial regression using glm.nb(). I presume that you exhaustively searched for other methods.

ADD REPLYlink written 4 weeks ago by Kevin Blighe41k

@Kevin Blighe Thanks Kevin for your answer. Yes, I wanted to look for miRNA-like reads with differential expression between samples. I have seen some papers using DESeq for these reads and wanted to give it a try.

ADD REPLYlink written 4 weeks ago by MAPK1.4k

In which journals were they? They may have only used DESeq2 for, for example, size factor calculation (?).

ADD REPLYlink written 4 weeks ago by Kevin Blighe41k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1618 users visited in the last hour