Hi All, I'm currently investigating DEG pattern among 19 sub-species of a legume plant and there is no treatment at all except 19 sub-species. Q#1. What is the best way to select the control genes (HKGs)? Some people (PLoS One. 2017; 12(9): e0185288) suggested "Selection of candidate reference genes from in-house RNA-seq data followed the criteria below: 1. The coefficient variance (cv) values were smaller than 0.3; 2. The calculated Log2FC (fold change) values at each time point were between -0.1 to +0.1; 3. The basemean values were 500–3,000", But the problem is I have no treatments with in my 19 samples which making me confused to use this 3 comparison criteria. So, how can I calculate and use these values in R/DESeq2 or any other packages ? (FYI: I've used string tie assembler and the subhead/featureCounts to get the feature counts matrix:
head(countdata)
Chr Start End Strand Length 1-Collin 2-Cuspi 3-Escu 4-Greggi 5-Lan-lan 6-Lempi 7-Macro 8-Mac-Ist Cruz_1 Magni_1 Mcapi_1 Mutud_1 Pulver_1
Contig4963g005220 tig4963 121429 122657 + 1229 0 0 2 0 4 0 8 0 0 2 0 0 2
Contig2758g005020 tig2758 19576 25257 - 5682 290 404 451 377 259 452 269 460 274 324 362 277 358
Contig3945g005320 tig3945 306089 307523 + 1435 8 30 14 63 29 78 23 28 23 34 10 59 78
Contig6785g005270 tig6785 235035 243069 - 8035 180 56 257 269 408 753 642 1054 817 172 824 329 807
Contig5396g005000 tig5396 1339 4892 + 3554 491 755 709 380 3077 2941 1751 3428 3305 3351 3875 1084 1132
Contig4798g005290 tig4798 176013 176555 - 543 196 65 132 83 71 34 155 56 62 19 29 61 123
Retus_1 Salvo_1 Shann_1 Tricha_1 Trichod_1 Zacap_1
Contig4963g005220 0 0 2 0 8 2
Contig2758g005020 298 115 354 464 384 102
Contig3945g005320 72 8 9 16 0 20
Contig6785g005270 838 1216 148 223 108 191
Contig5396g005000 902 1087 4081 508 1519 1841
Contig4798g005290 169 16 89 80 101 27
Q#2. Should I use the original count value or it's better to use FPKM or TPR to start my DEG analysis?
Q#3. Just wanna make sure that my condition in DESeq is only my 19 species like the following? am I right or there's any default ? condition <- c("1-Collin", "2-Cuspi", "3-Escu", "4-Greggi", "5-Lan-lan", "6-Lempi", "7-Macro", "8-Mac-Ist", "Cruz_1", "Magni_1", "Mcapi_1", "Mutud_1", "Pulver_1", "Retus_1", "Salvo_1", "Shann_1", "Tricha_1", "Trichod_1", "Zacap_1") coldata <- data.frame(row.names=colnames(countdata), condition) dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition) dds
Q#4. After I detect the control genes in anyway (say hundreds of them), some people suggested that we can use a function in the DESeq R such as dds_efs <- estimateFactorSize(dds, controlgenes(1:200)) to normalize the counts my data. So, then how to analyze further and identify the significant DE genes among the 19 samples?
Q#5. What would be the best way to visualize the results either in DESeq or other packages? Is WGCNA R package a good way to find the patterns of genes among 19 spices? any suggestion will be helpful!
Thanks, MJ
Hi swbarnes2, Thank you for your quick response! 1. I can build DESeq object using DESeqDataSetFromMatrix() as following.
As my experiment has no treatment, is it possible to take advantage of control genes (HKGs; spikes) and analyze the expression of my genes of interest; like what people do in qRT-PCR through ddCT.
2. Our wet lab technician actually pooled 3 biological replicates for each 19 sample (+ technical replicates at Illumina sequencing step). I believe it could be possible if I could find a way to take advantage of HKGs. Thanks, MJ
Well, sure you can make the DESeq object, you can even size normalize, (with or without the use of housekeeping genes) but that doesn't mean you can run the DESeq algorithm on it to find DE genes between one sample and one sample. Pooling bio replicates won't change that.
Thank you for your response. So, just for my curiosity, what if we had 3 biological replicate for each 19 sample, is there any way to analyze this data and get DEGs?
Secondly, what if I just wanna identify the unexpressed genes or unchanged genes (like PPR gene family) among my 19 species?
Thanks, MJ