Question

RNA-seq Differentially expressed genes analysis when there's no control treatment and we just have multiple HKGs

0

Entering edit mode

4.5 years ago

mjavad2012 ▴ 10

Hi All, I'm currently investigating DEG pattern among 19 sub-species of a legume plant and there is no treatment at all except 19 sub-species. Q#1. What is the best way to select the control genes (HKGs)? Some people (PLoS One. 2017; 12(9): e0185288) suggested "Selection of candidate reference genes from in-house RNA-seq data followed the criteria below: 1. The coefficient variance (cv) values were smaller than 0.3; 2. The calculated Log2FC (fold change) values at each time point were between -0.1 to +0.1; 3. The basemean values were 500–3,000", But the problem is I have no treatments with in my 19 samples which making me confused to use this 3 comparison criteria. So, how can I calculate and use these values in R/DESeq2 or any other packages ? (FYI: I've used string tie assembler and the subhead/featureCounts to get the feature counts matrix:

head(countdata)

                      Chr  Start    End Strand Length 1-Collin 2-Cuspi 3-Escu 4-Greggi 5-Lan-lan 6-Lempi 7-Macro 8-Mac-Ist Cruz_1 Magni_1 Mcapi_1 Mutud_1 Pulver_1
Contig4963g005220 tig4963 121429 122657      +   1229        0       0      2        0         4       0       8         0      0       2       0       0        2
Contig2758g005020 tig2758  19576  25257      -   5682      290     404    451      377       259     452     269       460    274     324     362     277      358
Contig3945g005320 tig3945 306089 307523      +   1435        8      30     14       63        29      78      23        28     23      34      10      59       78
Contig6785g005270 tig6785 235035 243069      -   8035      180      56    257      269       408     753     642      1054    817     172     824     329      807
Contig5396g005000 tig5396   1339   4892      +   3554      491     755    709      380      3077    2941    1751      3428   3305    3351    3875    1084     1132
Contig4798g005290 tig4798 176013 176555      -    543      196      65    132       83        71      34     155        56     62      19      29      61      123
                  Retus_1 Salvo_1 Shann_1 Tricha_1 Trichod_1 Zacap_1
Contig4963g005220       0       0       2        0         8       2
Contig2758g005020     298     115     354      464       384     102
Contig3945g005320      72       8       9       16         0      20
Contig6785g005270     838    1216     148      223       108     191
Contig5396g005000     902    1087    4081      508      1519    1841
Contig4798g005290     169      16      89       80       101      27

Q#2. Should I use the original count value or it's better to use FPKM or TPR to start my DEG analysis?

Q#3. Just wanna make sure that my condition in DESeq is only my 19 species like the following? am I right or there's any default ? condition <- c("1-Collin", "2-Cuspi", "3-Escu", "4-Greggi", "5-Lan-lan", "6-Lempi", "7-Macro", "8-Mac-Ist", "Cruz_1", "Magni_1", "Mcapi_1", "Mutud_1", "Pulver_1", "Retus_1", "Salvo_1", "Shann_1", "Tricha_1", "Trichod_1", "Zacap_1") coldata <- data.frame(row.names=colnames(countdata), condition) dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition) dds

Q#4. After I detect the control genes in anyway (say hundreds of them), some people suggested that we can use a function in the DESeq R such as dds_efs <- estimateFactorSize(dds, controlgenes(1:200)) to normalize the counts my data. So, then how to analyze further and identify the significant DE genes among the 19 samples?

Q#5. What would be the best way to visualize the results either in DESeq or other packages? Is WGCNA R package a good way to find the patterns of genes among 19 spices? any suggestion will be helpful!

Thanks, MJ

RNA-Seq DEseq Housekeeping genes (HKGs) WGCNA • 834 views

ADD COMMENT • link updated 4.5 years ago by swbarnes2 14k • written 4.5 years ago by mjavad2012 ▴ 10

score 1 · Answer 1 · 2019-10-10

1

Entering edit mode

4.5 years ago

swbarnes2 14k

Are you completely sure that the built-in normalization functions of DESeq or EdgeR are unsuitable for you?

Do you only have 1 example of each subspecies? That's terrible experiment design. I don't believe either DESeq or EdgeR will allow you to compare only a single sample to anything.

ADD COMMENT • link 4.5 years ago by swbarnes2 14k

0

Entering edit mode

Hi swbarnes2, Thank you for your quick response! 1. I can build DESeq object using DESeqDataSetFromMatrix() as following.

dds class: DESeqDataSet dim: 1165 19 metadata(1): version assays(1): counts rownames(1165): Contig4963g005220 Contig2758g005020 ... Contig3227g005030 Contig3720g006990 rowData names(0): colnames(19): 1-Collin 2-Cuspi ... Trichod_1 Zacap_1 colData names(1): condition

As my experiment has no treatment, is it possible to take advantage of control genes (HKGs; spikes) and analyze the expression of my genes of interest; like what people do in qRT-PCR through ddCT.
2. Our wet lab technician actually pooled 3 biological replicates for each 19 sample (+ technical replicates at Illumina sequencing step). I believe it could be possible if I could find a way to take advantage of HKGs. Thanks, MJ

ADD REPLY • link 4.5 years ago by mjavad2012 ▴ 10

0

Entering edit mode

Well, sure you can make the DESeq object, you can even size normalize, (with or without the use of housekeeping genes) but that doesn't mean you can run the DESeq algorithm on it to find DE genes between one sample and one sample. Pooling bio replicates won't change that.

ADD REPLY • link 4.5 years ago by swbarnes2 14k

0

Entering edit mode

Thank you for your response. So, just for my curiosity, what if we had 3 biological replicate for each 19 sample, is there any way to analyze this data and get DEGs?

Secondly, what if I just wanna identify the unexpressed genes or unchanged genes (like PPR gene family) among my 19 species?

Thanks, MJ

ADD REPLY • link 4.5 years ago by mjavad2012 ▴ 10