Question

Should I take the average of normalized or raw counts?

0

5.3 years ago

zizigolu ★ 4.3k

I am combining the raw read counts of two datasets by averaging them. Gene CHGA is differentially expressed in each dataset separately, but not after I merge the datasets by averaging their raw counts.

My question is: should I average the raw read counts, or average normalized read counts (for example CPM) and then convert back to raw counts before differential expression analysis?

> merged[rownames(merged)=="CHGA",]
     A2   A3  A4  A6 A7   A8  A9  A10 A11  A12 B4  B5  B6 B7 B8 B9 B10 B11 B12  C1 C2  C3  C4  C5  C6 G12 D1 D2   D3  D4
CHGA 20 3309 223 297 92 2072 147 2042  59 5356 92 899 180 16 22 67 212  80  72 270 36 198 110 170 202  52 53 32 1630 784
      D5  D6  D7  D8   D9  D10 D11  D12  E1  E2 E3   E4   E5   E6 E7  E8    E9 E10 E11 E12  F1 F2   F5   F6  F7 G8
CHGA 555 434 292 163 1144 2090  73 1300 277 270 89 1037 5888 8484 70 152 32942  20  19 206 978 76 4080 1318 202 70



> biomarker[rownames(biomarker)=="CHGA",]
     A2   A3 A4  A6 A7   A8 A9  A10 A11 A12  B4  B5  B6 B7 B8  B9 B10 B11 B12  C1 C2  C3 C4 C5  C6 G12 D1 D2  D3  D4  D5
CHGA 17 3366 70 530 30 1833 57 1431  62  32 146 144 320 16 33 109 340 111 116 516 53 202  4 51 397  79 65 30 681 780 981
      D6  D7  D8   D9  D10 D11  D12  E1  E2 E3 E4   E5    E6 E7  E8    E9 E10 E11 E12  F1 F2  F5   F6  F7 G8
CHGA 816 529 120 1560 3167  37 1327 131 152 52 47 3080 16924 82 133 42006  10   8 258 811 22 147 2551 160 32

> immune[rownames(immune)=="CHGA",]
     A2   A3  A4 A6  A7   A8  A9  A10 A11   A12 B4   B5 B6 B7 B8 B9 B10 B11 B12 C1 C2  C3  C4  C5 C6 G12 D1 D2   D3  D4
CHGA 23 3252 376 64 154 2311 237 2653  56 10681 37 1654 40 16 10 25  83  48  27 25 20 193 215 290  7  26 41 35 2579 787
      D5 D6 D7  D8  D9  D10 D11  D12  E1  E2  E3   E4   E5 E6 E7  E8    E9 E10 E11 E12   F1  F2   F5 F6  F7  G8
CHGA 129 52 54 206 729 1012 109 1272 423 387 126 2027 8696 43 58 172 23878  31  30 155 1144 130 8014 85 244 107
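
To make the effect of averaging concrete, here is a small base R sketch using a few of the CHGA values printed above. The variable names are only illustrative. It recomputes the per-sample mean of the two datasets and shows how far apart the two measurements can be for the same sample.

```r
# CHGA raw counts copied from the output above for five samples.
samples   <- c("A2", "A3", "A4", "A6", "A12")
biomarker <- c(17, 3366, 70, 530, 32)
immune    <- c(23, 3252, 376, 64, 10681)
names(biomarker) <- names(immune) <- samples

# Per-sample mean of the two datasets (what the merged table holds, up to rounding).
averaged <- (biomarker + immune) / 2

# Fold difference between the two measurements of the same sample.
fold <- pmax(biomarker, immune) / pmin(biomarker, immune)

print(rbind(biomarker, immune, averaged, fold = round(fold, 1)))
```

For A2 and A3 the two datasets agree closely, but for A12 they differ by more than 300-fold, so the mean does not represent either measurement well.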

R RNA-Seq • 1.7k views

ADD COMMENT • link 5.3 years ago by zizigolu ★ 4.3k

4

No, this does not seem like a correct approach. If possible, combine both datasets in a design that includes a batch factor. The limma and edgeR manuals explain how to build a design for this approach.
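
As a rough illustration of what "a design including a batch factor" means, here is a minimal base R sketch. The sample table is hypothetical: the `condition` labels are made up, since the thread does not say which groups are being compared, and only four samples per dataset are shown.

```r
# Hypothetical annotation: the same four samples measured in each dataset.
# The 'condition' labels are invented for illustration only.
samples <- data.frame(
  sample    = rep(c("A2", "A3", "A4", "A6"), times = 2),
  batch     = factor(rep(c("biomarker", "immune"), each = 4)),
  condition = factor(rep(c("control", "treated", "control", "treated"), times = 2))
)

# Additive model: the batch term absorbs the dataset-level shift,
# so the condition term is estimated within batches.
design <- model.matrix(~ batch + condition, data = samples)

print(colnames(design))
```

A matrix like this is the kind of object the limma and edgeR fitting functions accept as their `design` argument; the manuals mentioned above describe the details.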

ADD REPLY • link 5.3 years ago by Benn 8.3k

0

Thank you. I learned how to do batch correction, but the problem is that the two datasets have 700 genes in common. I am confused: should I do batch correction on the 700 common genes only, or on all genes from both datasets?

ADD REPLY • link 5.3 years ago by zizigolu ★ 4.3k

0

Try to solve the problem at the source: find the raw data, such as FASTQ or BAM files, and then generate the raw read counts for the same set of genes.
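
Once two count tables exist, putting them on the same set of genes could look roughly like the base R sketch below. The toy matrices reuse three CHGA values from the question; `GENE_X` and `GENE_Y` and their counts are made-up placeholders, and the column suffixes are just one possible naming scheme.

```r
# Toy count tables; CHGA values are from the question, the rest is invented.
biomarker <- matrix(c(17, 3366, 70,
                       5,    8,  2),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("CHGA", "GENE_X"), c("A2", "A3", "A4")))
immune <- matrix(c(23, 3252, 376,
                    9,    1,   4),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("CHGA", "GENE_Y"), c("A2", "A3", "A4")))

# Genes present in both tables.
common <- intersect(rownames(biomarker), rownames(immune))

# Keep the dataset of origin in the column names, then bind side by side
# instead of averaging, so each measurement stays a separate column.
colnames(biomarker) <- paste0(colnames(biomarker), "_biomarker")
colnames(immune)    <- paste0(colnames(immune), "_immune")
combined <- cbind(biomarker[common, , drop = FALSE],
                  immune[common, , drop = FALSE])

print(combined)
```

Keeping the columns separate preserves the information needed to model the dataset of origin as a batch factor.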

ADD REPLY • link 5.3 years ago by Benn 8.3k

0

This is an HTG EdgeSeq assay; I was given Excel files of raw read counts for both datasets :(

I ran a t-test on the 700 genes common to the two datasets and removed the genes that were inconsistent between them (p-value < 0.05). Of the 700 genes, 400 showed consistent expression. For those I averaged the raw read counts of the two datasets, combined them with the genes that are not shared, and built a matrix of raw counts. However, the differentially expressed genes (DEGs) reported by DESeq2 changed a lot compared with the individual datasets.

ADD REPLY • link 5.3 years ago by zizigolu ★ 4.3k

1

Sorry to say, but this approach is certainly not the correct way to analyze RNA-seq data. My advice, as I said before, is to start over with the raw data: ask for the raw data instead of Excel files.

ADD REPLY • link 5.3 years ago by Benn 8.3k

1

@b.nota: This is not standard RNAseq data.

@F: What did HTG support say about downstream analysis of the data?

ADD REPLY • link 5.3 years ago by GenoMax 141k

0

Thank you,

This is exactly the company's reply to my email:

 Hi Fereshteh,

 HTG does not provide a software to analyze data.

 If you want, I can calculate the CPM and normalized values of your data, but for data analysis like differential expression, you need to use specific biostat software (like R).
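
For reference, CPM (counts per million) is simple enough to compute directly. The sketch below is a minimal base R version with a made-up two-gene matrix; `cpm_simple` is a hypothetical helper name, and edgeR ships its own `cpm()` function that would normally be used instead.

```r
# Counts per million: scale each sample (column) by its library size.
cpm_simple <- function(counts) {
  lib_size <- colSums(counts)
  sweep(counts, 2, lib_size, "/") * 1e6
}

# Made-up counts: two genes, two samples with different gene proportions.
counts <- matrix(c(20, 980,
                   300, 700),
                 nrow = 2,
                 dimnames = list(c("CHGA", "OTHER"), c("S1", "S2")))

res <- cpm_simple(counts)
print(res)
```

Each column of the result sums to one million, which is why CPM values are comparable across samples with different sequencing depth but are no longer raw counts.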

I have installed the HTG EdgeSeq parser software on my computer. I have FASTQ files for each sample, but I don't know why there are 4 FASTQ files per sample. The technician says that importing the FASTQ files into the software will return an Excel file of raw read counts, but I am not sure how to manipulate the FASTQ files to combine reads from the common genes.

ADD REPLY • link 5.3 years ago by zizigolu ★ 4.3k

0

Thank you, I will go through your advice, and I know I will need to create some posts on Biostars about that :(

ADD REPLY • link 5.3 years ago by zizigolu ★ 4.3k