Statistical tests for sanity-checking expression data before inferring a network
8.4 years ago
jack ▴ 940

I have collected gene expression data for one type of cancer and I want to infer a gene regulatory network from it.

Before reconstructing the network, I want to make sure that my data are good enough.

Which statistical tests can I run to get a feeling for whether my data are good enough? I'm thinking about a t-test.

sequencing genome • 3.4k views
8.4 years ago

Student's t-test and Benjamini-Hochberg multiple-testing correction are the traditional means of statistically testing the robustness of expression data.
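As a rough illustration of the multiple-testing step, here is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. It assumes you already have one p-value per gene (for example from a t-test); the gene names and p-values below are made up for illustration only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values (not real data).
pvals = {"geneA": 0.001, "geneB": 0.04, "geneC": 0.03, "geneD": 0.5}
qvals = dict(zip(pvals, benjamini_hochberg(list(pvals.values()))))

# Genes that pass a 5% false discovery rate threshold.
significant = [gene for gene, q in qvals.items() if q < 0.05]
print(significant)
```

Note that geneB and geneC have raw p-values below 0.05 but no longer pass after adjustment, which is the point of correcting when every gene is tested.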


All of my data come from breast cancer. I want to check the quality of the data before using it. Is it a good idea to plot the variance and correlation, and to run t-tests for every gene or transcript across different samples from the same disease?


Try the arrayQualityMetrics package in Bioconductor to assess the quality of your array data. You might also want to do a PCA or dendrogram plot to look at the variance in the data. Boxplots and density plots will also tell you how well the data have been normalised.
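The same kind of sample-level check can be sketched without any particular package. The snippet below is only an illustration under stated assumptions: it takes a small expression matrix (sample name mapped to values in the same gene order), log-transforms it, and reports per-sample quartiles (the numbers a boxplot would show) plus each sample's mean Pearson correlation with the other samples. The sample names and values are invented; a sample whose mean correlation is much lower than the rest would be a candidate outlier to inspect further.

```python
import math
import statistics


def log2_transform(values):
    """log2(x + 1), so that zero expression values are handled."""
    return [math.log2(v + 1) for v in values]


def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sample_summary(samples):
    """Per-sample quartiles and mean correlation with all other samples."""
    logged = {name: log2_transform(vals) for name, vals in samples.items()}
    report = {}
    for name, vals in logged.items():
        q1, median, q3 = statistics.quantiles(vals, n=4)
        corrs = [pearson(vals, other_vals)
                 for other, other_vals in logged.items() if other != name]
        report[name] = {
            "q1": q1,
            "median": median,
            "q3": q3,
            "mean_corr": statistics.fmean(corrs),
        }
    return report


# Hypothetical expression values for six genes in three samples (not real data).
samples = {
    "s1": [10, 200, 35, 0, 900, 60],
    "s2": [12, 180, 40, 1, 850, 55],
    "s3": [500, 3, 20, 300, 8, 700],  # deliberately unlike the other two
}

report = sample_summary(samples)
worst = min(report, key=lambda name: report[name]["mean_corr"])
print(worst, report[worst])
```

With real data you would plot these summaries rather than print them, but the quantities being compared are the same.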


But my expression data come from NGS, not microarrays. I think the metrics used for microarrays are not applicable to RNA-seq data.


Which pipeline are you using to analyse the RNA-seq data?


I don't use any pipeline. I'm trying to learn a statistical model.


OK, if you want to learn about the statistical models used for normalising RNA-seq data, check out http://cufflinks.cbcb.umd.edu/manual.html#cuffnorm

Cuffnorm is the program from the Tuxedo pipeline used for normalising RNA-seq datasets. If you look in the manual, there are a few parameters for tailoring the normalisation: http://cufflinks.cbcb.umd.edu/manual.html#library_norm_meth

That might give you an indication of where to start.
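To give a feel for what a library-size normalisation does, here is a minimal sketch of median-of-ratios scaling, a widely used scheme popularised by DESeq. This is not Cuffnorm's code and may not match its output; the function names, sample names and counts are all hypothetical.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factor per sample.

    counts: dict mapping sample name -> list of raw counts,
    with the same gene order in every sample.
    """
    names = list(counts)
    n_genes = len(counts[names[0]])
    log_ratios = {name: [] for name in names}
    for g in range(n_genes):
        row = [counts[name][g] for name in names]
        if min(row) <= 0:
            continue  # geometric mean is undefined when any count is zero
        log_geo_mean = statistics.fmean([math.log(c) for c in row])
        for name, c in zip(names, row):
            log_ratios[name].append(math.log(c) - log_geo_mean)
    # Raises statistics.StatisticsError if every gene had a zero somewhere.
    return {name: math.exp(statistics.median(r))
            for name, r in log_ratios.items()}


def normalise(counts):
    """Divide each sample's counts by its size factor."""
    factors = size_factors(counts)
    return {name: [c / factors[name] for c in vals]
            for name, vals in counts.items()}


# Hypothetical raw counts: sampleB was sequenced roughly twice as deeply.
raw = {
    "sampleA": [10, 20, 30, 0, 40],
    "sampleB": [20, 40, 60, 5, 80],
}

factors = size_factors(raw)
normalised = normalise(raw)
print(factors)
print(normalised)
```

After scaling, genes that differ only because of sequencing depth end up with matching values across the two samples, which is what the boxplot or density check on normalised data is meant to confirm.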


Basically, my concern is to carry out some statistical tests to check the quality of the RNA-seq data before giving it to my model.

So the question for me is: which statistical tests show the quality of the data? My data belong to one cancer type.


This is why using a pipeline would help, especially something like the Tuxedo pipeline, as it includes an R package (CummeRbund) that provides extensive QC tools. It also provides Cuffnorm (which I mentioned earlier), which outputs the normalised expression set.

If I were you, I'd run Tophat, Cufflinks, Cuffmerge and Cuffdiff, then load the Cuffdiff output into R and check out the QC tools in CummeRbund. That will show you how good your data actually are. Then you can use Cuffnorm to get the normalised expression set out and feed that into your model.

8.4 years ago

Post updated above.
