Question: Analysing gene expression across clinical parameters
16 months ago by
elizabethR30
elizabethR30 wrote:

Hi

I have been doing some bioinformatic analysis of TCGA data as an adjunct to my PhD. I would like to take this a little further and analyse RNA-seq expression data according to clinical parameters such as disease stage, age, sex and histological markers of invasion. Has anyone done this kind of analysis? What sort of statistical tests do you use for it? I am assuming that ANOVA is not appropriate for such a large dataset, given how many multiple comparisons are made across it. Bioinformatic statistics is a very new area to me.

Also, could anyone recommend packages for this type of analysis? I have started using the amazing RStudio and TCGAbiolinks, but there doesn't appear to be a package in the Bioconductor guide that is suitable for this.

Really grateful for any advice guys :-)

clinical rna-seq tcga • 504 views

Hi elizabethR, I am not very familiar with TCGA data, but if you want to do a class comparison test between two or more phenotypes, you should first preprocess your data (log transformation, summarization and normalization). If you want to use GEO data, I suggest using InsilicoDB (https://insilicodb.com) to retrieve preprocessed data; it also provides a pipeline that makes the job simpler. In addition, limma is a robust package for analyzing gene expression data: it can fit a linear model to find markers of each phenotype. https://bioconductor.org/packages/release/bioc/html/limma.html
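
As a rough illustration of the log transformation and library-size normalization step mentioned above, here is a minimal Python sketch. It is not limma itself (limma is an R package); it only mirrors the general log2 counts-per-million form that limma's voom uses, log2((count + 0.5) / (library size + 1) * 1e6). The function names and the toy count matrix are made up for illustration.

```python
import math


def library_sizes(counts):
    """Total raw reads per sample; counts[g][s] is gene g in sample s."""
    n_samples = len(counts[0])
    return [sum(row[s] for row in counts) for s in range(n_samples)]


def log2_cpm(counts):
    """Log2 counts-per-million with small offsets to avoid log(0)."""
    lib = library_sizes(counts)
    return [
        [math.log2((c + 0.5) / (lib[s] + 1.0) * 1e6) for s, c in enumerate(row)]
        for row in counts
    ]


if __name__ == "__main__":
    # Hypothetical toy data: 2 genes x 2 samples, sample 2 sequenced twice as deep.
    toy_counts = [[10, 20], [90, 180]]
    for gene_values in log2_cpm(toy_counts):
        print([round(v, 3) for v in gene_values])
```

After scaling by library size, the two samples give nearly the same value for each gene even though the second has twice the raw counts, which is the point of normalizing before comparing phenotypes.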

Reply written 16 months ago by Shamim Sarhadi
16 months ago by
EagleEye4.7k
Sweden
EagleEye4.7k wrote:

Check this out, A: Use of TCGA database to get information on protein Expression

16 months ago by
elizabethR30
elizabethR30 wrote:

Thank you guys. EagleEye, that link was very useful. The edgeR manual says you should use raw counts that haven't been normalised, because library-size normalisation is accounted for within its statistical model rather than applied to the data beforehand. I've used edgeR to perform differential expression analysis across clinical parameters.
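
On the multiple-comparisons worry from the original question: gene-wise tests are normally followed by a false discovery rate adjustment, and edgeR's topTags reports Benjamini-Hochberg adjusted p-values by default. Below is a minimal Python sketch of that adjustment, for illustration only. The function name and example p-values are hypothetical, and in practice the R packages do this step for you.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-gene p-values from some differential expression test.
    example = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(example))
```

Genes whose adjusted value falls below a chosen threshold (0.05 is a common choice) are then reported as differentially expressed.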


In that case, have a look at this post:

A: How to work with Level 3 data (RPKM values) from TCGA database

Reply written 16 months ago by EagleEye