Question: How to analyze RNA-seq data for cancer studies?
Donna10 wrote, 2.2 years ago:

Dear all

I followed some links here on Biostars to obtain the differentially expressed genes from my RNA-seq data (tumor vs. control). I then did pathway analysis, and I called somatic mutations using the GATK pipeline.

I found some differentially expressed genes that also carry somatic mutations, which could be interesting. I then checked whether the top pathways are related to cancer, but found nothing interesting.

I am still trying to connect the pieces. Any suggestions on how I can draw conclusions from my results? What else can we do?

Thank you

Kevin Blighe (University College London) wrote, 2.2 years ago:

Just some ideas off the top of my head:

  1. Mutation-to-expression modelling: For each mutation, test its association with the expression of differentially expressed genes (DEGs) in the mutation's 'vicinity'. This can be as simple as building a linear regression model with expression as the y (dependent) variable and mutation present/absent as the x (predictor) variable. The y variable is continuous; the x variable is categorical, with 'mutation absent' as the reference/base level. From this model you can derive p-values, R-squared values and cross-validated ('shrunk') R-squared values.
  2. Transcription factor binding sites: Check for new TFBS (transcription factor binding sites) that may be introduced by each mutation. Databases such as JASPAR can help with this, and there are other threads on Biostars about it. There will undoubtedly be some mutations in your data that modulate the expression of nearby genes. For an idea of the mechanism, see the wonderful study by Mansour et al. (Science, 2014): An Oncogenic Super-Enhancer Formed Through Somatic Mutation of a Noncoding Intergenic Element
  3. Histone modification regions: Check for overlap with regions carrying histone methylation (e.g. H3K27me3) and acetylation (e.g. H3K27ac) marks; this data is available from UCSC, as far as I know. A mutation in such a region could modify chromatin structure and alter expression.
  4. Transcription start sites: Check for overlap with transcription start sites (TSS); again, these are available from UCSC, I believe.
  5. In silico prediction: Use one of the functional / pathogenicity prediction tools. Many have been released in recent years, including some tailored to cancer and somatic mutations. Take a quick look here: A: pathogenicity predictors of cancer mutations
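As a rough illustration of point 1, here is a minimal Python sketch (assuming NumPy and SciPy are available) that fits expression ~ mutation for one gene across samples by ordinary least squares, with 'mutation absent' coded as 0, the reference level. It reports the mutation effect, its t-test p-value, R-squared, and a leave-one-out R-squared computed from the PRESS statistic, which is one way to obtain a cross-validated 'shrunk' value. The function name and the example values are hypothetical.

```python
import numpy as np
from scipy import stats


def mutation_expression_model(expression, mutated):
    """Fit expression ~ mutation by ordinary least squares for one gene.

    expression: continuous values (e.g. log2 normalised counts), one per sample.
    mutated: 0/1 per sample, where 0 (mutation absent) is the reference level.
    """
    y = np.asarray(expression, dtype=float)
    x = np.asarray(mutated, dtype=int)
    n = y.size
    # With a binary predictor, leave-one-out needs >= 2 samples in each group.
    if min(np.sum(x == 0), np.sum(x == 1)) < 2:
        raise ValueError("need at least two samples with and without the mutation")

    X = np.column_stack([np.ones(n), x])  # intercept + mutation indicator
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y  # beta[1] = mean(mutated) - mean(unmutated)
    resid = y - X @ beta

    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    # Two-sided t-test on the mutation coefficient, n - 2 residual degrees of freedom.
    se = np.sqrt(ss_res / (n - 2) * xtx_inv[1, 1])
    p_value = 2.0 * stats.t.sf(abs(beta[1] / se), df=n - 2)

    # Leave-one-out R-squared from PRESS: each held-out residual equals e_i / (1 - h_ii).
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    press = np.sum((resid / (1.0 - leverage)) ** 2)
    r2_loo = 1.0 - press / ss_tot

    return {"effect": beta[1], "p_value": p_value, "r2": r2, "r2_loo": r2_loo}


# Hypothetical example: one gene, eight samples, mutation present in the last four.
expr = [5.1, 4.8, 5.3, 5.0, 7.2, 6.9, 7.5, 7.1]
mut = [0, 0, 0, 0, 1, 1, 1, 1]
result = mutation_expression_model(expr, mut)
print(result)
```

The leave-one-out R-squared is always at or below the ordinary R-squared; a large gap between the two suggests the association is driven by a few samples. In R, `lm(expression ~ factor(mutated))` gives the same coefficients.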

Note that, technically, you could introduce all of the data from points 2-5 into the model mentioned in point 1. This would be a robust way to assess the role of each mutation in relation to gene expression.
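One possible way to combine points 1-5 is sketched below, under several assumptions. For a single gene, each sample's nearby mutations are summarised into predictors (any mutation; whether one falls in a TFBS, in a histone-mark region, or within a window of a TSS; and the highest pathogenicity score), and expression is regressed on these by ordinary least squares. The interval lists, the 1,000 bp TSS window, the feature set and all function names are hypothetical placeholders; positions are assumed to lie on one chromosome, with 0-based, half-open intervals.

```python
import numpy as np


def overlaps(pos, intervals):
    """True if a single-base position falls inside any [start, end) interval."""
    return any(start <= pos < end for start, end in intervals)


def sample_features(mutations, tfbs, histone, tss, tss_window=1000):
    """Summarise one sample's mutations near a gene into model predictors.

    mutations: list of (position, pathogenicity_score) tuples for this sample.
    tfbs, histone: lists of (start, end) intervals (hypothetical annotation tracks).
    tss: list of TSS positions; a mutation within tss_window bp counts as 'near'.
    Returns [any mutation, in TFBS, in histone-mark region, near TSS, max score].
    """
    tss_windows = [(t - tss_window, t + tss_window + 1) for t in tss]
    return [
        float(len(mutations) > 0),
        float(any(overlaps(p, tfbs) for p, _ in mutations)),
        float(any(overlaps(p, histone) for p, _ in mutations)),
        float(any(overlaps(p, tss_windows) for p, _ in mutations)),
        max((s for _, s in mutations), default=0.0),
    ]


def fit_integrated_model(expression, per_sample_mutations, tfbs, histone, tss,
                         tss_window=1000):
    """OLS fit of expression ~ mutation + annotation predictors for one gene."""
    X = np.array([sample_features(m, tfbs, histone, tss, tss_window)
                  for m in per_sample_mutations])
    X = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X, np.asarray(expression, dtype=float), rcond=None)
    names = ["intercept", "mutated", "in_tfbs", "in_histone_mark",
             "near_tss", "max_score"]
    return dict(zip(names, beta))
```

With few samples these predictors are easily collinear (for example, every mutated sample may also have its mutation in a TFBS). `np.linalg.lstsq` then returns a minimum-norm solution whose coefficients are not individually interpretable, so check the rank of the design matrix before reading the estimates.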

Finally, thinking just about the RNA-seq data, you could deconvolute it to identify immune cell types that may be present in the tumour. This would give you an indication of the amount of immune cell infiltration, which is likely to differ across your tumours.
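A very simple form of deconvolution can be sketched as non-negative least squares: given a reference signature matrix (marker-gene expression for each cell type) and a bulk tumour profile, find non-negative weights and rescale them to fractions. This is only an illustrative stand-in; published deconvolution tools use their own estimators and curated signature matrices (CIBERSORT, for example, uses support vector regression). The function name and toy matrix below are hypothetical, and expression is assumed to be on a linear (non-log) scale.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve(bulk, signature):
    """Estimate cell-type fractions in one bulk sample by non-negative least squares.

    bulk: expression of G marker genes in the tumour sample (linear scale).
    signature: G x K matrix of reference expression for K cell types.
    Returns K non-negative fractions that sum to 1.
    """
    weights, _residual_norm = nnls(np.asarray(signature, dtype=float),
                                   np.asarray(bulk, dtype=float))
    total = weights.sum()
    if total == 0:
        raise ValueError("no signal from any reference cell type")
    return weights / total


# Hypothetical toy signature: 4 marker genes x 3 cell types.
signature = np.array([[10.0, 0.0, 1.0],
                      [0.0, 8.0, 1.0],
                      [1.0, 1.0, 12.0],
                      [5.0, 5.0, 0.0]])
bulk = signature @ np.array([0.5, 0.3, 0.2])  # simulated mixture of the three types
fractions = deconvolve(bulk, signature)
print(fractions)  # should recover roughly [0.5, 0.3, 0.2]
```

Real bulk profiles are noisy and contain cell types missing from the signature, so estimated fractions are best compared across tumours rather than read as absolute proportions.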

There are yet more ideas that I have not mentioned.



Thank you, Kevin, for the detailed answer. That's amazing :) Good to learn about deconvoluting RNA-seq data; this is interesting and I need to learn it :)

(Reply by Donna10, 2.2 years ago)

Thanks! I actually need to follow my own advice and do these things on my data, too :)

(Reply by Kevin Blighe, 2.2 years ago)