Analysis of RNA-seq data for cancer studies?
5.9 years ago
Donna ▴ 10

Dear all

I followed some links here on Biostars to get differential expression results for my RNA-seq data (tumor vs. control), and then ran pathway analysis. I also called somatic mutations using the GATK pipeline.

I found some differentially expressed genes and found common somatic mutations in them, which could be interesting. I then checked whether the top pathways are related to cancer, but found nothing interesting.

I am still trying to connect the pieces. Any suggestions on how I can draw conclusions from my results? What else can I do?

Thank you

RNA-Seq Cancer • 1.0k views
5.9 years ago

Just some ideas off the top of my head:

  1. Mutation-to-expression modelling: For each mutation, test its association with the expression of differentially expressed genes (DEGs) in the mutation's 'vicinity'. This can be as easy as building a linear regression model with expression as the y (dependent) variable and mutation present/absent as the x (predictor) variable. From this, you could derive R-squared values and cross-validated 'shrunk' R-squared values, along with p-values. The y variable would be continuous; the x variable would be categorical, with mutation absent as the reference/base level.
  2. Transcription factor binding sites: Check for new transcription factor binding sites (TFBS) that may be introduced as a result of each mutation. Look at databases like JASPAR to do this; there are also other threads on Biostars. There are undoubtedly some mutations in your data that are going to modulate expression of nearby genes. For an idea of mechanism, see the wonderful study by Mansour et al.: An Oncogenic Super-Enhancer Formed Through Somatic Mutation of a Noncoding Intergenic Element
  3. Histone binding regions: Check whether mutations overlap histone methylation (e.g. H3K27me3) and acetylation (e.g. H3K27ac) regions; as far as I know, these data are available from UCSC. A mutation in such regions could modify chromatin structure and alter expression.
  4. Transcription start sites: Check whether mutations overlap transcription start sites (TSS); again, these are available from UCSC, I believe.
  5. In silico prediction: Use one of those functional / pathogenicity prediction tools. There have been many tools released in recent years, including ones tailored for cancer and somatic mutations. Take a quick look here: A: pathogenicity predictors of cancer mutations
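
As a minimal sketch of the model in point 1 (Python, standard library only): with a single present/absent predictor, the least-squares fit reduces to the two group means, so no regression package is needed to see the idea. The expression values, the function names, and the choice of leave-one-out cross-validation are all assumptions made for illustration, and the permutation test stands in for the usual t-test p-value that a regression package would report.

```python
import random
from statistics import mean

def fit_binary_ols(expr, mutated):
    """Least-squares fit of expression ~ mutation status (0 = absent, the reference level)."""
    ref = [y for y, m in zip(expr, mutated) if m == 0]
    alt = [y for y, m in zip(expr, mutated) if m == 1]
    intercept = mean(ref)           # mean expression without the mutation
    slope = mean(alt) - intercept   # shift in expression associated with the mutation
    return intercept, slope

def r_squared(expr, mutated):
    """In-sample R-squared of the fitted model."""
    b0, b1 = fit_binary_ols(expr, mutated)
    grand = mean(expr)
    ss_tot = sum((y - grand) ** 2 for y in expr)
    ss_res = sum((y - (b0 + b1 * m)) ** 2 for y, m in zip(expr, mutated))
    return 1.0 - ss_res / ss_tot

def loo_r_squared(expr, mutated):
    """Leave-one-out cross-validated R-squared (needs >= 2 samples per group)."""
    grand = mean(expr)
    ss_tot = sum((y - grand) ** 2 for y in expr)
    ss_press = 0.0
    for i in range(len(expr)):
        # Refit without sample i, then predict sample i
        b0, b1 = fit_binary_ols(expr[:i] + expr[i + 1:], mutated[:i] + mutated[i + 1:])
        ss_press += (expr[i] - (b0 + b1 * mutated[i])) ** 2
    return 1.0 - ss_press / ss_tot

def permutation_p(expr, mutated, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the slope (labels shuffled across samples)."""
    rng = random.Random(seed)
    observed = abs(fit_binary_ols(expr, mutated)[1])
    shuffled = list(mutated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(fit_binary_ols(expr, shuffled)[1]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Made-up log-expression values for one gene: 4 samples without, 4 with the mutation
expr = [5.1, 4.8, 5.3, 5.0, 7.9, 8.2, 7.6, 8.1]
mutated = [0, 0, 0, 0, 1, 1, 1, 1]
print("intercept, slope:", fit_binary_ols(expr, mutated))
print("R-squared:", r_squared(expr, mutated))
print("LOO R-squared:", loo_r_squared(expr, mutated))
print("permutation p:", permutation_p(expr, mutated))
```

The cross-validated value is never higher than the in-sample one, which is the 'shrinkage' referred to above. On real data you would repeat this per mutation-gene pair and correct the p-values for multiple testing.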

Note that, technically, you could introduce all of the data from points 2-5 into the model mentioned in point 1. This would then be a robust way to assess the role of each mutation in relation to gene expression.
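
To turn the annotations from points 3 and 4 into extra predictors, each mutation needs a yes/no flag for whether it falls inside a region of interest. A rough sketch of that lookup follows, assuming BED-style half-open intervals with 0-based starts; the coordinates and function names are hypothetical, and in practice a dedicated interval tool would do the same job.

```python
from bisect import bisect_right
from collections import defaultdict

def build_index(regions):
    """Group (chrom, start, end) half-open intervals by chromosome and merge overlaps."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                # Overlapping or touching the previous interval: extend it
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index

def overlaps(index, chrom, pos):
    """True if 0-based position pos lies inside any indexed interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1   # last interval starting at or before pos
    return i >= 0 and pos < ends[i]

# Hypothetical H3K27ac peaks and mutation positions
peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
mutations = [("chr1", 250), ("chr1", 300), ("chr2", 55), ("chr3", 10)]
index = build_index(peaks)
flags = [int(overlaps(index, chrom, pos)) for chrom, pos in mutations]
print(flags)
```

Each list of flags can then enter the regression as one more categorical predictor alongside mutation present/absent.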

Finally, thinking just about the RNA-seq data, you could deconvolute it in order to identify immune cell-types that may be present in the tumour. This would give you an indication of the amount of immune cell infiltration, which is likely to differ across your tumours.
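
To give a feel for what deconvolution does, here is a toy sketch that treats a bulk sample as a non-negative mixture of cell-type signature profiles and recovers the mixing weights by non-negative least squares, solved with plain projected gradient descent. The signature matrix, cell-type proportions, and function names are invented for illustration; published deconvolution tools use curated signature matrices and their own fitting methods, so this is not how any particular tool works.

```python
def nnls_projected_gradient(signature, mixture, n_iter=2000):
    """Minimise ||signature @ w - mixture||^2 subject to w >= 0.

    signature: list of rows (genes), each a list of per-cell-type expression values.
    mixture:   bulk expression per gene, same gene order as signature.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # Conservative step size: 1 / squared Frobenius norm, which is at most 1 / L
    step = 1.0 / sum(v * v for row in signature for v in row)
    weights = [0.0] * n_types
    for _ in range(n_iter):
        residual = [
            sum(signature[g][k] * weights[k] for k in range(n_types)) - mixture[g]
            for g in range(n_genes)
        ]
        gradient = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant
        weights = [max(0.0, w - step * d) for w, d in zip(weights, gradient)]
    return weights

def to_fractions(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else list(weights)

# Invented signature: 6 marker genes (rows) x 3 cell types (columns)
signature = [
    [10, 1, 0],
    [9, 0, 1],
    [1, 10, 1],
    [0, 8, 2],
    [1, 1, 10],
    [0, 2, 9],
]
true_fractions = [0.6, 0.3, 0.1]
# Simulated bulk sample: exact mixture of the signature columns, no noise
bulk = [sum(s * f for s, f in zip(row, true_fractions)) for row in signature]
estimated = to_fractions(nnls_projected_gradient(signature, bulk))
print([round(x, 3) for x in estimated])
```

Because the simulated sample is a noise-free mixture, the estimate should land very close to the proportions used to build it; real tumour data will be noisier and the fit correspondingly less exact.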

There are yet more ideas that I have not mentioned.

Kevin


Thank you, Kevin, for the detailed answer. That's so amazing :) Good to learn about deconvoluting RNA-seq data; this is interesting, and I need to learn that :)


Thanks. I actually need to follow my own advice and do these things on my data, too :)
