Finding potentially important genes from a bulk RNA-seq experiment
7 months ago
Chris ▴ 260

Hi Biostars,

After finding DEGs, I am trying to identify important genes that may cause a disease, so I ran pathway analysis with GSEA using gseKEGG() and gseGO() and looked for the pathways with the highest enrichment scores. Are the genes in the pathway with the highest enrichment score the ones I should focus on? Or is there another analysis I should perform on bulk RNA-seq data to find the important genes? I appreciate your help!

RNA-seq • 926 views
7 months ago
Michael 54k

Hi Chris,

What counts as "important" depends largely on your definition and on the background of the study. Assume, for example, that we are talking about a disease caused by a variant in a single gene. In principle, only this single gene would be "important", but it might not even show up in your DE analysis. Most cases are more complex. Therefore, enrichment analysis of DE genes is a viable approach to identify candidates. Another is network analysis: if there are multiple samples, you can identify highly connected hub genes and focus on them.
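To make the enrichment idea concrete, here is a minimal sketch of an over-representation test: it asks whether a gene set contains more DE genes than expected by chance, using an upper-tail hypergeometric p-value. The function name and all counts are hypothetical and chosen only for illustration; in practice a dedicated package would also handle multiple-testing correction and the choice of background.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_in_set, n_de, n_overlap):
    """Upper-tail hypergeometric p-value: P(overlap >= n_overlap).

    n_background: number of genes tested (the "universe")
    n_in_set:     number of background genes that belong to the gene set
    n_de:         number of DE genes
    n_overlap:    number of DE genes that belong to the gene set
    """
    max_overlap = min(n_in_set, n_de)
    total = comb(n_background, n_de)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_de - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical numbers: 20000 background genes, a 100-gene pathway,
# 500 DE genes, 10 of which fall in the pathway (about 2.5 expected by chance).
p = hypergeom_enrichment_p(n_background=20000, n_in_set=100, n_de=500, n_overlap=10)
print(f"p = {p:.3g}")
```

A small p-value only says the set is over-represented among the DE genes; it does not by itself say which member genes matter.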

Another important step in your analysis is to search the literature for genes and gene sets that have already been identified or associated with the phenotype. Then look for known gene sets that can be retrieved from MSigDB. Finally, integrating different data types, for example SNP and proteomics data, may also provide deeper insight into the pathogenesis of the disease of interest.

Whatever type of signatures you discover that way should be seen as candidates that require further experimental validation.


Thanks so much for your help, Michael! We already know the mutated gene that causes the disease and are now trying to find which genes/transcription factors are affected by that mutation. Then, if possible, we want to find a drug that targets that gene/transcription factor, because we can't edit the mutated gene. By network analysis, do you mean something like Cytoscape? And by multiple samples, do you mean biological replicates? Thanks, Mike!


You can use, e.g., the WGCNA package for network analysis. In your case, "multiple samples" might mean multiple patients, a time series, or both.
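As a rough illustration of what "highly connected hub genes" means in a co-expression network, the sketch below computes a soft-thresholded adjacency |cor|^beta between every pair of genes and sums it per gene to get a connectivity score. This is not the WGCNA package itself, only a plain-Python approximation of the unsigned-network idea; the gene names, expression values, and the choice beta=6 are assumptions for the example, and a toy matrix this small says nothing about how many samples a real analysis needs.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists.

    Assumes neither list is constant (zero variance would divide by zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def connectivity(expr, beta=6):
    """For each gene, sum the soft-thresholded adjacency |cor|^beta to all other genes."""
    genes = list(expr)
    return {
        g: sum(abs(pearson(expr[g], expr[h])) ** beta for h in genes if h != g)
        for g in genes
    }

# Hypothetical normalized expression values: gene -> one value per sample.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [1.1, 2.3, 2.9, 4.2, 4.8, 6.1],
    "geneC": [2.0, 4.1, 6.2, 7.9, 10.1, 12.0],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
}

k = connectivity(expr)
# Genes with the highest connectivity are the hub candidates.
ranked = sorted(k, key=k.get, reverse=True)
print(ranked)
```

Raising the correlation to a power suppresses weak correlations, so a gene scores highly only if it is strongly co-expressed with several others.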


Hi @michael. I looked for tutorials on WGCNA analysis, and their input data are not very similar to mine. The tutorial data at https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/index.html consist of 3 files. I have a raw count matrix with a control with a technical replicate and 3 diseased samples with 2 technical replicates each. Is that enough to perform the analysis? Thank you!


In summary, you have 3 samples plus 1 control; the technical replicates do not help here. Unfortunately, that is not enough for WGCNA and only barely enough for DE analysis. If you want to follow this path, you need to either find more patients or use knock-out cell lines.

Another option is to extract the upstream sequences of the DE genes, find common motifs with MEME, search transcription factor databases, and compare the motifs to known TF binding sites in the UCSC Genome Browser. Then identify the set of TFs that overlap between the DE genes.
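The last step, finding TFs that overlap between DE genes, can be sketched as a simple counting problem once each gene has a list of TFs with predicted binding sites upstream. The input mapping, the gene and TF names, and the `min_fraction` cutoff below are all hypothetical; the motif search itself is assumed to have been done elsewhere.

```python
from collections import Counter

def shared_tfs(tf_hits, min_fraction=0.5):
    """Return TFs with predicted sites in at least min_fraction of the genes.

    tf_hits: dict mapping each DE gene to the TFs predicted to bind upstream of it.
    The result is sorted by the number of genes hit (descending), then by name.
    """
    # Count each TF once per gene, even if it has several sites there.
    counts = Counter(tf for tfs in tf_hits.values() for tf in set(tfs))
    cutoff = min_fraction * len(tf_hits)
    return sorted(
        (tf for tf, n in counts.items() if n >= cutoff),
        key=lambda tf: (-counts[tf], tf),
    )

# Hypothetical motif-scan results for four DE genes.
tf_hits = {
    "geneA": {"TF1", "TF2"},
    "geneB": {"TF1", "TF3"},
    "geneC": {"TF1", "TF2", "TF4"},
    "geneD": {"TF2"},
}

print(shared_tfs(tf_hits))  # TFs found upstream of at least half of the genes
```

A TF shared by many DE genes is a candidate regulator, but common motifs can also recur by chance, so comparing against a background set of non-DE genes would be a sensible next check.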


Thanks, Michael! So to do WGCNA, what is the minimum number of control and diseased samples? Is there anything I can do to help with your work? I am happy to volunteer.


There is no exact number, but something around 15 to 20 samples is what I have read.

"Is there anything I can do to help with your work? I am happy to volunteer."

Sure, keep posting :) You can also apply for an internship or a position at the University of Bergen, Norway if you feel like it.


I have been looking for a mentor for more than 1.5 years, still with no luck. I post questions almost every day, including weekends, and the moderator here said that is extremely unproductive, so he advised me to look for a local person. Unfortunately, the local people I know are very busy, and I can only ask them a question or two every week. If I post a question every hour, I worry the moderators will think I am spamming.
