Question: Gene regulatory networks from RNA-seq data
Posted 5.2 years ago by Diana (Germany):

Hello everyone,

I am currently working on reverse engineering gene regulatory networks from RNA-seq data. I have data from different stages of heart development. My question is: if the RNA-seq data contain different transcripts from the same gene (which is of course unavoidable), should one use information from all of those transcripts to build the network? The network is at the gene level, so using transcript-level information can be tricky because the transcripts will have different expression values. How will this affect the overall network? Also, won't the network inference programs get confused when there are two entries with the same gene name but different values? One would have to make the gene names unique, so these would become two different nodes in the network. In that case, is it OK to use this kind of information, given that people do build networks from RNA-seq data? Or should one keep one transcript per gene? If so, how should one select that transcript, and on what basis?

Any thoughts?

Thanks!!

Answer posted 5.2 years ago by Sam (New York):

First, with RNA-seq data you can always choose to use either gene-level or transcript-level information. Personally, I prefer gene-level information (counting reads per gene with tools such as HTSeq), as it is much easier to interpret and less noisy.
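If your quantifier reports estimated counts per transcript rather than per gene, one common way to get gene-level values is to sum the transcript estimates for each gene. htseq-count, by contrast, counts reads against gene features directly, so it needs no aggregation step. The sketch below assumes a transcript-to-gene mapping (`tx2gene`) that you would build from your annotation; the function name `aggregate_to_gene` and all IDs are hypothetical.

```python
def aggregate_to_gene(transcript_counts, tx2gene):
    """Sum per-transcript counts into per-gene counts, sample by sample.

    transcript_counts: {transcript_id: [count in sample 1, sample 2, ...]}
    tx2gene: {transcript_id: gene_id}
    """
    gene_counts = {}
    for tx_id, counts in transcript_counts.items():
        gene_id = tx2gene[tx_id]  # raises KeyError for unmapped transcripts
        running = gene_counts.setdefault(gene_id, [0] * len(counts))
        for i, c in enumerate(counts):
            running[i] += c
    return gene_counts


# Hypothetical data: two transcripts of GeneA, one of GeneB, three samples.
tx2gene = {"tx1": "GeneA", "tx2": "GeneA", "tx3": "GeneB"}
transcript_counts = {
    "tx1": [10, 20, 30],
    "tx2": [5, 5, 5],
    "tx3": [100, 80, 60],
}
print(aggregate_to_gene(transcript_counts, tx2gene))
# {'GeneA': [15, 25, 35], 'GeneB': [100, 80, 60]}
```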

Now, for the network analysis: I have only used WGCNA for de novo network analysis, and it relies heavily on the correlation matrix. That can be either a gene-gene correlation matrix or a transcript-transcript correlation matrix. But if you provide transcript-level information, you are building a transcript co-expression network, which is slightly different.
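To make the "correlation matrix" step concrete, here is a Python sketch of the idea (not WGCNA itself): compute Pearson correlations between the columns of a samples x features matrix, then raise their absolute values to a soft-thresholding power, which is the form of WGCNA's unsigned adjacency. The same code works whether the columns are genes or uniquely labelled transcripts. The function name and `power=6` are illustrative assumptions; in WGCNA the power is usually chosen with `pickSoftThreshold`.

```python
import numpy as np


def unsigned_adjacency(expr, power=6):
    """Adjacency |cor|^power from a samples x features expression matrix.

    Columns are features (genes or transcripts); rows are samples.
    """
    cor = np.corrcoef(expr, rowvar=False)  # feature-feature Pearson correlation
    return np.abs(cor) ** power


# 4 samples x 3 hypothetical features
expr = np.array([
    [1.0, 2.0, 1.0],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 2.0],
    [4.0, 8.0, 4.0],
])
adj = unsigned_adjacency(expr)
# adj[0, 1] ≈ 1 (features 0 and 1 are perfectly correlated)
# adj[0, 2] ≈ 0.8**6 ≈ 0.262
```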

So, basically, you can either give each transcript a unique identifier and perform the network analysis as if it were a gene co-expression network analysis, or obtain gene-level information and perform a gene-based analysis.
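One way to give each transcript a unique identifier while keeping the gene name visible in the network is to combine the two into one node label. This is only a sketch: the `gene|transcript` label format and the helper names are hypothetical choices, and it assumes gene names contain no `|` character.

```python
def transcript_node_ids(tx2gene):
    """Return one unique node label per transcript, e.g. 'GeneA|tx1'."""
    return {tx_id: f"{gene_id}|{tx_id}" for tx_id, gene_id in tx2gene.items()}


def gene_of(node_label):
    """Recover the gene name from a 'gene|transcript' node label."""
    return node_label.split("|", 1)[0]


tx2gene = {"tx1": "GeneA", "tx2": "GeneA", "tx3": "GeneB"}
labels = transcript_node_ids(tx2gene)
print(labels)
# {'tx1': 'GeneA|tx1', 'tx2': 'GeneA|tx2', 'tx3': 'GeneB|tx3'}
```

Because transcript IDs are already unique, the labels are unique too, so the two GeneA transcripts become two distinct nodes that can still be grouped back by gene afterwards.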

One important thing to note: keep a very good record of the data pre-processing steps you have taken. Because co-expression network analyses usually rely on clustering, a small change in the input (e.g. a different normalization, or a different number of genes or transcripts) can alter the final result. So if you choose to represent each gene by one particular transcript (which I think is very ambiguous, and you are unlikely to find a gold standard for it), make sure you have a clear rule so the analysis can be replicated.
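As an example of a "clear rule", the sketch below keeps, for each gene, the transcript with the highest mean expression across samples and breaks ties by transcript ID, so the same input always gives the same choice. This is just one possible rule, not a recommended standard; WGCNA's `collapseRows` function offers a comparable `MaxMean` option. The function name and data are hypothetical.

```python
from statistics import mean


def pick_representative(transcript_expr, tx2gene):
    """Choose one transcript per gene: highest mean expression, ties broken by ID."""
    best = {}  # gene_id -> (transcript_id, mean expression)
    for tx_id in sorted(transcript_expr):  # sorted order makes ties deterministic
        gene_id = tx2gene[tx_id]
        m = mean(transcript_expr[tx_id])
        # strict '>' keeps the alphabetically first transcript when means tie
        if gene_id not in best or m > best[gene_id][1]:
            best[gene_id] = (tx_id, m)
    return {gene_id: tx_id for gene_id, (tx_id, _) in best.items()}


tx2gene = {"tx1": "GeneA", "tx2": "GeneA", "tx3": "GeneB"}
transcript_expr = {"tx1": [10, 20, 30], "tx2": [5, 5, 5], "tx3": [100, 80, 60]}
print(pick_representative(transcript_expr, tx2gene))
# {'GeneA': 'tx1', 'GeneB': 'tx3'}
```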

Here is the WGCNA page


For WGCNA, which type of RNA-seq input do you suggest? Normalized read counts? DESeq's varianceStabilizingTransformation? RSEM-normalized values? As you said, the input is very important, because the clustering will differ depending on it.

Comment written 5.2 years ago by Nicolas Rosewick.

In the WGCNA FAQ, the authors suggest VST or simply a log transformation. They did say that the main goal is to make sure all samples get the same kind of input. However, I did get slightly different results when I used TPM compared with VST. It is up to you: use whichever transformation you deem fit, and make sure you apply it consistently to all of the samples.
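VST itself comes from DESeq/DESeq2 in R and is not reproduced here, but the two simpler options mentioned above can be sketched in a few lines of Python. The pseudocount of 1 in the log transform and the use of feature lengths in base pairs for TPM are assumptions; TPM is properly computed with effective lengths from your quantifier.

```python
import numpy as np


def log2_transform(counts, pseudocount=1.0):
    """log2(count + pseudocount); the pseudocount avoids log2(0)."""
    return np.log2(counts + pseudocount)


def tpm(counts, lengths_bp):
    """Transcripts per million for a features x samples count matrix.

    lengths_bp: one (effective) length in base pairs per feature (row).
    """
    rate = counts / (lengths_bp[:, None] / 1e3)  # reads per kilobase
    return rate / rate.sum(axis=0, keepdims=True) * 1e6  # each column sums to 1e6


counts = np.array([[10.0, 20.0],
                   [30.0, 40.0]])  # 2 hypothetical features x 2 samples
lengths = np.array([1000.0, 2000.0])
print(tpm(counts, lengths))
print(log2_transform(np.array([0.0, 1.0, 3.0])))
```

Whichever transform you pick, applying the same function with the same parameters to every sample is what keeps the downstream correlations comparable.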

Reply written 5.2 years ago by Sam.