Question

Gene regulatory networks from RNA-seq data

10.1 years ago

Diana ▴ 910

Hello everyone,

I am currently working on reverse engineering gene regulatory networks from RNA-seq data. I have data from different stages of heart development. My question: when the RNA-seq data contain several transcripts from the same gene (which is of course unavoidable), should one use the information from all of them to build the network? The network is at the gene level, so using transcript-level information can be tricky because the transcripts will have different expression values. How will this affect the overall network? Also, won't the network inference programs get confused if there are two entries with the same gene name but different values? One would have to make the gene names unique, and these would then become two different nodes in the network. Given that people do build networks from RNA-seq data, is it OK to use this kind of information? Or should one keep one transcript per gene, and if so, how and on what basis should that transcript be selected?

Any thoughts?

Thanks!!

RNA-Seq • 6.2k views

updated 2.7 years ago by Ram 44k • written 10.1 years ago by Diana ▴ 910

Ram · Accepted Answer · 2014-09-10

First, with RNA-seq data you can always choose between gene-level and transcript-level information. Personally, I prefer gene-level information (counting the reads per gene with tools such as HTSeq), as it is much easier to interpret and less noisy.
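If you already have a transcript-level table, one way to get to the gene level is to sum the transcripts that belong to each gene. Here is a minimal sketch of that idea in plain Python. The transcript IDs, the counts, and the `transcript_to_gene` mapping are made-up toy values, and `sum_transcripts_to_genes` is a hypothetical helper, not part of HTSeq or any other tool.

```python
def sum_transcripts_to_genes(transcript_counts, transcript_to_gene):
    """Collapse a transcript-by-sample table into a gene-by-sample table.

    transcript_counts: dict mapping transcript ID -> list of per-sample values
    transcript_to_gene: dict mapping transcript ID -> gene name
    """
    gene_counts = {}
    for tx, values in transcript_counts.items():
        gene = transcript_to_gene[tx]
        if gene not in gene_counts:
            # First transcript seen for this gene: copy its values.
            gene_counts[gene] = list(values)
        else:
            # Further transcripts: add sample by sample.
            gene_counts[gene] = [a + b for a, b in zip(gene_counts[gene], values)]
    return gene_counts


# Toy example (hypothetical IDs and counts, three samples).
transcript_counts = {
    "GeneA-201": [10, 20, 30],
    "GeneA-202": [5, 0, 15],
    "GeneB-201": [7, 8, 9],
}
transcript_to_gene = {
    "GeneA-201": "GeneA",
    "GeneA-202": "GeneA",
    "GeneB-201": "GeneB",
}

gene_counts = sum_transcripts_to_genes(transcript_counts, transcript_to_gene)
print(gene_counts)
```

With this, each gene appears exactly once in the input to the network tool, so the duplicate-name problem does not arise.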

Now for the network analysis. I have only used WGCNA for de novo network analysis, and it relies heavily on the correlation matrix. That matrix can be either a gene-gene or a transcript-transcript correlation matrix. But if you provide transcript-level information, you are building a transcript co-expression network, which is slightly different.
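To make the "correlation matrix" part concrete, here is a small sketch of a pairwise Pearson correlation matrix, followed by a soft-thresholded adjacency of the form |cor|^beta. This is plain Python and not the WGCNA package itself. The expression values are toy numbers and `beta = 6` is just an assumed illustrative power. The rows can be genes or transcripts; the code does not care which.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles.

    Assumes neither profile is constant (zero variance would divide by zero),
    so constant rows should be filtered out beforehand.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(expr):
    """Return a nested dict: corr[a][b] = Pearson correlation of rows a and b."""
    names = list(expr)
    return {a: {b: pearson(expr[a], expr[b]) for b in names} for a in names}


# Toy expression profiles across four samples (hypothetical values).
expr = {
    "GeneA": [1.0, 2.0, 3.0, 4.0],
    "GeneB": [2.0, 4.0, 6.0, 8.0],
    "GeneC": [4.0, 3.0, 2.0, 1.0],
}

corr = correlation_matrix(expr)

# Unsigned soft-thresholded adjacency; beta is an assumed example value.
beta = 6
adjacency = {a: {b: abs(r) ** beta for b, r in row.items()} for a, row in corr.items()}
```

Whatever you put in the rows becomes the nodes of the network, which is why transcript-level input gives you a transcript network rather than a gene network.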

So, basically, you can either give each transcript a unique identifier and run the network analysis as if it were a gene co-expression analysis, or obtain gene-level information and perform a gene-based analysis.
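For the first option, the identifiers only need to be unique and traceable back to the gene. A minimal sketch, assuming you have (gene name, transcript ID) pairs for your rows; the `|` separator and the `make_node_ids` helper are my own arbitrary choices, not a convention of any tool.

```python
def make_node_ids(rows, sep="|"):
    """Build one unique node ID per (gene, transcript) pair.

    rows: list of (gene_name, transcript_id) tuples, in table order.
    Raises ValueError if the resulting IDs are still not unique.
    """
    ids = [f"{gene}{sep}{tx}" for gene, tx in rows]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate (gene, transcript) pairs in input")
    return ids


# Hypothetical rows: GeneA has two transcripts, GeneB has one.
rows = [("GeneA", "tx1"), ("GeneA", "tx2"), ("GeneB", "tx3")]
node_ids = make_node_ids(rows)
print(node_ids)
```

Keeping the gene name inside the ID makes it easy to split it off again later, when you want to see which transcripts of the same gene ended up in which module.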

One important thing to note is that you should keep a very good record of the data pre-processing steps you have taken. Because these co-expression network analyses usually rely on clustering, a small change in the input, e.g. a different form of normalization or a different number of genes (or transcripts), can alter the final result. So if you choose to represent a gene by one particular transcript (which I think is very ambiguous, and very likely no gold standard exists), make sure you have a clear rule so that the choice can be replicated.
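As an illustration of what a "clear rule" could look like, here is a sketch that keeps, for each gene, the transcript with the highest mean expression across samples and breaks ties by transcript ID. This is only one possible rule, not a recommended standard; the IDs and values are toy data and `pick_representative` is a hypothetical helper.

```python
def pick_representative(transcript_expr, transcript_to_gene):
    """Pick one transcript per gene: highest mean expression across samples.

    Ties go to the alphabetically first transcript ID, so the result is the
    same on every run regardless of input order.
    """
    best = {}  # gene -> (transcript ID, mean expression)
    for tx in sorted(transcript_expr):
        values = transcript_expr[tx]
        mean = sum(values) / len(values)
        gene = transcript_to_gene[tx]
        # Strict '>' keeps the earlier (alphabetically first) ID on ties.
        if gene not in best or mean > best[gene][1]:
            best[gene] = (tx, mean)
    return {gene: tx for gene, (tx, _) in best.items()}


# Toy data (hypothetical IDs and values, three samples).
transcript_expr = {
    "GeneA-202": [5, 0, 15],
    "GeneA-201": [10, 20, 30],
    "GeneB-201": [7, 8, 9],
    "GeneC-202": [1, 2, 3],
    "GeneC-201": [3, 2, 1],  # same mean as GeneC-202: tie
}
transcript_to_gene = {tx: tx.split("-")[0] for tx in transcript_expr}

chosen = pick_representative(transcript_expr, transcript_to_gene)
print(chosen)
```

Writing the rule down as code, together with the normalization and filtering steps used before it, is an easy way to keep the record mentioned above.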

Here is the WGCNA page