4.7 years ago
demoraesdiogo2017
Hello, I am running a few bioinformatics analyses on freely available data sets, and we will then validate the results in vivo. To increase our chances of finding something relevant, we decided to use the following criteria:
- the transcript is differentially expressed in at least one group compared to the control group (FC>2, FDR<0.05)
- the transcript has an overall high expression across the samples
- the transcript is part of a coexpression network
- the transcript has a high network centrality
My main concern is criterion 2. The dataset has 12 groups and about 300 samples. How should I determine whether a transcript has a high overall expression? My first thought was to use a geometric or arithmetic mean, or the median, but I am skeptical about it.
What are you actually working on? Some background would help. So essentially the question is how to find out the average expression of a transcript (or gene)?
I'm using the RNA-seq data from GTEx. Essentially, I want to look only at genes that have a high basal expression, but I'm not sure how to do it. I considered doing what I usually do for edgeR, which is cutting off genes with an expression < x in 25% of the samples.
I would first of all aggregate transcripts to the gene level with tximport and then feed this into edgeR. The tximport manual covers how to do that properly. Then I'd go for aveLogCPM in edgeR for the average normalized expression.
I'm not sure what your data looks like, but if you're using RNA-Seq data to quantify transcript levels, maybe tximport is something you can use.
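To make the logic discussed in this thread concrete, here is a rough, language-agnostic sketch in plain Python: convert counts to CPM, drop genes that fall below a cutoff in too many samples, then rank the rest by mean log2-CPM. This is not edgeR's aveLogCPM (edgeR's own calculation differs), and the toy counts, the cutoff `x`, the reading of the 25% rule, and all function names are assumptions made up for illustration.

```python
import math

# Toy gene-level counts (hypothetical): gene -> raw counts, one per sample.
# "other" stands in for the rest of the transcriptome so that each
# library sums to one million reads.
counts = {
    "geneA": [500, 600, 550, 700],
    "geneB": [0, 1, 0, 2],
    "geneC": [100, 0, 120, 90],
    "other": [999400, 999399, 999330, 999208],
}

def to_cpm(table):
    """Scale each sample's counts to counts per million of its library size."""
    n_samples = len(next(iter(table.values())))
    lib_sizes = [sum(row[j] for row in table.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * row[j] / lib_sizes[j] for j in range(n_samples)]
        for gene, row in table.items()
    }

def filter_low(cpm_table, x=10.0, max_frac_below=0.25):
    """Drop genes whose CPM is below x in at least max_frac_below of samples.

    Assumption: this is one possible reading of the '< x in 25% of the
    samples' rule mentioned in the thread.
    """
    kept = {}
    for gene, values in cpm_table.items():
        frac_below = sum(v < x for v in values) / len(values)
        if frac_below < max_frac_below:
            kept[gene] = values
    return kept

def mean_log2_cpm(cpm_table, prior=1.0):
    """Mean of log2(CPM + prior) per gene; a simple stand-in, not aveLogCPM."""
    return {
        gene: sum(math.log2(v + prior) for v in values) / len(values)
        for gene, values in cpm_table.items()
    }

cpm_table = to_cpm(counts)
kept = filter_low(cpm_table)
avg = mean_log2_cpm(kept)

# Genes sorted from highest to lowest average expression.
ranking = sorted(avg, key=avg.get, reverse=True)
print(ranking)
```

Averaging on the log scale behaves like a geometric mean, so a few extreme samples pull it around less than an arithmetic mean of raw CPM would. With 12 groups, it may also be worth checking the average within each group rather than only across all 300 samples.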