Question: Looking for transcription evidence in pooled-tissue RNA-seq data
RT (European Union) wrote:

Dear All,

I have a set of genes, and I want to check whether they are transcribed in different individuals of the same species. I have RNA-seq data in which total mRNA was pooled from different tissues (without any replicates). By checking the coverage of these genes in this data, how reliably can I tell whether a gene is transcribed? What further experiments could confirm this? And what should the threshold for no coverage be: zero read coverage, or something a bit more relaxed?

Thanks, RT

modified 4.6 years ago • written 4.6 years ago by RT
Devon Ryan (Freiburg, Germany) wrote:

It's probably easiest to make a histogram of each sample's FPKM distribution and just set a threshold (you'll probably see two peaks, with the right-most one being "expressed" genes). This won't yield 100% certain results, but then again nothing will. You can get much fancier than this, but I don't really know if it's worth it.

For follow-up, qPCR is pretty common. Note that there's a difference between sub-threshold and not expressed (though this is the case for RNA-seq as well). Alternatively, you could just run some Westerns, use a protein array, etc. None of these are perfect.

Zero coverage genes are that way only because of your sequencing depth. There's enough noise in biology to assume that everything is transcribed at some level in a given cell type (at least if you look at enough cells).
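
A minimal sketch of the histogram-and-threshold idea in R, assuming the FPKMs really are bimodal on a log2 scale. The simulated values, the 0.01 pseudocount, and the `find_valley` helper are illustrative assumptions and not part of the original answer; the helper simply takes the lowest point of a kernel density estimate between its two tallest peaks.

```r
# Simulated FPKMs for illustration only (not real data):
# a low "background" peak and a higher "expressed" peak on the log2 scale.
set.seed(1)
fpkm <- c(2^rnorm(3000, mean = -3, sd = 1),
          2^rnorm(5000, mean = 4, sd = 1.5))

# A small pseudocount (an assumption) keeps zero FPKMs finite after log2.
pseudo <- 0.01
log_fpkm <- log2(fpkm + pseudo)

# Hypothetical helper: lowest point of the kernel density between its two tallest peaks.
find_valley <- function(x) {
  d <- density(x)
  # A local maximum is where the slope changes from positive to negative.
  peaks <- which(diff(sign(diff(d$y))) == -2) + 1
  stopifnot(length(peaks) >= 2)  # needs at least two local maxima
  top2 <- sort(peaks[order(d$y[peaks], decreasing = TRUE)[1:2]])
  between <- top2[1]:top2[2]
  d$x[between[which.min(d$y[between])]]
}

valley <- find_valley(log_fpkm)
threshold_fpkm <- 2^valley - pseudo   # back-transform to the FPKM scale

hist(log_fpkm, breaks = 50, main = "log2(FPKM + 0.01)", xlab = "log2 FPKM")
abline(v = valley, lty = 2)           # candidate expressed / not-expressed cutoff
```

On real data the cutoff should be checked by eye against the histogram for each sample; if only one peak is visible, this heuristic does not apply.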

written 4.6 years ago by Devon Ryan

Hi Devon, thanks a lot for this. I now have FPKMs for all my samples. The following explains a bit more about my experiment and objective:

I have data from 30 individuals and a set of 2000 genes. I am interested in checking (a) whether there is transcriptome evidence for these genes and (b) which core set (out of these 2000 genes) shows expression in all the samples. Is it possible to determine this on the basis of an FPKM threshold, for example by saying that genes with FPKM below 0.5 are not expressed? If yes, what should this threshold be?

modified 4.6 years ago • written 4.6 years ago by RT
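
Once a cutoff has been chosen, the two checks described in this reply reduce to counting, per gene, the samples in which the FPKM reaches the cutoff. A small R sketch, assuming a genes-by-samples FPKM matrix; the toy values and the gene and sample names are made up, and 0.5 is only the example cutoff from the question, not a recommended value.

```r
# Toy FPKM matrix: rows are genes, columns are individuals (values are made up).
fpkm <- matrix(c(12.0, 8.5, 15.2,
                  0.1, 0.0,  0.3,
                  3.4, 0.2,  5.1,
                  0.9, 1.1,  0.7),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("gene", 1:4), paste0("ind", 1:3)))

threshold <- 0.5  # example cutoff from the question, not a recommendation

# Logical matrix: TRUE where a gene reaches the cutoff in a sample.
expressed <- fpkm >= threshold

# Number of samples in which each gene is called expressed.
n_samples_expressed <- rowSums(expressed)

# (a) genes with evidence in at least one sample; (b) genes expressed in every sample.
any_evidence <- rownames(fpkm)[n_samples_expressed > 0]
core_set <- rownames(fpkm)[n_samples_expressed == ncol(fpkm)]
```

With 30 individuals, requiring expression in every single sample is strict; relaxing `ncol(fpkm)` to a smaller count is one possible design choice.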

I feel like I answered a question similar to this a couple days ago but can't find it at the moment. Have a look at a histogram of the FPKMs. If you're lucky, they'll be bimodal, in which case you can set a reasonable threshold (or better yet, fit with two curves and then assign a p-value for the probability of being expressed).

written 4.6 years ago by Devon Ryan
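
The "fit with two curves" suggestion could be sketched as a two-component Gaussian mixture on log2 FPKMs, fitted by expectation-maximization in base R. This is one possible reading of the suggestion, not necessarily the method the reply had in mind. The function name `fit_two_gaussians`, the simulated data, and the starting values are assumptions. Strictly speaking, the per-gene output is a posterior probability of belonging to the higher ("expressed") component rather than a p-value.

```r
# Simulated log2 FPKMs for illustration only: background and expressed components.
set.seed(1)
log_fpkm <- c(rnorm(3000, mean = -3, sd = 1), rnorm(5000, mean = 4, sd = 1.5))

# Hypothetical helper: EM fit of a two-component Gaussian mixture.
fit_two_gaussians <- function(x, n_iter = 200) {
  # Start from the lower and upper quartiles as rough component means.
  mu <- unname(quantile(x, c(0.25, 0.75)))
  sigma <- rep(sd(x), 2)
  w <- c(0.5, 0.5)
  # Posterior probability that each value belongs to component 2.
  posterior2 <- function() {
    d1 <- w[1] * dnorm(x, mu[1], sigma[1])
    d2 <- w[2] * dnorm(x, mu[2], sigma[2])
    d2 / (d1 + d2)
  }
  for (i in seq_len(n_iter)) {
    # E-step: responsibilities of the two components.
    r2 <- posterior2()
    r1 <- 1 - r2
    # M-step: update weights, means, and standard deviations.
    w <- c(mean(r1), mean(r2))
    mu <- c(sum(r1 * x) / sum(r1), sum(r2 * x) / sum(r2))
    sigma <- c(sqrt(sum(r1 * (x - mu[1])^2) / sum(r1)),
               sqrt(sum(r2 * (x - mu[2])^2) / sum(r2)))
  }
  list(weights = w, means = mu, sds = sigma, p_expressed = posterior2())
}

fit <- fit_two_gaussians(log_fpkm)
# Genes with a high posterior probability of the upper component.
called_expressed <- fit$p_expressed > 0.95
```

The 0.95 cutoff is arbitrary, and the fit is only meaningful if the histogram actually looks bimodal.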

Hi Devon. Thanks for the prompt help. I am new to RNA-seq, so I thought I would double-check with you. I have attached a figure for one sample. For this sample, can I say that genes with FPKM values <0.2 should be considered not expressed? (The figure shows log2-transformed values.) I have already discarded genes with FPKM < 0.5 to get this graph.

[Figure not preserved: density plot of log2-transformed FPKM values for one sample]

modified 4.6 years ago • written 4.6 years ago by RT

What happens if you just use hist() and specify a higher number of breaks? If you already discarded genes with FPKM<0.5 then it looks like the kernel smoothing is making the density plot harder to interpret.

written 4.6 years ago by Devon Ryan

Hi Devon, if I plot a histogram of my data, it does not say much.

[Figure not preserved: histogram of FPKM values]

modified 4.6 years ago • written 4.6 years ago by RT

Do something like hist(something, breaks=50, ylim=c(0,50))

written 4.6 years ago by Devon Ryan

Here is the histogram. Most of the genes have very low FPKM. Is there anything wrong with my dataset? Could you help me with this further?

[Figure not preserved: histogram of FPKM values]

modified 4.6 years ago • written 4.6 years ago by RT

Try changing xlim and ylim so you get more than an exponential distribution and then post that.

written 4.6 years ago by Devon Ryan
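
A hedged sketch of what adjusting `xlim` and `ylim` might look like in R. The data are simulated and the axis limits are arbitrary assumptions; the point is only that zooming both axes, or plotting on a log2 scale, can reveal structure hidden behind the tall bins near zero.

```r
# Made-up FPKMs: many genes near zero plus a smaller set of clearly expressed genes.
set.seed(2)
fpkm <- c(rexp(9000, rate = 5), 2^rnorm(1000, mean = 4, sd = 1))

# Raw scale: zoom both axes so the tall bins near zero do not hide everything else.
# xlim/ylim only change the view; the counts still include every value.
h_raw <- hist(fpkm, breaks = 500, xlim = c(0, 50), ylim = c(0, 50),
              main = "FPKM (zoomed)", xlab = "FPKM")

# Log2 scale: spreads out the low values, which often makes two peaks easier to see.
h_log <- hist(log2(fpkm[fpkm > 0]), breaks = 50,
              main = "log2 FPKM", xlab = "log2(FPKM)")
```

Note that genes with an FPKM of exactly zero are dropped before the log2 transform here; adding a small pseudocount instead is another common choice.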