Question: Can I use DESeq2 for non-coding RNA?
c_u (200), United States, wrote 6 weeks ago:

I am interested in finding non-coding RNA (lncRNA, eRNA) that are being differentially expressed in the disease case. For genes, doing this is pretty easy with DESeq2. But a collaborator told me that DESeq2 couldn't be used right away for the non-coding transcripts.

Is this true? What are some of the things that I should keep in mind while analyzing non-coding transcripts using DESeq2? He had mentioned that since the amount of ncRNA varies from sample to sample, special care has to be taken to normalize for that. The samples were depleted for ribosomal RNA, but he said that there would still be a lot of rRNA in the samples, and this amount differs from one sample to another.

The data are from total RNA-seq.

noncoding rna-seq deseq2 • 196 views
modified 5 weeks ago by colin.kern (900) • written 6 weeks ago by c_u (200)

> he said that there would still be a lot of rRNA in the samples, and this amount differs from one sample to another

Did you check if that was actually the case?

written 5 weeks ago by igor (9.9k)

Even if this is the case, once you remove the rRNA you should be able to perform normalization as usual. I suggest you use MA-plots to check whether the bulk of genes is centered around y=0 after normalization. That would be in line with the assumption underlying the DESeq2 normalization, which is that the median ratio captures the size relationship (quote from here).
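
The check described above can be sketched numerically. The following is a minimal pure-Python illustration of median-of-ratios size factors and of the median log2 ratio (the M value of an MA-plot) between two samples. It is not DESeq2 itself; the function names and the toy counts are hypothetical and chosen only for illustration.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors; counts is {gene: [count per sample]}."""
    n_samples = len(next(iter(counts.values())))
    # Per-gene log geometric mean, using only genes with no zero counts.
    log_geo = {
        g: sum(math.log(c) for c in row) / n_samples
        for g, row in counts.items()
        if all(c > 0 for c in row)
    }
    # Size factor of a sample = median ratio of its counts to the geometric means.
    return [
        math.exp(median(math.log(counts[g][j]) - lg for g, lg in log_geo.items()))
        for j in range(n_samples)
    ]

def median_m(counts, sf, a=0, b=1):
    """Median log2 ratio (M) between samples a and b after normalization."""
    m_values = [
        math.log2((row[b] / sf[b]) / (row[a] / sf[a]))
        for row in counts.values()
        if row[a] > 0 and row[b] > 0
    ]
    return median(m_values)

# Toy data (hypothetical): sample 2 is sequenced twice as deep,
# and only gene "g5" is truly changed.
toy = {
    "g1": [100, 200],
    "g2": [50, 100],
    "g3": [30, 60],
    "g4": [10, 20],
    "g5": [20, 400],
}
sf = size_factors(toy)
print("size factors:", sf)
print("median M after normalization:", median_m(toy, sf))
```

If the median M is far from 0 on real data, the bulk of genes is not centered and the normalization assumption deserves a closer look.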

modified 5 weeks ago • written 5 weeks ago by ATpoint (32k)

Thanks for asking. I haven't done it yet. I think one brute-force method would be to take the human GTF, make a list of rRNA genes based on the annotation, and then get their counts using featureCounts. Is there a simpler way?
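
A rough sketch of that brute-force approach, in plain Python. It assumes the GTF stores the biotype in a `gene_type` (GENCODE-style) or `gene_biotype` (Ensembl-style) attribute, that the biotype labels in `RRNA_BIOTYPES` match your annotation, and that the count table has already been loaded into a dictionary of per-sample counts. The function names and example records are hypothetical.

```python
import re

# Assumed biotype labels; check them against your own annotation.
RRNA_BIOTYPES = {"rRNA", "Mt_rRNA", "rRNA_pseudogene"}

def rrna_gene_ids(gtf_lines):
    """Collect gene_id values of 'gene' records whose biotype is rRNA-like."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        # Parse the attribute column: key "value"; key "value"; ...
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        biotype = attrs.get("gene_type") or attrs.get("gene_biotype")
        if biotype in RRNA_BIOTYPES and "gene_id" in attrs:
            ids.add(attrs["gene_id"])
    return ids

def rrna_fraction(counts, rrna_ids):
    """Per-sample fraction of counted reads assigned to rRNA genes."""
    n = len(next(iter(counts.values())))
    totals = [sum(row[j] for row in counts.values()) for j in range(n)]
    rrna = [sum(row[j] for g, row in counts.items() if g in rrna_ids)
            for j in range(n)]
    return [r / t if t else 0.0 for r, t in zip(rrna, totals)]

# Hypothetical GTF records and counts, for illustration only.
gtf = [
    'chr1\tHAVANA\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";',
    'chr1\tENSEMBL\tgene\t200\t300\t.\t+\t.\tgene_id "G2"; gene_type "rRNA";',
    'chr1\tENSEMBL\texon\t200\t300\t.\t+\t.\tgene_id "G2"; gene_type "rRNA";',
]
counts = {"G1": [900, 500], "G2": [100, 500]}
ids = rrna_gene_ids(gtf)
fractions = rrna_fraction(counts, ids)
print(ids, fractions)
```

Comparing the per-sample fractions would show directly whether the residual rRNA content really differs between samples.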

written 5 weeks ago by c_u (200)

That's probably the simplest, but perhaps not the best. See earlier discussions:

modified 5 weeks ago • written 5 weeks ago by igor (9.9k)
Kevin Blighe (56k) wrote 6 weeks ago:

Apart from the fact that you should expect a much higher level of zero-inflation in the raw counts, I don't think that you should worry about anything else initially. So, you will notice many more genes being filtered out because of low counts - that's for sure.
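
To get a feel for this, here is a small pure-Python sketch that measures the fraction of zero entries in a count table and applies a simple low-count pre-filter. This is only a stand-in for illustration, not DESeq2's own independent filtering; the thresholds, function names, and toy counts are assumptions.

```python
def zero_fraction(counts):
    """Fraction of all gene-by-sample entries that are exactly zero."""
    values = [c for row in counts.values() for c in row]
    return sum(c == 0 for c in values) / len(values)

def prefilter(counts, min_count=10, min_samples=2):
    """Keep genes with at least `min_count` reads in at least `min_samples` samples."""
    return {
        g: row for g, row in counts.items()
        if sum(c >= min_count for c in row) >= min_samples
    }

# Hypothetical counts: one well-expressed mRNA and three sparse non-coding genes.
toy_counts = {
    "mRNA_a": [520, 610, 480, 590],
    "lncRNA_a": [0, 3, 0, 1],
    "lncRNA_b": [12, 0, 25, 18],
    "eRNA_a": [0, 0, 2, 0],
}
kept = prefilter(toy_counts)
print("zero fraction:", zero_fraction(toy_counts))
print("genes kept:", sorted(kept))
```

In this toy table the sparse non-coding genes are the ones dropped, which is the pattern to expect on real ncRNA data.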

The ncRNA profile can differ from sample to sample, and so can the protein-coding profile.

Kevin

written 6 weeks ago by Kevin Blighe (56k)
colin.kern (900), United States, wrote 5 weeks ago:

> He had mentioned that since the amount of ncRNA varies from sample to sample, special care has to be taken to normalize for that. The samples were depleted for ribosomal RNA, but he said that there would still be a lot of rRNA in the samples, and this amount differs from one sample to another.

This is a major concern when using TPM/FPKM expression values, which are produced by tools like Cufflinks and StringTie. However, DESeq2 and edgeR use normalization methods that are intentionally designed to handle this situation. There is some literature that shows edgeR's normalization method may be better than DESeq2's, and I've even seen papers that normalized their counts with edgeR, exported them, and then identified the DEGs with DESeq2. I don't think there is any problem with using DESeq2's normalization method, though.
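
A toy illustration of why library-proportion values are sensitive to residual rRNA. The sketch uses plain per-million scaling as a simplified stand-in for TPM/FPKM (there is no length normalization here), and all gene names and counts are hypothetical.

```python
def per_million(counts):
    """Scale each sample's counts to reads per million of that sample's total."""
    n = len(next(iter(counts.values())))
    totals = [sum(row[j] for row in counts.values()) for j in range(n)]
    return {
        g: [1e6 * row[j] / totals[j] for j in range(n)]
        for g, row in counts.items()
    }

# Both samples have 1,000,000 reads, but rRNA takes 10% of sample 1 and 50%
# of sample 2. Relative to the rest of the transcriptome, lnc1 is unchanged.
toy = {
    "rRNA": [100_000, 500_000],
    "lnc1": [180, 100],
    "mRNA1": [899_820, 499_900],
}

pm = per_million(toy)
# With rRNA included, lnc1 looks 1.8-fold lower in sample 2.
print("lnc1 with rRNA:", pm["lnc1"])

# After removing rRNA before scaling, the two samples agree.
no_rrna = per_million({g: row for g, row in toy.items() if g != "rRNA"})
print("lnc1 without rRNA:", no_rrna["lnc1"])
```

The apparent change in lnc1 comes entirely from the differing rRNA share, which is the kind of composition effect that median-of-ratios style normalization is meant to be robust to.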

written 5 weeks ago by colin.kern (900)

> There is some literature that shows edgeR's normalization method may be better than DESeq2's,

Can you link references?

modified 5 weeks ago • written 5 weeks ago by ATpoint (32k)