Question: DESeq2: Filtering low counts per sample before analysis
12 months ago by
Cdk wrote:

Good morning everyone,

While reviewing the literature, I found the following article: Threshold-seq: a tool for determining the threshold in short RNA-seq datasets (Bioinformatics. 2017 Jul 1;33(13):2034-2036. doi: 10.1093/bioinformatics/btx073). It describes a tool that estimates how many reads must support a short RNA molecule in a given dataset before it can be considered different from 'background'.

My question is: can I use this tool to obtain a read threshold for each sample (let's say an integer such as 14 reads), set to zero every count in my count matrix that falls below its sample's threshold, and then provide this count matrix to the DESeq2 functions for differential expression analysis?

I understand that DESeq2 expects un-normalized counts as input. Does this kind of filtering affect the internal model of DESeq2? If so, may I ask how exactly?

I have seen answers about filtering in other posts, like this one, but I do not really know how to apply them to my question. In particular, the script outputs an integer for every sample, so I am quite puzzled about how I could apply this threshold with DESeq2.


Have a look at the pre-filtering section of the DESeq2 manual.

written 12 months ago by grant.hovhannisyan

Thank you for your reply.

But this is not something that I can translate directly: there, the filtering is done by gene (row of the count matrix), while Threshold-seq outputs a number that would be applied to each column (sample).
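The row-versus-column distinction can be sketched as follows. This is only a minimal illustration in plain Python, not DESeq2 or Threshold-seq code; the count matrix, the sample names, the per-sample thresholds, and both helper functions are hypothetical.

```python
# Hypothetical raw count matrix: rows are genes, columns are samples.
samples = ["s1", "s2", "s3"]
counts = {
    "geneA": [120, 95, 130],
    "geneB": [3, 20, 9],
    "geneC": [0, 2, 1],
}

# Hypothetical per-sample thresholds, standing in for the one integer
# per sample that a tool such as Threshold-seq would report.
thresholds = {"s1": 14, "s2": 10, "s3": 12}


def zero_below_sample_threshold(counts, samples, thresholds):
    """Column-wise: set a count to 0 when it is below its own sample's threshold."""
    return {
        gene: [c if c >= thresholds[s] else 0 for c, s in zip(row, samples)]
        for gene, row in counts.items()
    }


def filter_genes_by_row_sum(counts, min_total=10):
    """Row-wise: drop whole genes whose total count across samples is below min_total."""
    return {gene: row for gene, row in counts.items() if sum(row) >= min_total}


zeroed = zero_below_sample_threshold(counts, samples, thresholds)
kept = filter_genes_by_row_sum(counts)

print(zeroed)
print(kept)
```

The column-wise version keeps every gene but alters individual counts, whereas the row-wise version removes whole genes and never changes a count that remains in the matrix.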

written 12 months ago by Cdk

If you are using DESeq2, then, like Grant, I also recommend following the advice within the DESeq2 tutorial. There is advice for setting thresholds based on both raw and normalised count values.
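As a rough sketch of what gene-wise thresholds on raw versus normalised counts look like, the plain-Python example below keeps a gene when enough samples reach a minimum count. It is an illustration under stated assumptions, not DESeq2 code: the `keep_genes` helper, the cut-offs (10 reads in at least 2 samples), the count matrix, and the size factors are all hypothetical. The only DESeq2 behaviour relied on is that its normalised counts are raw counts divided by a per-sample size factor.

```python
# Hypothetical raw count matrix: rows are genes, columns are three samples.
counts = {
    "geneA": [120, 95, 130],
    "geneB": [6, 20, 9],
    "geneC": [0, 2, 1],
}

# Hypothetical size factors; in practice these would be estimated by DESeq2.
size_factors = [0.5, 2.0, 0.8]


def keep_genes(counts, min_count=10, min_samples=2, size_factors=None):
    """Keep genes with at least min_samples samples at or above min_count.

    If size_factors is given, the test is applied to counts divided by the
    size factors; the returned rows are always the untouched raw counts.
    """
    kept = {}
    for gene, row in counts.items():
        if size_factors is None:
            values = row
        else:
            values = [c / sf for c, sf in zip(row, size_factors)]
        if sum(v >= min_count for v in values) >= min_samples:
            kept[gene] = row
    return kept


kept_raw = keep_genes(counts)
kept_norm = keep_genes(counts, size_factors=size_factors)

print(sorted(kept_raw))
print(sorted(kept_norm))
```

Either way, whole genes are kept or dropped and the surviving rows are still raw counts, so the matrix handed to the downstream analysis is not altered cell by cell.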

I also looked at the Threshold-seq manuscript and, generally speaking, disagree with it. First, they have performed very little benchmarking against real datasets. Second, the documentation is poor. Third, they make the program available as a ZIP file that even contains hidden Mac system files, including ._.DS_Store. Fourth, the program is available on neither CRAN nor Bioconductor. Finally, I disagree with the premise that there exists a 'background' in RNA-seq experiments that is in any way like the background in microarrays. In microarrays, the background is due to fluorescent intensities; in RNA-seq, whilst many transcripts may return very low count values, these may genuinely be real and reflect transcriptional 'noise'. Certain experiments may actually want to look at these transcripts. In a 'heightened' transcriptional cellular state (for example, during proliferation), transcriptional noise may be elevated; however, again, these are likely real transcripts that may have no functionality.

written 12 months ago by Kevin Blighe