Question: Are there any R or Bioconductor packages for non-parametric tests on RNA-seq count data?
4.2 years ago by
unique37990
Spain
unique37990 wrote:

Dear all,

I am wondering whether there are any packages for non-parametric tests on RNA-seq count data. Google gave me two suggestions: 1) NPEBseq, but I could not find this package anywhere (bioinformatics.wistar.upenn.edu/NPEBseq is a dead link); 2) LFCseq, which I found, but it lacks vignettes and I don't know how to use it properly.

Any help would be appreciated. Thanks.

rna-seq bioconductor R • 1.6k views
ADD COMMENTlink modified 4.2 years ago by mkulecka310 • written 4.2 years ago by unique37990

Why do you want to do non-parametric tests with RNA-seq data? There are very powerful packages for RNA-seq data analysis, such as edgeR and limma-voom. Why are those not suitable for you?

ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by Benn8.0k

I already tested edgeR, but it is not well suited when there are outliers. It falsely reported some genes (FDR < 0.05) whose signal came from outliers (even genes with low counts and a single outlying value in one sample). So I thought of trying a non-parametric test to control for the outliers.

An alternative could be DESeq2, since this package can deal with outliers, but that only applies when the dataset is large. From ?DESeq:

minReplicatesForReplace = 7 (default)

"minReplicatesForReplace: the minimum number of replicates required in order to use replaceOutliers on a sample. If there are samples with so many replicates, the model will be refit after these replacing outliers, flagged by Cook's distance. Set to Inf in order to never replace outliers."

And I have only 3 samples in each condition (3 vs 3).

ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by unique37990

If you only have 3 samples in each condition, how sure are you that we are talking about outliers?

Do you know what the (technical) cause is for your outliers? Or can it be a biological cause?

ADD REPLYlink written 4.2 years ago by Benn8.0k

What do you think about these genes detected by edgeR?

ID                  C1   C2   C3   T1    T2   T3      logFC        PValue       FDR
ENSDARG00000016825  44   29   41   51    53   33803   8.217649637  9.29E-07     0.000719466
ENSDARG00000092233  100  73   80   63    43   228889  9.819161124  2.50E-06     0.001390657
ENSDARG00000055809  245  191  215  206   274  78772   6.92656458   5.54E-06     0.00235486
ENSDARG00000078429  196  76   160  174   146  36360   6.409972189  7.06E-06     0.002576343
ENSDARG00000023151  62   14   22   29    39   1064    3.523178845  0.000428103  0.026348958
ENSDARG00000020084  116  203  122  2107  84   92      2.37419445   0.004750923  0.100555503

ADD REPLYlink written 4.2 years ago by unique37990
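
To make it easier to see which sample drives each of these calls, here is a small base-R sketch using the six rows posted above. For each gene it reports the treated replicate with the largest count and the ratio of that count to the treated-group median. The ratio is an ad hoc diagnostic picked for illustration, not an edgeR feature or a formal outlier test.

```r
# Counts copied from the six rows posted in this thread
counts <- matrix(c(
   44,  29,  41,   51,  53,  33803,
  100,  73,  80,   63,  43, 228889,
  245, 191, 215,  206, 274,  78772,
  196,  76, 160,  174, 146,  36360,
   62,  14,  22,   29,  39,   1064,
  116, 203, 122, 2107,  84,     92
), ncol = 6, byrow = TRUE,
dimnames = list(
  c("ENSDARG00000016825", "ENSDARG00000092233", "ENSDARG00000055809",
    "ENSDARG00000078429", "ENSDARG00000023151", "ENSDARG00000020084"),
  c("C1", "C2", "C3", "T1", "T2", "T3")
))

treated <- counts[, c("T1", "T2", "T3")]

# Which treated replicate holds the largest count for each gene
top_sample <- colnames(treated)[apply(treated, 1, which.max)]

# Ad hoc ratio: largest treated count divided by the treated-group median
max_to_median <- apply(treated, 1, max) / apply(treated, 1, median)

print(data.frame(top_sample, max_to_median = round(max_to_median, 1)))
```

If most rows point at the same replicate, that may be a hint to look at that library as a whole (sample-level QC) before blaming the test.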

When you describe outliers as false genes that might not be DEGs, do you mean that a gene is highly expressed in only one sample of a condition and is still called a DEG? If that is the cause, you can plot the distribution of your count data for each sample, filter out low read counts, and keep only genes above the first quartile of each sample. Another common approach is simply to remove genes whose row sums of read counts fall below a certain value, chosen from the distribution. A gene is often considered expressed at around 0.2 FPKM, so alternatively you can filter out genes whose FPKM is below that; you would have to check which read counts that corresponds to and discard them. This may leave you with fewer expressed genes, but the DEG calls may be more reliable. Have you tried that?

ADD REPLYlink written 4.2 years ago by ivivek_ngs4.9k
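
The two count-based filters mentioned above could be sketched in base R roughly as follows. This is one possible reading of the suggestion: the toy matrix, the `min_total` threshold of 10, and the rule "above the first quartile in every sample" are all assumptions made for illustration.

```r
# Hypothetical toy counts: 5 genes x 4 samples
counts <- matrix(c(
     0,   1,    0,   2,
     5,   7,    6,   4,
    50,  60,   55,  65,
   200, 180,  210, 190,
  1000, 900, 1100, 950
), ncol = 4, byrow = TRUE,
dimnames = list(paste0("gene", 1:5), paste0("S", 1:4)))

# Filter 1: drop genes whose total count across samples is below a threshold
min_total <- 10  # hypothetical cutoff, pick it from your own distribution
keep_sum <- rowSums(counts) >= min_total

# Filter 2: keep genes that exceed the per-sample first quartile in every sample
q1 <- apply(counts, 2, quantile, probs = 0.25)
keep_q1 <- rowSums(sweep(counts, 2, q1, ">")) == ncol(counts)

print(cbind(keep_sum, keep_q1))
```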

First of all, I can't use FPKM because I only have raw counts; instead I used CPM to filter out lowly expressed genes from the beginning (as advised by the edgeR documentation).

Keeping tags with more than 1 count per million (CPM) in at least half (3 of 6) of the samples:

library(edgeR)  # provides cpm()

keep <- rowSums(cpm(DF) > 1) >= 3  # CPM > 1 in at least 3 samples

data_filt <- DF[keep, ]

Then I performed the DE analysis using the GLM approach.

ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by unique37990
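
For readers who want to see what this filter computes, here is a minimal base-R sketch. It assumes plain library sizes (column sums) with no normalization factors, which to my knowledge is what edgeR's cpm() uses by default on a bare count matrix. The toy counts and gene names are invented for illustration.

```r
# Hypothetical toy counts: 5 genes x 6 samples (3 control, 3 treated)
counts <- matrix(c(
       10,      12,       9,      11,      10,      13,
        0,       1,       0,       0,       2,       0,
      500,     480,     510,     490,     505,     515,
        3,       0,       0,       0,       0,     400,
  2000000, 2000000, 2000000, 2000000, 2000000, 2000000
), ncol = 6, byrow = TRUE,
dimnames = list(c("mid", "low", "high", "spike", "bulk"),
                c("C1", "C2", "C3", "T1", "T2", "T3")))

# Counts per million: scale each column by its library size
lib_size <- colSums(counts)
cpm_mat <- t(t(counts) / lib_size) * 1e6

# Keep genes with CPM > 1 in at least 3 samples
keep <- rowSums(cpm_mat > 1) >= 3
data_filt <- counts[keep, ]

print(keep)
```

Note that a filter like this only looks at how many samples pass the cutoff. A gene that is moderately expressed everywhere and extreme in one replicate, like the rows posted earlier in the thread, will pass it.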

Ah OK, that is fine. No, I am not asking you to use FPKM for the differential analysis; I am only saying that you should filter out lowly expressed genes, which, as I see, you have already done on the normalized counts.

ADD REPLYlink written 4.2 years ago by ivivek_ngs4.9k
4.2 years ago by
Belgium
WouterDeCoster44k wrote:

You may want to have a look at NOISeq: https://www.bioconductor.org/packages/release/bioc/html/NOISeq.html

ADD COMMENTlink written 4.2 years ago by WouterDeCoster44k
4.2 years ago by
mkulecka310
European Union
mkulecka310 wrote:

SAMseq is also implemented in R, in the samr package: http://www.inside-r.org/packages/cran/samr/docs/SAMseq.

On a side note: I believe I read somewhere that you need at least 5 replicates per condition for non-parametric methods to work effectively on RNA-seq data.

ADD COMMENTlink modified 4.2 years ago • written 4.2 years ago by mkulecka310
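
On the replicate-number point, a rough base-R illustration of why plain rank-based tests struggle with a 3 vs 3 design: with no ties, the smallest two-sided p-value an exact Wilcoxon rank-sum test can return is 2 / choose(2n, n), which is 0.1 for n = 3. The counts are those of the first gene in the table posted earlier in this thread; `min_p` is a hypothetical helper name introduced here.

```r
# Counts for ENSDARG00000016825 as posted in this thread
ctrl <- c(44, 29, 41)
trt  <- c(51, 53, 33803)

# Exact Wilcoxon rank-sum test (small n, no ties, so R uses the exact distribution)
p_obs <- wilcox.test(ctrl, trt)$p.value

# Smallest achievable two-sided p-value with n replicates per group and no ties
# (min_p is a hypothetical helper, not part of any package)
min_p <- function(n) 2 / choose(2 * n, n)

print(p_obs)
print(min_p(c(3, 5, 7)))
```

Even with complete separation of the two groups, the per-gene p-value cannot go below this bound, so it cannot survive genome-wide multiple-testing correction at 3 vs 3. This applies to the plain per-gene test only; packages that pool information across genes may behave differently.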