Question

Are there any R or Bioconductor packages for non-parametric testing of RNA-seq count data?


8.5 years ago

unique379 ▴ 120

Dear all,

I am wondering whether there are any packages that implement a non-parametric test for RNA-seq count data. Google gave me two suggestions: 1) NPEBseq, but I could not find this package anywhere (bioinformatics.wistar.upenn.edu/NPEBseq is a dead link). 2) LFCseq, which I did find, but it lacks a vignette and I don't know how to use it properly.

Any help would be appreciated. Thanks

RNA-Seq R bioconductor • 5.8k views

ADD COMMENT • link updated 8.5 years ago by mkulecka ▴ 360 • written 8.5 years ago by unique379 ▴ 120

Why do you want to do non-parametric tests with RNA-seq data? There are very powerful packages for RNA-seq data analysis, such as edgeR and limma-voom. Why are those not suitable for you?

ADD REPLY • link 8.5 years ago by Benn 8.3k

I already tested edgeR, but it is not well suited to data with outliers. It falsely reported some genes as significant (FDR < 0.05) that were driven by outliers (even genes with low counts and a single outlying sample). So I thought I would try a non-parametric test to control for the outliers. The alternative could be DESeq2, since that package can handle outliers, but as far as I can tell this only applies to large datasets. From ?DESeq:

minReplicatesForReplace = 7 (default)

minReplicatesForReplace: the minimum number of replicates required in order to use replaceOutliers on a sample. If there are samples with so many replicates, the model will be refit after these replacing outliers, flagged by Cook's distance. Set to Inf in order to never replace outliers.

And I have only 3 samples in each condition (3 vs. 3).

ADD REPLY • link 8.5 years ago by unique379 ▴ 120


If you only have 3 samples in each condition, how sure are you that we are talking about outliers?

Do you know what the (technical) cause is for your outliers? Or can it be a biological cause?

ADD REPLY • link 8.5 years ago by Benn 8.3k

What do you think about these genes detected by edgeR?

ID                  C1   C2   C3   T1    T2   T3      logFC        PValue       FDR
ENSDARG00000016825  44   29   41   51    53   33803   8.217649637  9.29E-07     0.000719466
ENSDARG00000092233  100  73   80   63    43   228889  9.819161124  2.50E-06     0.001390657
ENSDARG00000055809  245  191  215  206   274  78772   6.92656458   5.54E-06     0.00235486
ENSDARG00000078429  196  76   160  174   146  36360   6.409972189  7.06E-06     0.002576343
ENSDARG00000023151  62   14   22   29    39   1064    3.523178845  0.000428103  0.026348958
ENSDARG00000020084  116  203  122  2107  84   92      2.37419445   0.004750923  0.100555503
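One quick way to see what is driving these calls is to ask how much of each gene's total count comes from its single largest sample. The following is only a minimal base R sketch using the counts from the table; the names `top_share`, `top_sample` and `flagged`, and the 80% cutoff, are hypothetical choices for illustration and not part of edgeR or any other package.

```r
# Counts copied from the table (C1-C3 = control, T1-T3 = treated).
counts <- rbind(
  ENSDARG00000016825 = c(44, 29, 41, 51, 53, 33803),
  ENSDARG00000092233 = c(100, 73, 80, 63, 43, 228889),
  ENSDARG00000055809 = c(245, 191, 215, 206, 274, 78772),
  ENSDARG00000078429 = c(196, 76, 160, 174, 146, 36360),
  ENSDARG00000023151 = c(62, 14, 22, 29, 39, 1064),
  ENSDARG00000020084 = c(116, 203, 122, 2107, 84, 92)
)
colnames(counts) <- c("C1", "C2", "C3", "T1", "T2", "T3")

# Fraction of each gene's total reads contributed by its largest sample.
top_share <- apply(counts, 1, max) / rowSums(counts)

# Name of the sample that carries that maximum.
top_sample <- colnames(counts)[apply(counts, 1, which.max)]

# Hypothetical cutoff (an assumption, not a standard): flag a gene when
# one sample holds more than 80% of its reads.
flagged <- top_share > 0.8

print(data.frame(
  top_sample,
  top_share = round(top_share, 3),
  flagged,
  row.names = rownames(counts)
))
```

A summary like this does not say whether the spike is technical or biological; it only makes the single-sample pattern in the table explicit.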

ADD REPLY • link 8.5 years ago by unique379 ▴ 120

When you call the outliers false positives, do you mean that a gene is expressed in only one sample of a condition and is still reported as a DEG? If that is the cause, you can plot the distribution of the count data for each sample, then filter out low read counts and keep only genes above the first quartile in each sample. Another common approach is to remove genes whose row sum of read counts falls below a threshold chosen from that distribution. A gene is often considered expressed at around 0.2 FPKM, so alternatively you could filter out genes with FPKM below that; you would have to check which read counts that corresponds to and discard those genes. This may leave you with fewer expressed genes, but the DEG calls should be more reliable. Have you tried that?

ADD REPLY • link 8.5 years ago by ivivek_ngs ★ 5.2k

First of all, I can't use FPKM because I have raw counts. Instead I used CPM to filter out lowly expressed genes from the beginning (as advised by edgeR itself).

Keep tags with CPM (counts per million) > 1 in at least half (50%) of the samples, i.e. 3 of 6:

keep <- rowSums(cpm(DF) > 1) >= 3

data_filt <- DF[keep, ]

Then I performed the DE analysis using a GLM.
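For readers who want to see what the CPM filter in this comment does, here is a minimal base R sketch on a toy matrix. The count values and gene names are made up for illustration, and the manual CPM calculation (counts divided by column totals, times one million) is an assumption standing in for what edgeR's `cpm()` returns for a plain count matrix with default settings.

```r
# Toy count matrix (hypothetical values): 5 genes x 6 samples, 3 per condition.
# g_bulk pads every library to exactly one million reads.
DF <- rbind(
  g_low    = c(0, 0, 0, 2, 0, 0),
  g_mid    = c(50, 60, 55, 40, 45, 52),
  g_cond   = c(0, 0, 0, 30, 35, 28),
  g_single = c(0, 0, 0, 0, 0, 500),
  g_bulk   = c(999950, 999940, 999945, 999928, 999920, 999420)
)
colnames(DF) <- c("C1", "C2", "C3", "T1", "T2", "T3")

# Counts per million: scale each column by its library size.
cpm_mat <- t(t(DF) / colSums(DF)) * 1e6

# Keep genes with CPM > 1 in at least 3 of the 6 samples.
keep <- rowSums(cpm_mat > 1) >= 3
data_filt <- DF[keep, , drop = FALSE]

print(keep)
```

Note that the rule only counts how many samples pass the cutoff. A gene with a spike in one sample and moderate counts in all the others, like those in the table earlier in the thread, would pass it.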

ADD REPLY • link 8.5 years ago by unique379 ▴ 120

Ah OK, that is fine. I am not suggesting that you use FPKM for the differential analysis; I was only suggesting that you filter out lowly expressed genes, which I see you have already done on the normalized counts.

ADD REPLY • link 8.5 years ago by ivivek_ngs ★ 5.2k

score 1 · Answer 1 · 2016-04-28


8.5 years ago

WouterDeCoster 47k

You may want to have a look at NOISeq: https://www.bioconductor.org/packages/release/bioc/html/NOISeq.html

ADD COMMENT • link 8.5 years ago by WouterDeCoster 47k

score 0 · Answer 2 · 2016-04-28


8.5 years ago

mkulecka ▴ 360

SAMseq is also implemented in R, in the samr package: http://www.inside-r.org/packages/cran/samr/docs/SAMseq

On a side note: I believe I read somewhere that you need at least 5 replicates per condition for non-parametric methods to work effectively on RNA-seq data.
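One way to see why replicate number matters for rank-based tests is to look at the smallest p-value they can return. The sketch below uses a plain exact Wilcoxon rank-sum test from base R as a stand-in (it is not SAMseq or NOISeq), and `min_p` is a hypothetical helper name. With perfectly separated groups of n replicates each, the two-sided p-value is 2 / choose(2n, n).

```r
# Smallest two-sided p-value an exact Wilcoxon rank-sum test can give
# for n replicates per group, i.e. when the groups are fully separated.
min_p <- function(n) {
  low  <- seq_len(n)       # values 1..n
  high <- seq_len(n) + n   # values n+1..2n, all larger than 'low'
  wilcox.test(low, high, exact = TRUE)$p.value
}

n_reps <- 3:6
p_floor <- sapply(n_reps, min_p)
print(data.frame(replicates = n_reps, min_p_value = signif(p_floor, 3)))
```

For 3 vs. 3 the floor is 0.1, so no gene can reach p < 0.05 with this test even before multiple-testing correction, whatever the counts look like. At 5 vs. 5 the floor drops to about 0.008.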

ADD COMMENT • link 8.5 years ago by mkulecka ▴ 360