Dear all,
I am wondering whether there are any packages for non-parametric tests on RNA-seq count data. Google gave me two suggestions:
1) NPEBseq, but I could not find this package anywhere (bioinformatics.wistar.upenn.edu/NPEBseq is a dead link).
2) LFCseq, which I found, but it lacks a vignette and I don't know how to use it properly.
Any help would be appreciated. Thanks.
Why do you want to do non parametric tests with RNAseq data? There are very powerful packages for RNAseq data analysis such as edgeR and limma voom, why are those not suitable for you?
I already tested edgeR, but it is not well suited to data with outliers. It falsely reported some genes as significant (FDR < 0.05) that were driven by outliers (even genes with low counts and a single outlying sample). So I thought I would try a non-parametric test to control for the outliers. An alternative could be DESeq2, as this package is able to handle outliers, but that applies only with a larger dataset. From ?DESeq:

minReplicatesForReplace = 7 (default)
minReplicatesForReplace: the minimum number of replicates required in order to use replaceOutliers on a sample. If there are samples with so many replicates, the model will be refit after these replacing outliers, flagged by Cook's distance. Set to Inf in order to never replace outliers.

And I have only 3 samples in each condition (3 vs 3).
If you only have 3 samples in each condition, how sure are you that we are talking about outliers?
Do you know what the (technical) cause is for your outliers? Or can it be a biological cause?
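On the sample-size point, it may help to check what a rank-based test can deliver with 3 vs 3 before hunting for a package. The sketch below uses base R's wilcox.test() on made-up counts for a single gene where the two groups are completely separated, which is the most extreme result possible. The numbers are hypothetical and only illustrate the arithmetic.

```r
# Hypothetical counts for one gene: the groups do not overlap at all,
# so this is the most extreme ranking a 3 vs 3 design can produce.
ctrl  <- c(10, 12, 15)
treat <- c(100, 120, 150)

# Exact two-sided Wilcoxon rank-sum test (exact by default: small n, no ties)
p <- wilcox.test(ctrl, treat)$p.value

# Smallest attainable two-sided p-value: 2 extreme splits out of choose(6, 3)
min_p <- 2 / choose(6, 3)

print(p)      # 0.1
print(min_p)  # 0.1
```

So with 3 replicates per condition a per-gene rank-sum test cannot go below p = 0.1, even before any multiple-testing correction.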
What do you think about these genes detected by edgeR?
ID                  C1   C2   C3   T1    T2   T3      logFC        PValue       FDR
ENSDARG00000016825  44   29   41   51    53   33803   8.217649637  9.29E-07     0.000719466
ENSDARG00000092233  100  73   80   63    43   228889  9.819161124  2.50E-06     0.001390657
ENSDARG00000055809  245  191  215  206   274  78772   6.92656458   5.54E-06     0.00235486
ENSDARG00000078429  196  76   160  174   146  36360   6.409972189  7.06E-06     0.002576343
ENSDARG00000023151  62   14   22   29    39   1064    3.523178845  0.000428103  0.026348958
ENSDARG00000020084  116  203  122  2107  84   92      2.37419445   0.004750923  0.100555503
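One quick way to see what these rows have in common is to compare each gene's largest count with its second-largest. The following is only a sketch in base R using the counts from the table above. The cutoff of 5 is an arbitrary assumption chosen for illustration, not something edgeR or DESeq2 applies.

```r
# Counts copied from the table above (rows = genes, columns = samples)
counts <- matrix(
  c( 44,  29,  41,   51,  53,  33803,
    100,  73,  80,   63,  43, 228889,
    245, 191, 215,  206, 274,  78772,
    196,  76, 160,  174, 146,  36360,
     62,  14,  22,   29,  39,   1064,
    116, 203, 122, 2107,  84,     92),
  ncol = 6, byrow = TRUE,
  dimnames = list(
    c("ENSDARG00000016825", "ENSDARG00000092233", "ENSDARG00000055809",
      "ENSDARG00000078429", "ENSDARG00000023151", "ENSDARG00000020084"),
    c("C1", "C2", "C3", "T1", "T2", "T3"))
)

# Ratio of the largest count to the second-largest count within each gene
top_ratio <- apply(counts, 1, function(x) {
  s <- sort(x, decreasing = TRUE)
  s[1] / max(s[2], 1)  # guard against division by zero
})

# Which sample carries the largest count for each gene
top_sample <- colnames(counts)[apply(counts, 1, which.max)]

# Hypothetical cutoff: flag genes where one sample exceeds the next by > 5x
flagged <- top_ratio > 5

print(data.frame(top_sample, top_ratio = round(top_ratio, 1), flagged))
```

Every gene in the table is flagged: five are driven by T3 alone and one by T1, so the reported logFC reflects a single sample in each case.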
When you call these outliers false positives, do you mean that a gene is reported as a DEG even though it is highly expressed in only one sample of one condition? If that is the cause, you can plot the distribution of the count data for each sample, filter out low read counts, and keep only genes above the first quartile in each sample. People also commonly remove genes whose rowSums of read counts fall below a certain value, chosen from the distribution. A gene is often considered expressed at around 0.2 FPKM, so alternatively you can filter out genes with FPKM below that; you would have to check which read counts that corresponds to and discard them. This might leave you with fewer expressed genes, but the DEG calls might be more fruitful and true. Have you tried that?

First of all, I can't use FPKM as I have raw counts; instead I used CPM to filter out lowly expressed genes from the beginning (as advised by edgeR itself).
I keep tags with CPM > 1 in at least half (50%) of the samples, i.e. 3 of 6:
library(edgeR)  # provides cpm()
# Keep genes with CPM > 1 in at least 3 of the 6 samples
keep <- rowSums(cpm(DF) > 1) >= 3
data_filt <- DF[keep, ]
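For anyone reading along without edgeR at hand, the filter above amounts to roughly the following in base R. This sketch assumes DF is a plain matrix of raw counts, in which case cpm() scales each column by its total. The toy matrix and gene names are made up for illustration.

```r
# Toy count matrix (hypothetical): 4 genes x 6 samples
counts <- matrix(
  c(2000000, 2000000, 2000000, 2000000, 2000000, 2000000,
          0,       0,       0,       5,       6,       7,
          1,       1,       1,       1,       1,       1,
          0,       0,       0,       0,       0,      50),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]),
                  c("C1", "C2", "C3", "T1", "T2", "T3"))
)

# Counts per million: divide each column by its library size, times 1e6
lib_size   <- colSums(counts)
cpm_manual <- t(t(counts) / lib_size) * 1e6

# Same rule as in the post: CPM > 1 in at least 3 of the 6 samples
keep      <- rowSums(cpm_manual > 1) >= 3
data_filt <- counts[keep, , drop = FALSE]

print(keep)
```

Here geneA and geneB are kept, geneC is dropped because its CPM is below 1 everywhere, and geneD is dropped because it passes the cutoff in only one sample.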
Then I performed the DE analysis using the GLM approach.
Ah OK, that is fine. No, I am not asking you to use FPKM for the differential analysis; I was only suggesting that you filter out lowly expressed genes, which I see you have already done on the normalized counts.