I am writing a program to study read coverage along a single sequence. To sum up the pipeline:
- Detect ORFs in the input sequence
- Align all reads (from RNA-seq) to the sequence with bowtie
- Count the number of reads in each ORF (using the 5' end of each read)
- Normalize these counts
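Here is a minimal Python sketch of the counting step. It assumes the bowtie alignments have already been parsed into 0-based, half-open (start, end, strand) records. The `Read` and `Orf` types and the function names are hypothetical and are not part of bowtie's output format. A read is counted in an ORF when its 5' end falls inside it. On the minus strand, the 5' end is the last aligned base.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    """Hypothetical parsed alignment: 0-based, half-open [start, end) on the sequence."""
    start: int
    end: int
    strand: str  # "+" or "-"


class Orf(NamedTuple):
    """Hypothetical ORF record using the same 0-based, half-open convention."""
    name: str
    start: int
    end: int
    strand: str


def five_prime_position(read: Read) -> int:
    # On the minus strand the 5' end of the read is its rightmost aligned base.
    return read.start if read.strand == "+" else read.end - 1


def count_5prime_per_orf(reads: Iterable[Read], orfs: List[Orf],
                         stranded: bool = False) -> Counter:
    """Count reads whose 5' end lies inside each ORF."""
    counts = Counter({orf.name: 0 for orf in orfs})  # keep ORFs with zero reads
    for read in reads:
        pos = five_prime_position(read)
        for orf in orfs:
            # Assumes a sense-stranded library; reverse-stranded protocols
            # would need this strand comparison flipped.
            if stranded and orf.strand != read.strand:
                continue
            if orf.start <= pos < orf.end:
                counts[orf.name] += 1
    return counts


# Toy usage with made-up coordinates.
orfs = [Orf("orf1", 0, 300, "+"), Orf("orf2", 350, 900, "-")]
reads = [Read(10, 60, "+"), Read(400, 450, "-"), Read(820, 880, "+")]
print(count_5prime_per_orf(reads, orfs))
print(count_5prime_per_orf(reads, orfs, stranded=True))
```

If ORFs overlap, a read whose 5' end falls in the shared region is counted in each of them. Whether that is acceptable depends on the analysis. The nested loop is fine for 6 to 10 ORFs.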
Some input sequences have only 6 to 10 ORFs. I want to normalize these counts, and I tried DESeq2, which works fine (functionally speaking).
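For reference, DESeq2's default normalisation estimates one size factor per sample with the median-of-ratios method. The NumPy sketch below reimplements that idea. It is not DESeq2's code. It makes it easy to see that, with 6 to 10 ORFs, each size factor is the median of at most 6 to 10 ratios. The count matrix is a toy example, not real data.

```python
import numpy as np


def median_of_ratios_size_factors(counts) -> np.ndarray:
    """Per-sample size factors in the style of DESeq2's median-of-ratios method.

    counts: array-like of shape (n_orfs, n_samples) holding raw read counts.
    """
    counts = np.asarray(counts, dtype=float)
    # ORFs with a zero in any sample have a geometric mean of 0, so they are skipped.
    usable = np.all(counts > 0, axis=1)
    if not usable.any():
        raise ValueError("every ORF has a zero count in at least one sample")
    log_counts = np.log(counts[usable])
    log_geo_means = log_counts.mean(axis=1, keepdims=True)
    # Size factor = median over ORFs of (count / geometric mean), per sample.
    return np.exp(np.median(log_counts - log_geo_means, axis=0))


# Toy matrix: 6 ORFs x 2 samples, sample 2 sequenced exactly twice as deeply.
toy_counts = np.array([[10, 20], [100, 200], [50, 100],
                       [7, 14], [30, 60], [5, 10]])
size_factors = median_of_ratios_size_factors(toy_counts)
normalized = toy_counts / size_factors  # divide each column by its size factor
print(size_factors)
print(normalized)
```

In this idealised matrix every ratio agrees, so the median is exact. With real data and so few ORFs, changing the counts of one or two ORFs can move the median noticeably.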
Now, statistically speaking, is it valid to estimate dispersion and normalize counts with DESeq2 for only 6 to 10 genes? And how will the adjusted p-values be affected, given that so few genes enter the multiple-testing correction?
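On the multiple-testing side, DESeq2's `results()` adjusts p-values with Benjamini-Hochberg by default (`pAdjustMethod = "BH"`). Its default independent filtering can also reduce the number of tests m further. The small NumPy version of the BH step below shows how the correction scales with m. Each sorted p-value is multiplied by m / rank, so with 6 ORFs the inflation is at most 6-fold. The p-values are made up for illustration.

```python
import numpy as np


def benjamini_hochberg(pvalues) -> np.ndarray:
    """BH-adjusted p-values, following the definition of R's p.adjust(method = "BH")."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                     # indices from smallest to largest p
    ranks = np.arange(1, m + 1)
    scaled = p[order] * m / ranks             # p_(i) * m / i
    # Step-up: running minimum taken from the largest p-value downwards.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)  # restore input order, cap at 1
    return adjusted


# Made-up p-values for 6 ORFs.
pvals = [0.01, 0.04, 0.03, 0.20, 0.50, 0.02]
print(benjamini_hochberg(pvals))
```

This sketch covers only the adjustment step. It says nothing about whether the dispersion estimates feeding those p-values are reliable with so few ORFs.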
I would appreciate any comments or suggestions from people experienced with statistics and RNA-seq data normalization.
Thank you!
----- EDIT ------
Since my data do not satisfy the assumption mentioned in C. Yague's answer, what kind of count-based normalisation can be applied? I was thinking about RPKM, but RPKM is more of a unit than a normalisation method. Or should I use something like TPM, and then compute fold changes from the TPM values?
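Here is a sketch of the RPKM and TPM formulas applied to ORF counts. It assumes ORF lengths in base pairs. For RPKM, it also assumes a total mapped-read count of your choosing, for example all reads mapped to the sequence. The function names are hypothetical. The log2 fold change uses an arbitrary pseudocount of 1 to avoid taking the log of zero. That value is an assumption, not a recommendation.

```python
import numpy as np


def rpkm(counts, lengths_bp, total_mapped_reads):
    """Reads per kilobase of ORF per million mapped reads."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1e3
    return counts / lengths_kb / (total_mapped_reads / 1e6)


def tpm(counts, lengths_bp):
    """Transcripts per million: divide by length first, then rescale to sum to 1e6."""
    rate = np.asarray(counts, dtype=float) / np.asarray(lengths_bp, dtype=float)
    return rate / rate.sum() * 1e6


def log2_fold_change(values_a, values_b, pseudocount=1.0):
    """Per-ORF log2(B / A); the pseudocount is an arbitrary guard against log(0)."""
    a = np.asarray(values_a, dtype=float)
    b = np.asarray(values_b, dtype=float)
    return np.log2((b + pseudocount) / (a + pseudocount))


# Toy data: 3 ORFs, two samples; only ORF 1 changes its raw count.
lengths = [300, 600, 900]
counts_a = [30, 60, 90]
counts_b = [60, 60, 90]
tpm_a, tpm_b = tpm(counts_a, lengths), tpm(counts_b, lengths)
print(rpkm(counts_a, lengths, total_mapped_reads=sum(counts_a)))
print(tpm_a, tpm_b)
print(log2_fold_change(tpm_a, tpm_b))
```

In the toy example, ORFs 2 and 3 have identical raw counts in both samples. Their TPM still drops from about 333,333 to 250,000, because TPM always sums to 10^6 within a sample. A change in one ORF therefore shifts the values of all the others.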
Thank you again for your help!