Edit after noticing that this is mainly about differential RNA-seq analysis:
First and foremost, to assess significance you need biological replicates; only replicates give you an estimate of variance. This has been treated, for example, in this question:
Second, I would like to mention that you cannot prove absolutely that a gene is not expressed just because you haven't found evidence (a non-existence proof is not feasible here).
For computing p-values of differential expression I recommend the R packages DESeq or edgeR.
Some of this I have already explained in this answer, which has links to other materials and papers:
However, it is definitely a problem if a gene has very few or zero counts in one or more groups; current methods might not be able to assign p-values properly, or at all, in these cases.
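To make the point about replicates concrete, here is a minimal Python sketch (not DESeq or edgeR code). The counts and the `summarize` helper are made up for illustration. It only shows that a single sample gives you a mean but no variance, which is what a significance test needs.

```python
from statistics import mean, variance

# Hypothetical raw counts for one gene in two conditions (made-up numbers).
counts = {
    "control": [12, 15, 9],
    "treated": [30, 41, 25],
}

def summarize(replicates):
    """Return (mean, sample variance); the variance needs at least two replicates."""
    if len(replicates) < 2:
        # A single sample has no within-group variance estimate.
        return mean(replicates), None
    return mean(replicates), variance(replicates)

summary = {group: summarize(values) for group, values in counts.items()}
single = summarize([12])  # one sample only: variance is None

for group, (m, v) in summary.items():
    print(group, "mean:", m, "variance:", v)
print("single sample:", single)
```

The real packages model count data much more carefully than a plain sample variance, so treat this only as an illustration of why replicates are required.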
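One way to find the genes affected by this is to flag every gene that has almost no reads in at least one group before testing, and then inspect those by hand. The sketch below is plain Python under stated assumptions: the count table, the `low_count_groups` helper and the `MIN_TOTAL` cutoff are all hypothetical and not taken from any package.

```python
# Hypothetical count table: gene -> group -> replicate counts (made-up numbers).
counts = {
    "geneA": {"control": [120, 98, 134], "treated": [250, 301, 275]},
    "geneB": {"control": [0, 0, 0], "treated": [4, 7, 2]},
    "geneC": {"control": [1, 0, 2], "treated": [0, 1, 0]},
}

MIN_TOTAL = 10  # assumed cutoff for illustration, not a recommended value

def low_count_groups(groups, min_total=MIN_TOTAL):
    """Return the sorted names of groups whose summed count is below min_total."""
    return sorted(name for name, reps in groups.items() if sum(reps) < min_total)

flagged = {gene: low_count_groups(groups) for gene, groups in counts.items()}

# Genes with at least one low-count group deserve a closer manual look.
needs_inspection = [gene for gene, groups in flagged.items() if groups]
print(needs_inspection)
```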
If I understand you correctly, you want to know if a very small number of reads (say at least one) in an RNA-seq experiment is evidence for the region being transcribed (not necessarily expressed).
Yes, every single sequence and its alignment is evidence in itself, provided the sequencer or protocol doesn't make up sequences! We have to agree on this point: the sequence doesn't lie, but of course there can be errors.
Of course you would like to have more evidence, so very lowly covered exons will have to be studied more deeply.
Where could the reads come from:
- They could originate from a duplicated/highly similar or repetitive region
- They could be poor alignments of reads with many sequencing errors
- The sequences could be contamination from vectors or adaptors
To show that your gene is transcribed, you have to take a look at the individual alignments:
- Filter alignments for duplicate hits to the genome; do you still get coverage?
- Look at the single alignments: how good are they? Are there large indels?
- Apply quality filtering (after removing duplicates, not before)
- Look for protocol-specific contamination
- Look at where in the gene the alignments are: are they all in one locus or do they span exons/introns?
- Re-align the reads against the genome using a more sensitive aligner (e.g. FASTA or SSEARCH). Do they still align to only a single position?
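Some of the checks above can be sketched directly on SAM records. The following is a rough Python illustration that uses only the standard library. The example reads, the helper names and the thresholds (`min_mapq`, `max_indel`) are assumptions for illustration. It also assumes the aligner writes the optional `NH` tag (number of reported hits), which not every aligner does. Contamination screening and re-alignment with a more sensitive tool are not covered here.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_sam_line(line):
    """Extract the few fields used below from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    # Optional tags look like "NH:i:1"; keep only the two-letter name and the value.
    tags = {f[:2]: f[5:] for f in fields[11:]}
    return {
        "name": fields[0],
        "flag": int(fields[1]),
        "pos": int(fields[3]),
        "mapq": int(fields[4]),
        "cigar": fields[5],
        "hits": int(tags.get("NH", 1)),  # assume a unique hit if NH is absent
    }

def longest_indel(cigar):
    """Length of the longest insertion (I) or deletion (D) in a CIGAR string."""
    return max((int(n) for n, op in CIGAR_OP.findall(cigar) if op in "ID"), default=0)

def keep(aln, min_mapq=20, max_indel=3):
    """Apply the checks from the list; the thresholds are illustrative only."""
    if aln["flag"] & 0x4:      # read is unmapped
        return False
    if aln["hits"] > 1:        # more than one hit in the genome
        return False
    if aln["mapq"] < min_mapq: # low mapping quality
        return False
    return longest_indel(aln["cigar"]) <= max_indel

# Made-up example records (SEQ and QUAL left as "*").
sam_lines = [
    "r1\t0\tchr1\t1000\t60\t50M\t*\t0\t0\t*\t*\tNH:i:1",
    "r2\t0\tchr1\t1010\t60\t20M500N30M\t*\t0\t0\t*\t*\tNH:i:1",
    "r3\t0\tchr1\t1020\t3\t50M\t*\t0\t0\t*\t*\tNH:i:4",
    "r4\t0\tchr1\t1030\t60\t20M12D30M\t*\t0\t0\t*\t*\tNH:i:1",
]

alignments = [parse_sam_line(line) for line in sam_lines]
kept = [a for a in alignments if keep(a)]
kept_names = [a["name"] for a in kept]
# An N operation in the CIGAR marks a skipped region, i.e. a spliced read.
spliced_names = [a["name"] for a in kept if "N" in a["cigar"]]
print(kept_names, spliced_names)
```

Reads that survive the filters and are spliced across an intron are usually more convincing evidence of transcription than a few unspliced reads piled up at one locus.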
Hope this helps.