I want to use the negative binomial distribution in my ribo-seq data simulation so that the simulated data mimic real data.
I am doing this because I want to compare the simulated results with the analysis and results of real human ribo-seq data, for another part of my work.
My simulation inputs are:
- a number of RefSeq human transcripts (e.g. the NM_ accessions) as the source of the simulation
- a read length distribution from 26 to 32 bp (derived from real ribo-seq data)
Real ribo-seq data have the characteristic that the footprint density on a transcript differs between the three sub-codon positions, and this pattern reflects the correct open reading frame (e.g. http://lapti.ucc.ie/bicoding/Known_frameshift/NM_001172437.png).
I think the distribution I simulate should mainly reflect this.
But I am very confused about where to start, e.g. how to map the distribution model onto my case. I would appreciate any hints or advice on this, thanks.
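
To make the question concrete, here is a rough Python sketch of how I imagine the pieces might fit together; it may well be the wrong mapping. It draws a footprint count per codon from a negative binomial (sampled as a gamma-Poisson mixture), splits that count across the three sub-codon positions, and gives each footprint a length between 26 and 32. Everything specific is an assumption for illustration only: the frame weights, the length weights, the mean and dispersion values, the function names, and the use of plain CDS coordinates instead of real RefSeq sequences.

```python
import math
import random

# Assumed, illustrative values (not estimated from real data).
FRAME_WEIGHTS = (0.70, 0.20, 0.10)       # share of footprints at sub-codon positions 0, 1, 2
READ_LENGTHS = list(range(26, 33))       # 26..32, as in the setup above
LENGTH_WEIGHTS = [1, 2, 4, 6, 4, 2, 1]   # hypothetical read length distribution


def sample_poisson(lam, rng):
    """Knuth's multiplication method; only suitable for small lam."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_negative_binomial(mean, dispersion, rng):
    """Negative binomial as a gamma-Poisson mixture.

    Parameterised so that variance = mean + dispersion * mean**2.
    """
    if mean <= 0:
        return 0
    shape = 1.0 / dispersion
    scale = mean * dispersion            # gamma mean = shape * scale = mean
    lam = rng.gammavariate(shape, scale)
    return sample_poisson(lam, rng)


def simulate_footprints(cds_length, mean_per_codon, dispersion, rng):
    """Return a list of (position_in_cds, read_length) tuples."""
    reads = []
    usable = cds_length - cds_length % 3  # ignore an incomplete last codon
    for codon_start in range(0, usable, 3):
        # Overdispersed number of footprints on this codon.
        n = sample_negative_binomial(mean_per_codon, dispersion, rng)
        # Distribute them over the three sub-codon positions.
        frames = rng.choices((0, 1, 2), weights=FRAME_WEIGHTS, k=n)
        lengths = rng.choices(READ_LENGTHS, weights=LENGTH_WEIGHTS, k=n)
        reads.extend((codon_start + f, length) for f, length in zip(frames, lengths))
    return reads


rng = random.Random(0)
reads = simulate_footprints(cds_length=900, mean_per_codon=5.0, dispersion=0.5, rng=rng)

# Tally footprints per sub-codon position to check the periodicity.
frame_counts = [0, 0, 0]
for position, _ in reads:
    frame_counts[position % 3] += 1
print(len(reads), frame_counts)
```

In this sketch the negative binomial controls only how uneven the coverage is from codon to codon, while the triplet periodicity comes from the separate frame weights. I am not sure whether that split is the right way to use the model, or whether the mean and dispersion should instead be fitted per transcript from the real data.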