I want to identify significant differentially methylated regions (DMRs) from a set of DMRs whose lengths range from 1 bp to 200 bp. How many base pairs define a region, biologically speaking? I need a threshold.
Well, pretty much by definition a region needs to contain more than one CpG, so at least 4 bases. In practice, most people use a minimum of 3 CpGs, meaning at least 6 bases.
Do you mean that a DMR must contain at least 3 CpGs regardless of its length, or that it must be at least 6 bp long regardless of how many CpGs it contains?
At least 3 CpGs, with any length (which can obviously never be less than 6 bases), is the common usage. Using length as a threshold isn't terribly useful unless you also set some sort of CpG number threshold.
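To make the CpG-count threshold concrete, here is a minimal sketch of such a filter. The `DMR` record, its field names, and the `filter_dmrs` helper are hypothetical, made up for illustration; they do not come from any particular DMR caller, and the default of 3 is just the common minimum mentioned above.

```python
from typing import List, NamedTuple


class DMR(NamedTuple):
    """Hypothetical DMR record; field names are assumptions for illustration."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    n_cpgs: int  # number of CpGs inside the region


def filter_dmrs(dmrs: List[DMR], min_cpgs: int = 3) -> List[DMR]:
    """Keep only DMRs with at least `min_cpgs` CpGs, ignoring their length."""
    return [d for d in dmrs if d.n_cpgs >= min_cpgs]


candidates = [
    DMR("chr1", 100, 101, 1),  # a single differentially methylated C
    DMR("chr1", 500, 506, 3),  # three back-to-back CpGs, 6 bases
    DMR("chr2", 900, 1100, 2), # long, but only two CpGs
]
kept = filter_dmrs(candidates)
```

Note that the 200 bp candidate is dropped while the 6 bp one is kept, which is the point of thresholding on CpG count rather than on length.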
And by 3 CpGs, do you mean they must be consecutive (end to end)?
Or can there be some distance in between? For example, 3 differentially methylated Cs with 2 kb between them.
I am trying to find some papers to cite for this 3-CpG minimum. Please let me know if you have any references.
Thank you :)
One normally allows some distance in between, though 2 kb would be biologically excessive. You will not find a reference to 3 CpGs in any paper; it's not something people write down.
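As a rough illustration of "allowing distance in between", the sketch below groups differentially methylated CpGs on one chromosome into candidate regions, starting a new region whenever the gap to the previous CpG exceeds `max_gap`. The function name `cluster_cpgs` and its parameters are hypothetical, and `max_gap` is deliberately left without a default: the value used in the example is an arbitrary placeholder, not a recommendation.

```python
from typing import List, Tuple


def cluster_cpgs(
    positions: List[int], max_gap: int, min_cpgs: int = 3
) -> List[Tuple[int, int, int]]:
    """Group CpG positions (0-based coordinate of each C, one chromosome)
    into regions. Returns (start, end, n_cpgs) with an exclusive end that
    covers the G of the last CpG. Regions with fewer than `min_cpgs` CpGs
    are discarded."""
    regions: List[Tuple[int, int, int]] = []
    current: List[int] = []
    for pos in sorted(positions):
        # A gap larger than max_gap (measured C to C) closes the current region.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_cpgs:
                regions.append((current[0], current[-1] + 2, len(current)))
            current = []
        current.append(pos)
    # Close the final region.
    if len(current) >= min_cpgs:
        regions.append((current[0], current[-1] + 2, len(current)))
    return regions


# max_gap=300 is a placeholder chosen only for this example.
regions = cluster_cpgs([100, 102, 104, 5000, 5050], max_gap=300)
print(regions)  # [(100, 106, 3)]
```

The three adjacent CpGs form a 6-base region, the smallest possible under a 3-CpG minimum, while the pair at 5000 and 5050 is dropped for having too few CpGs.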
Is there an ideal distance you can suggest?