Hello everyone,
I am using modkit to analyse the results from Dorado m6A_DRACH methylation base-calling.
(1) I have generated the bedmethyl file from the BAM file. Now I need filter criteria for "coverage" and "mod_rate" to get rid of noisy predictions.
Can we filter directly on the "Nvalid_cov" column, e.g. >=20 reads, or do we need to normalise it to reads per million?
(2) For differential methylation analysis between conditions I am using dmr pair, with the following command:
modkit dmr pair -a c6_r1.bed.gz -a c6_r2.bed.gz -a c6_r3.bed.gz -b dr6_r1.bed.gz -b dr6_r2.bed.gz -b dr6_r3.bed.gz -o dmr_result --ref Genome.fa --base A --threads 96 --log-filepath dmr_result.log
- How does modkit make the unified list of sites from both conditions with replicates?
- How does modkit handle sites that are present in one condition but not in the other?
- What kind of test does modkit apply to call the DMR sites?
Thanks
Looks like you already opened an issue on the modkit GitHub. That is probably the best place to ask this: https://github.com/nanoporetech/modkit/issues/364
If you get an answer there, please come back to this thread and post it here.
Hi! Why did you choose a 20-read threshold ("Nvalid_cov" >= 20 reads)? Are you also filtering on percent modified ((Nmod / Nvalid_cov) * 100) or on Nmod?
Hello,
(1) The 20-read threshold was meant to filter out sites that are supported by very few reads and are likely noise. For the coverage filter I ended up using normalised coverage instead of raw read coverage: I divided the raw read count at each site by the total number of reads mapped in that sample and multiplied by 1 million, then applied a cutoff of 2 (reads per million mapped reads). I normalised because the samples differ in size (number of reads), so a normalised read count removes the library-size bias.
(2) Yes, I also used a cutoff of 5% for a site to be called modified, i.e. Nmod/Nvalid_cov >= 5%.
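
For anyone who wants to reproduce the two filters described above, here is a minimal Python sketch. It is not part of modkit. The function names are made up for illustration, and the column positions (Nvalid_cov as the 10th column, Nmod as the 12th) are my assumption about the bedMethyl layout, so please check them against the documentation for your modkit version.

```python
# Assumed 0-based column positions in the modkit bedMethyl file:
# 9 = Nvalid_cov, 11 = Nmod. Verify these for your modkit version.
NVALID_COL = 9
NMOD_COL = 11


def reads_per_million(n_valid_cov, total_mapped_reads):
    """Scale a per-site valid coverage to reads per million mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return n_valid_cov * 1_000_000 / total_mapped_reads


def filter_bedmethyl(lines, total_mapped_reads, min_rpm=2.0, min_pct_mod=5.0):
    """Yield bedMethyl lines that pass both the coverage and mod-rate cutoffs."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and any header/comment lines
        fields = line.split()  # works for tab- or space-delimited columns
        n_valid = int(fields[NVALID_COL])
        n_mod = int(fields[NMOD_COL])
        if n_valid == 0:
            continue  # no valid calls, percent modified is undefined
        rpm = reads_per_million(n_valid, total_mapped_reads)
        pct_mod = 100.0 * n_mod / n_valid
        if rpm >= min_rpm and pct_mod >= min_pct_mod:
            yield line
```

Here total_mapped_reads is the number of mapped reads in that sample (for example as reported by samtools flagstat), and the defaults of 2 reads per million and 5% are simply the cutoffs mentioned above, not recommended values.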