Question

Differential methylation rate analysis using modkit for m6A

2

9 months ago

baibhu1234 ▴ 60

Hello everyone,

I am using modkit to analyse the results from Dorado m6A_DRACH methylation base-calling.

(1) I have generated the bedMethyl file from the BAM file. Now I need filter criteria for "coverage" and "mod_rate" to get rid of noisy predictions.

Can we filter directly on the "Nvalid_cov" column at >= 20 reads, or do we need to normalise it per million reads?

(2) For differential methylation analysis between conditions I am using dmr pair, with the following command:

modkit dmr pair -a c6_r1.bed.gz -a c6_r2.bed.gz -a c6_r3.bed.gz -b dr6_r1.bed.gz -b dr6_r2.bed.gz -b dr6_r3.bed.gz -o dmr_result --ref Genome.fa --base A --threads 96 --log-filepath dmr_result.log

How does modkit build the unified list of sites from both conditions with replicates?
How does modkit handle sites that are present in one condition but not in the other?
What kind of test does modkit apply to call the DMR sites?

Thanks

m6A modkit Differential-methylation-analysis • 2.7k views

ADD COMMENT • link 8 months ago by baibhu1234 ▴ 60

0

Looks like you already opened an issue on modkit GitHub. That is probably the best place to ask this: https://github.com/nanoporetech/modkit/issues/364

If you get an answer there please come back to this thread and post it here.

ADD REPLY • link 9 months ago by GenoMax 154k

0

Hi! Why did you choose a threshold of 20 reads ("Nvalid_cov" >= 20)? Are you doing any filtering on percent modified ((Nmod / Nvalid_cov) * 100) or on Nmod?

ADD REPLY • link 8 months ago by doramora ▴ 10

1

Hello,

(1) The 20-read threshold was meant to filter out sites that are supported by very few reads and may be noise. In the end, for the coverage filter I used normalised coverage instead of raw read coverage: I divided the raw read count at each site by the total number of reads mapped in that sample and multiplied by 1 million, then applied a cutoff of 2 (reads per million mapped reads). I normalised because the samples differed in size (number of reads), so a normalised read count reduces the bias from library size.

(2) Yes, I also used a cutoff of 5% for a site to be called modified, i.e. Nmod / Nvalid_cov >= 5%.
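
For anyone who wants to reproduce the two filters above, here is a minimal sketch in Python. It is not part of modkit. It assumes the standard bedMethyl column order, with Nvalid_cov in column 10 and Nmod in column 12 (1-based), and it assumes you supply the total number of mapped reads per sample yourself (for example from samtools flagstat). The function names and default thresholds are just illustrative.

```python
def coverage_per_million(n_valid_cov, total_mapped_reads):
    """Normalise a per-site read count to reads per million mapped reads."""
    return n_valid_cov * 1_000_000 / total_mapped_reads


def passes_filters(line, total_mapped_reads, min_cpm=2.0, min_percent_mod=5.0):
    """Return True if one bedMethyl row passes both filters.

    Assumed column layout (0-based after splitting on whitespace):
    fields[9] = Nvalid_cov, fields[11] = Nmod.
    """
    # split() with no argument handles tab- and space-delimited rows alike
    fields = line.split()
    n_valid_cov = int(fields[9])
    n_mod = int(fields[11])
    if n_valid_cov == 0:
        return False
    cpm = coverage_per_million(n_valid_cov, total_mapped_reads)
    percent_mod = 100.0 * n_mod / n_valid_cov
    return cpm >= min_cpm and percent_mod >= min_percent_mod
```

With 5 million mapped reads, a site with Nvalid_cov = 20 comes out at 4 reads per million and passes the cutoff of 2; the same site in a sample with 20 million mapped reads comes out at 1 and is dropped. That is the practical difference from a fixed raw cutoff of 20 reads.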

ADD REPLY • link 8 months ago by baibhu1234 ▴ 60

score 3 · Accepted Answer · 2025-02-04

This is the reply I got from the modkit developers:

"You don't actually need to filter your input data for DMR. The model won't assign a high score or significant p-value to sites with very low coverage. You can find details about the model in the documentation. That being said, you may want to simply ignore positions with low valid coverage so you don't have them in the output, there is a --min-valid-coverage option for that."

"You do not have to perform any normalization, however there are --max-coverages and --cap-coverages options if you have very imbalanced data. With your command, the replicates are matched (meaning you have 3 of each), so you will see the balanced output as well."

"A site must be present in at least 1 replicate from each condition. If a site is not present in any of the replicates in one condition, it will not be scored (there's nothing to compare!)."

This raises another question: if only sites present in both conditions are scored, what about sites that are modified in only one condition? They might be very interesting to look at, since the modification differs so sharply between the two conditions.
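
One way to at least see which sites fall into that category is to split the sites yourself before running dmr pair. The sketch below is my own helper, not modkit code, and modkit's internal matching may differ in detail. It assumes a site is identified by (chrom, start, strand, mod code) taken from bedMethyl columns 1, 2, 6 and 4, and it applies the rule quoted above: a site counts for a condition if it appears in at least one replicate.

```python
def site_keys(bedmethyl_lines):
    """Collect (chrom, start, strand, mod_code) keys from bedMethyl rows."""
    keys = set()
    for line in bedmethyl_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        keys.add((fields[0], int(fields[1]), fields[5], fields[3]))
    return keys


def partition_sites(replicates_a, replicates_b):
    """Split sites into (shared, only_in_a, only_in_b).

    Each argument is a list of replicates; each replicate is an
    iterable of bedMethyl lines. A site belongs to a condition if it
    appears in at least one of that condition's replicates.
    """
    sites_a = set().union(*(site_keys(rep) for rep in replicates_a))
    sites_b = set().union(*(site_keys(rep) for rep in replicates_b))
    return sites_a & sites_b, sites_a - sites_b, sites_b - sites_a
```

Each replicate can be an open file handle, e.g. gzip.open("c6_r1.bed.gz", "rt") for the gzipped files used in the command above. The "shared" set corresponds to sites that can be scored under the quoted rule, while the two "only" sets are the condition-specific sites that would need to be inspected separately.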