Hi all,
I performed Nanopore direct RNA-sequencing and called m6A sites. I want to make metaplots of the m6A modification level across the gene body (modification level = modified reads / total reads at a given site).
To do this, I made a bedgraph file that looks like this:
1 5722 5723 0.000
1 5725 5726 35.410
1 5731 5732 0.000
1 5732 5733 19.220
where each position in the file is an A site (that has reproducible coverage) and its m6A modification level. I convert this to bigwig and use deepTools to do further processing.
My question is whether to use the --missingDataAsZero option given the analysis that I am performing?
For example, when running computeMatrix without that option:
computeMatrix scale-regions \
--scoreFileName m6A.bw --regionsFileName annotation.gtf \
--regionBodyLength 1000 --beforeRegionStartLength 500 --afterRegionStartLength 500 \
--binSize 10 --averageTypeBins mean \
--exonID CDS --metagene -o m6A.tab.gz
I get the following plot:
But If I add the --missingDataAsZero
option when running computeMatrix, which to my understanding, sets any missing bin = 0, I get a very different result:
Where there is a ~10-fold reduction in signal, and, importantly, a lack of 5' UTR enrichment. Is this loss of 5' enrichment occurring because my regions file (GTF) has many 5' positions that are not being accounted for in my bigwig file, and are all being set to 0, or for some other reason? The second profile is the more commonly reported m6A distribution, but I really just want to know which approach is more correct/appropriate given my data and whether I've many any oversights.
Thanks so much for reading!