Hi there! I'm looking for pointers on where to start with the analysis of my dataset. First, some basic info:
Background: We performed Nanopore sequencing of two pool-seq samples called UU and DD, referring to damaged vs. undamaged. We were able to obtain approximately 3x coverage (this is currently more of a proof of concept than a hardcore association job, so the coverage is unfortunate but worth pursuing). We expect the two samples to show different methylation patterns as part of a phenotypic response, but we don't know which regions of the genome will be differentially methylated. The goal is to compare methylation levels (higher or lower) between the two samples, knowing that much of it ought to cancel out, and ideally come up with a short list of significantly different regions that we can associate with the plastic response.
Output of initial methylation calling: Methylation was called with Nanopolish from the raw signal-level data, the reference genome, and the sorted, mapped BAM files. This produces a table with the following columns:
1. Chromosome
2. Strand (+/-)
3. Start of the CG dinucleotide
4. End of the CG dinucleotide (start and end differ by one or two bases)
5. ID of the read the dinucleotide came from
6. Log-likelihood ratio calculated by the model embedded in the methylation caller (from columns 7 and 8)
7. Log-likelihood that the site is methylated
8. Log-likelihood that the site is unmethylated
9. Number of calling strands (always appears to be 1)
10. Number of motifs (I don't know what this represents)
11. Sequence surrounding the dinucleotide, which is always short but variable in length
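To make the layout concrete, here is a minimal sketch of how one of these tables might be read in Python. The column names are my assumption of what the header looks like, based on the fields described above, and the single example row is made up purely for illustration (it is not from our data).

```python
import csv
import io

# Assumed header names, in the order described above.
# These may differ between Nanopolish versions, so check your own file.
COLUMNS = [
    "chromosome", "strand", "start", "end", "read_name",
    "log_lik_ratio", "log_lik_methylated", "log_lik_unmethylated",
    "num_calling_strands", "num_motifs", "sequence",
]


def read_calls(handle):
    """Yield one dict per call, with numeric fields converted from text."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        for key in ("start", "end", "num_calling_strands", "num_motifs"):
            row[key] = int(row[key])
        for key in ("log_lik_ratio", "log_lik_methylated", "log_lik_unmethylated"):
            row[key] = float(row[key])
        yield row


# Made-up example: a header line plus one call, tab-separated.
EXAMPLE = (
    "\t".join(COLUMNS) + "\n"
    + "chr1\t+\t100\t100\tread-a\t3.5\t-100.0\t-103.5\t1\t1\tACGTACGTACG\n"
)

calls = list(read_calls(io.StringIO(EXAMPLE)))
first = calls[0]
# The ratio is the methylated log-likelihood minus the unmethylated one.
ratio_check = first["log_lik_methylated"] - first["log_lik_unmethylated"]
```

For a real file, `read_calls(open(path))` would replace the `io.StringIO` wrapper; `path` is a placeholder for wherever the table lives.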
I can follow the basic interpretation of the log-likelihood ratio: if it's a positive number, there is statistical support that the site is methylated. That would probably be about as much as we need, short of a p-value, if we were looking at a specific region, but we are trying to analyze differences in methylation/demethylation across the entire methylome. How would you suggest using the statistics provided in our dataset to compare and visualize significantly different modifications of the dinucleotides between strands/samples?
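To show what I mean by using the sign of the ratio, here is a rough sketch of the kind of summary I have in mind, not something I know to be the right approach. It treats a call as methylated when the ratio is positive, skips calls whose absolute ratio is below a cutoff, pools calls into fixed-size windows (because 3x coverage is thin per site), and then takes the difference in methylated fraction between samples. The cutoff of 2.0, the 1000 bp window, the function names, and the example calls are all placeholders of mine.

```python
from collections import defaultdict


def window_methylation(calls, cutoff=2.0, window=1000):
    """Count confident calls per window.

    calls: iterable of (chromosome, start, log_lik_ratio) tuples.
    Returns {(chromosome, window_start): (n_methylated, n_confident)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for chrom, start, llr in calls:
        if abs(llr) < cutoff:
            continue  # ambiguous call: not counted either way
        key = (chrom, start // window * window)
        counts[key][1] += 1
        if llr > 0:
            counts[key][0] += 1
    return {key: tuple(value) for key, value in counts.items()}


def frequency_difference(sample_a, sample_b):
    """Methylated fraction in sample_b minus sample_a, for shared windows only."""
    diffs = {}
    for key in sample_a.keys() & sample_b.keys():
        meth_a, total_a = sample_a[key]
        meth_b, total_b = sample_b[key]
        diffs[key] = meth_b / total_b - meth_a / total_a
    return diffs


# Made-up calls for illustration only.
uu_calls = [("chr1", 120, 4.1), ("chr1", 480, -3.0), ("chr1", 900, 0.5)]
dd_calls = [("chr1", 130, 5.2), ("chr1", 470, 2.6)]

uu = window_methylation(uu_calls)
dd = window_methylation(dd_calls)
diffs = frequency_difference(uu, dd)
```

This only gives a raw difference per window, with no test of significance, which is exactly the part I'm unsure how to handle.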