DNA Methylation count
3
0
Entering edit mode
3 months ago

I recently conducted Whole Genome Bisulfite Sequencing (WGBS) and utilized the Bismark pipeline, generating files like file.cov and file.bedgraph. While these files contain valuable methylation data at specific genomic positions, I'm currently faced with the task of extracting gene-level insights. Specifically, I want to understand the methylation counts for individual genes, such as having information like "gene1: 34 methylations, 21 unmethylations." If anyone has experience or suggestions on how to effectively derive gene-specific methylation information from WGBS data, I would greatly appreciate your insights and guidance. Thank you for your help!

Methylation • 576 views
ADD COMMENT
0
Entering edit mode

I dont want to do differential analysis . i have data like this

Chromosom  start  end  coverage methylated_count unmethylated_count
Ecoli_K12   91  91  100 1   0
Ecoli_K12   153 153 100 1   0
Ecoli_K12   244 244 0   0   1
Ecoli_K12   249 249 100 1   0
Ecoli_K12   257 257 100 1   0
Ecoli_K12   259 259 100 1   0
Ecoli_K12   284 284 100 1   0
Ecoli_K12   298 298 100 1   0
Ecoli_K12   312 312 0   0   1
Ecoli_K12   1645    1645    100 1   0

I want something like this

Gene_id  start end   methylated_count
gene 1     50     200    2

I tried this code

# Example DNA methylation data (CpG sites in BED format)
methylation_data <- read.table("methylation_data.bed", header = FALSE, col.names = c("chrom", "start", "end", "methylation_level"))

# Example gene annotations (in BED format)
gene_annotations <- read.table("gene_annotations.bed", header = FALSE, col.names = c("chrom", "start", "end", "gene_id"))

# Create GRanges objects for methylation data and gene annotations
gr_methylation <- GRanges(seqnames = methylation_data$chrom, ranges = IRanges(start = methylation_data$start, end =

methylation_data$end)) gr_genes <- GRanges(seqnames = gene_annotations$chrom, ranges = IRanges(start = gene_annotations$start, end = gene_annotations$end), id = gene_annotations$gene_id)

# Match CpG sites to genes
hits <- findOverlaps(gr_methylation, gr_genes)

# Extract methylation information for CpG sites within genes
methylation_within_genes <- methylation_data[hits]

# Summarize methylated counts for each gene
methylated_counts <- tapply(methylation_within_genes$methylation_level,

methylation_within_genes$gene_id, sum, na.rm = TRUE)

# Display the results
print(methylated_counts)

but not getting the result is there anywhere I'm doing mistake?

ADD REPLY
1
Entering edit mode
3 months ago
dthorbur ★ 1.9k

Numerous papers and reviews come up when you type differential methylation tools into google scholar. This would be the best place to identify suitable tools given your data, and what are some accepted thresholds to use in filtering. Personally, I've used methylkit, but there are plenty more out there and it has been a few years.

If you just want to overlap genomic features with your data, you could use gene/transcript/exon coordinates in a GFF/GTF annotation file and overlap them with the Bismark output. R packages like the GenomicRanges package in R, but also data.table has some useful functions like foverlaps. Otherwise a basic grasp of python would also allow you to conduct your analysis.

ADD COMMENT
0
Entering edit mode

Thank you so much

ADD REPLY
1
Entering edit mode
3 months ago
Binte.farooq ▴ 20

you can use HMST-Seq-Analyzer for this purpose (https://hmst-seq.github.io/hmst/) Look into the demo for how to install and run code.

this tool will give you a list of DMRs ( differentially methylated regions) associated to all the genes in human genome (im assuming you are working on human genome).

You will get a detailed insight of gene-specific methylation. for example, DMRs associated to transcription start site, end site and DMRs in gene body.

Also, if you look at the output ( im copying here just 1 DMR predicted by the tool), you can extract more information.

chrNo:DMR_start:DMR_end: gene_name: coordinates of Methylated Cs found in case VS control

chr1:869070:875802:-:NR_027055||203:0||gene:1000:1000||FAM41C:-:868070:876802||869378, 869379, 869531, 869532, 869564, 869565, 869612, 869618, 869619, 869722, 869931, 869933, 869937, 869939, 869946, 869957, 869960, 869971, 869972, 869996, 869997, 870001, 870002, 870019, 870020, 870022, 870023, 870030, 870031, 870047, 870048, 870053, 870054, 870055, 870056, 870063, 870064, 870068, 870069, 870086, 870087, 870093, 870094, 870096, 870097, 870098, 870099, 870103, 870104, 870105, 870106, 870117, 870118, 870119, 870120, 870138, 870139, 870141, 870142, 870157, 870158, 870160, 870161, 870166, 870167, 870173, 870174, 870192, 870193, 870223, 870224, 870225, 870226, 870235, 870236, 870246, 870247, 870365, 870366, 870405, 870406, 870561, 870562, 870670, 870671, 870739, 870740, 870800, 870801, 870850, 870851, 871140, 871141, 872028, 872029, 872322, 872323, 872564, 872565, 872656, 872657, 872667, 872668, 872866, 872867, 872969, 873076, 873087, 873146, 873181, 873182, 873203, 873204, 873249, 873250, 873261, 873262, 873343, 873344, 873367, 873368, 873479, 873480, 873517, 873518, 873528, 873529, 873540, 873541, 873547, 873548, 873565, 873566, 873625, 873626, 873630, 873631, 873726, 873727, 873735, 873742, 873755, 873765, 873776, 873790, 873798, 873817, 873818, 873857, 873858, 873920, 873921, 873929, 873934, 873938, 873948, 873951, 874069, 874070, 874088, 874089, 874128, 874129, 874151, 874152, 874184, 874185, 874209, 874210, 874213, 874214, 874240, 874241, 874246, 874247, 874261, 874262, 874299, 874300, 874382, 874546, 874647, 874648, 874714, 874715, 874717, 875025, 875026, 875089, 875090, 875110, 875218, 875222, 875504, 875505, 875508, 875509, 875644, 875645, 875690, 875691, 875785, 875786||chr1_vs_chr1||chr1:869070:875802:-:NR_027055||264:0||gene:1000:1000||FAM41C:-:868070:876802||869378, 869379, 869531, 869532, 869564, 869565, 869611, 869612, 869618, 869619, 869721, 869722, 869777, 869818, 869819, 869820, 869821, 869833, 869834, 869847, 869848, 869849, 869850, 869863, 869864, 869865, 869866, 869871, 869872, 869892, 869893, 869900, 869901, 869902, 869903, 869908, 869909, 869914, 869915, 869931, 869932, 869933, 869934, 869937, 869938, 869939, 869940, 869946, 869947, 869957, 869958, 869960, 869961, 869971, 869972, 869996, 869997, 870001, 870002, 870019, 870020, 870022, 870023, 870030, 870031, 870047, 870048, 870053, 870054, 870055, 870056, 870063, 870064, 870068, 870069, 870086, 870087, 870093, 870094, 870096, 870097, 870098, 870099, 870103, 870104, 870105, 870106, 870117, 870118, 870119, 870120, 870138, 870139, 870141, 870142, 870157, 870158, 870160, 870161, 870166, 870167, 870173, 870174, 870192, 870193, 870223, 870224, 870225, 870226, 870235, 870236, 870246, 870247, 870365, 870366, 870405, 870406, 870561, 870562, 870670, 870671, 870739, 870740, 870800, 870801, 870850, 870851, 871140, 871141, 872028, 872029, 872322, 872323, 872564, 872565, 872656, 872657, 872667, 872668, 872866, 872867, 872969, 872970, 873076, 873077, 873087, 873088, 873146, 873147, 873181, 873182, 873203, 873204, 873249, 873250, 873261, 873262, 873343, 873344, 873367, 873368, 873479, 873480, 873517, 873518, 873528, 873529, 873540, 873541, 873547, 873548, 873565, 873566, 873625, 873626, 873630, 873631, 873726, 873727, 873734, 873735, 873741, 873742, 873754, 873755, 873764, 873765, 873775, 873776, 873789, 873790, 873797, 873798, 873817, 873818, 873857, 873920, 873921, 873929, 873930, 873934, 873935, 873938, 873939, 873948, 873949, 873951, 873952, 873971, 873972, 874069, 874070, 874088, 874089, 874128, 874129, 874151, 874152, 874184, 874185, 874209, 874210, 874213, 874214, 874240, 874241, 874246, 874247, 874261, 874262, 874299, 874300, 874382, 874383, 874546, 874547, 874647, 874648, 874714, 874715, 874717, 874718, 874939, 874940, 875025, 875026, 875089, 875090, 875110, 875111, 875218, 875219, 875222, 875223, 875504, 875505, 875508, 875509, 875644, 875645, 875690, 875691, 875785, 875786||,1.75,1.05,-0.5,5.363929257348585e-05

ADD COMMENT
1
Entering edit mode
3 months ago
Iunona ▴ 10

As it was stated above there are many packages that you can use. methylkit, BiSeq and also for your task you can adapt tools that are originally made for differential expression analysis (DESeq2). I used DESeq2 in one of my projects in DNA methylation analysis.

ADD COMMENT

Login before adding your answer.

Traffic: 1641 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6