Interpreting Bismark Methylation Extractor output
3.8 years ago
Yasin Uzun • 0

Hi,

I am processing DNA methylation data (WGBS). After alignment with the Bismark aligner, I called methylation sites using the Bismark methylation extractor. I am confused about the output and could not find the answer in the documentation.

1. I noticed that Bismark deleted the OT and OB files because they were empty. I only have CTOT and CTOB files as output. What is the reason for this?

2. Bismark generates bedGraph.gz and cov.gz files. Do they cover only CpG methylation sites? I ask because they match the CG files in number of sites.

I would appreciate it if anybody could comment on this.

sequencing • 6.9k views

Yes, look at my last paragraph. I even added an edit shortly after posting it to make things clear.


I saw it. Thank you very much!


Also, as an FYI, don't post a comment to your own question in the answer section of this forum. Just add it to the ADD COMMENT section.

3.8 years ago
lshepard ▴ 470

It all depends on the options you used from Bismark. Here is the relevant information regarding your outputs from the docs:

Strand-specific methylation output files (default): As its default option, the bismark_methylation_extractor will produce a strand-specific output which will use the following abbreviations in the output file name to indicate the strand the alignment came from:

OT – original top strand
CTOT – complementary to original top strand
OB – original bottom strand
CTOB – complementary to original bottom strand

Methylation calls from OT and CTOT will be informative for cytosine methylation positions on the original top strand, calls from OB and CTOB will be informative for cytosine methylation positions on the original bottom strand. Please note that specifying the --directional (the default mode) option in the Bismark alignment step will not report any alignments to the CTOT or CTOB strands.

As cytosines can exist in any of three different sequence contexts (CpG, CHG or CHH) the bismark_methylation_extractor default output will consist of 12 individual output files per input file (CpG_OT_..., CpG_CTOT_..., CpG_OB_... etc.).

Context-dependent methylation output files (--comprehensive option): If strand-specific methylation is not of interest, all available methylation information can be pooled into a single context-dependent file (information from any of the four strands will be pooled). This will default to three output files (CpG-context, CHG-context and CHH-context), or result in 2 output files (CpG-context and Non-CpG-context) if --merge_non_CpG was selected (note that this can result in enormous file sizes for the non-CpG output).

Both strand-specific and context-dependent options can be combined with the --merge_non_CpG option.
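As a quick sanity check on the file counts quoted above, the option combinations can be enumerated in a few lines of Python. This is only a sketch: the helper name `expected_prefixes` is made up for illustration, and the exact prefixes (for example `Non_CpG_context_`) are assumed from the naming pattern in the docs, so compare them against your own output directory.

```python
from itertools import product

STRANDS = ["OT", "CTOT", "OB", "CTOB"]


def expected_prefixes(comprehensive=False, merge_non_cpg=False):
    """Return the output file-name prefixes expected for an option combination.

    Hypothetical helper for illustration; it is not part of Bismark.
    """
    contexts = ["CpG", "Non_CpG"] if merge_non_cpg else ["CpG", "CHG", "CHH"]
    if comprehensive:
        # All four strands are pooled: one file per context.
        return [f"{ctx}_context_" for ctx in contexts]
    # Strand-specific default: one file per (context, strand) pair.
    return [f"{ctx}_{strand}_" for ctx, strand in product(contexts, STRANDS)]


for comp, merge in product([False, True], repeat=2):
    names = expected_prefixes(comp, merge)
    print(f"comprehensive={comp}, merge_non_CpG={merge}: {len(names)} files")
```

Keep in mind that the extractor removes files that end up empty, so the number of files you actually see on disk can be lower than this.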

--pbat

This option may be used for PBAT-Seq libraries (Post-Bisulfite Adapter Tagging; Kobayashi et al., PLoS Genetics, 2012). This is essentially the exact opposite of alignments in 'directional' mode, as it will only launch two alignment threads to the CTOT and CTOB strands instead of the normal OT and OB ones. Use this option only if you are certain that your libraries were constructed following a PBAT protocol (if you don't know what PBAT-Seq is you should not specify this option). The option --pbat works only for FastQ files and uncompressed temporary files.

So keep in mind that the alignment mode determines which strand files contain data: with --directional (the default) only OT and OB alignments are reported. Since you are only seeing CTOT and CTOB outputs, you most likely ran the alignment with the --pbat option.
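The reasoning here amounts to a small lookup: which of the four strand files actually contain calls hints at the alignment mode that was used. A rough Python sketch follows. The function name and the returned labels are mine, not Bismark output, and the "non-directional" branch is an assumption that all four strands were reported (as with Bismark's --non_directional mode); borderline cases need a closer look at the alignment report.

```python
def guess_alignment_mode(strands_with_calls):
    """Guess the alignment mode from the strands that have methylation calls.

    Illustrative only; the labels are informal descriptions.
    """
    strands = set(strands_with_calls)
    if not strands:
        return "no calls at all"
    if strands <= {"OT", "OB"}:
        # Directional libraries only produce OT and OB alignments.
        return "directional (default)"
    if strands <= {"CTOT", "CTOB"}:
        # PBAT alignments go to the complementary strands only.
        return "pbat"
    # A mix of original and complementary strands.
    return "non-directional"


print(guess_alignment_mode(["CTOT", "CTOB"]))
```

For the case in the question (only CTOT and CTOB files left), this returns the "pbat" label.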

Regarding non-CG contexts, refer to the "Context-dependent methylation output files" / --comprehensive option. There are additional parameters you may consider for non-CG contexts (EDIT - to complement: unless you specified the --CX option of bismark_methylation_extractor, your bedGraph will only contain CpGs. If you did not specify it but used --comprehensive, you can generate bedGraphs for the other contexts separately with bismark2bedGraph).


Use bedGraph2cytosine to associate your coverage file with context:

bedGraph2cytosine -o <output_file> --genome_folder <genome_path> -CX <your_bismark.cov>
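Once you have a cytosine report with context, getting one coverage file per context is a simple split. Below is a minimal Python sketch, assuming the report is the tab-separated 7-column layout (chromosome, position, strand, count methylated, count unmethylated, C-context, trinucleotide context) and that the output should follow the cov layout (chromosome, start, end, methylation percentage, count methylated, count unmethylated). The function name and the in-memory handling are illustrative only; for a whole genome you would stream to files instead.

```python
from collections import defaultdict


def split_report_by_context(report_lines):
    """Group cytosine-report lines into coverage-style lines per context.

    Hypothetical helper; cytosines without any reads are skipped.
    """
    by_context = defaultdict(list)
    for line in report_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        chrom, pos, _strand, meth, unmeth, context, _tri = line.split("\t")
        meth, unmeth = int(meth), int(unmeth)
        total = meth + unmeth
        if total == 0:
            continue  # uncovered cytosine, nothing to report
        percent = 100.0 * meth / total
        by_context[context].append(
            f"{chrom}\t{pos}\t{pos}\t{percent:.4f}\t{meth}\t{unmeth}"
        )
    return dict(by_context)


# Made-up example lines, not real data.
example = [
    "chr1\t100\t+\t3\t1\tCG\tCGA",
    "chr1\t105\t-\t0\t4\tCHH\tCAT",
    "chr1\t110\t+\t0\t0\tCHG\tCAG",
]
for context, lines in split_report_by_context(example).items():
    print(context, lines)
```

Each key of the returned dictionary (CG, CHG, CHH) can then be written to its own file.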

3.8 years ago
Yasin Uzun • 0

I figured it out. By default, cov.gz and bedGraph.gz only include CpG calls, but this can be changed with the --CX option. However, is there any way to get a separate cov.gz for each context type (CpG, CHG, CHH)?


Did you read the comment I left you? I added this information to my original reply shortly after posting to make things easier. Please read the answer in full if you would like assistance (quoted again below):

Regarding non-CG contexts, refer to the "Context-dependent methylation output files" / --comprehensive option. There are additional parameters you may consider for non-CG contexts (EDIT - to complement: unless you specified the --CX option of bismark_methylation_extractor, your bedGraph will only contain CpGs. If you did not specify it but used --comprehensive, you can generate bedGraphs for the other contexts separately with bismark2bedGraph).

The answer above also contains the answer to your additional question. And again, please use ADD COMMENT instead of an answer.