In this tutorial I show how to use a new program, Defiant, to identify differentially methylated regions (DMRs) from bisulfite sequencing data, whether whole-genome or RRBS. The tutorial is short because the program is easy to use: it does not require learning a programming language or writing complex scripts, as many other DMR packages do. It can process whole-genome bisulfite data in under 2 minutes with a single command, even on older laptops. Defiant ("Differential methylation: Easy, Fast, Identification and ANnoTation") identifies DMRs and can optionally annotate them and produce publication-quality images of them. More information is available in our recent publication: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2037-1
Defiant was designed to use as few computational resources as possible, so that you do not need to pay for supercomputer or cluster time.
Defiant automatically identifies the input file type from bs_seeker and Bismark, and it can read the same input data that Metilene, MethylKit, and BSmooth use. This takes as much work off the end user as possible: you do not need to convert your files or tell Defiant which format they are in.
Downloading and Installation
The program can be downloaded from GitHub: https://github.com/hhg7/defiant and installed with a single shell script, install.sh. All you need is gcc, which is already available on most Linux machines. If you don't have gcc installed, on Debian/Ubuntu simply type
sudo apt-get install gcc
or ask your superuser/sysadmin to do so for you.
Running Defiant
For the simplest case, pass the control and case files to the -i option, with the replicates of each group separated by commas:
./defiant -i control1.txt,control2.txt case1.txt,case2.txt
Defiant can also annotate the DMRs automatically if you supply an annotation file with the -a option:
./defiant -a refFlat.txt -i control1.txt,control2.txt case1.txt,case2.txt
Defiant can read annotations in both refFlat and Gencode GTF formats. You do not need to specify which one you are using, as Defiant detects the format automatically.
Defiant can generate publication-quality images of the DMRs (this requires an installation of R) with the command-line option -x followed by your desired x-axis label, e.g.
./defiant -x CpG -a refFlat.txt -i control1.txt,control2.txt case1.txt,case2.txt
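If you run Defiant on many comparisons, it can help to assemble the command line programmatically. The following is only a minimal sketch: the helper name build_defiant_command and its parameters are hypothetical and not part of Defiant. It uses only the -i, -a, and -x options shown in this tutorial, and it assumes the executable is at ./defiant.

```python
import shlex


def build_defiant_command(controls, cases, annotation=None, x_label=None,
                          exe="./defiant"):
    """Build an argument list for Defiant (hypothetical helper).

    Only the options shown in this tutorial are used:
    -i (input groups), -a (annotation file), -x (x-axis label for images).
    """
    if not controls or not cases:
        raise ValueError("need at least one control file and one case file")
    cmd = [exe]
    if x_label is not None:
        cmd += ["-x", x_label]
    if annotation is not None:
        cmd += ["-a", annotation]
    # Replicates within a group are joined by commas;
    # each group is passed as its own argument after -i.
    cmd += ["-i", ",".join(controls), ",".join(cases)]
    return cmd


cmd = build_defiant_command(
    ["control1.txt", "control2.txt"],
    ["case1.txt", "case2.txt"],
    annotation="refFlat.txt",
    x_label="CpG",
)
# Print the command as a shell-quoted string for inspection.
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) to launch Defiant from the directory that contains the executable.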
Defiant's output is simple and easy to read (best viewed in a spreadsheet program like Excel). The -b option prints BED output.
Chromosome  Start    End      #mCpN  #Diff.CpN  Mean_Difference  control           FeDef             Inside_Genes  Between_Genes            Gene_Promoter_Cutoff_10000_Nucleotides
1           2984116  2984175  5      3          -54.1            [68.2,87.8,94.1]  [35.5,3.5,66.7]   Ust           n/a                      Ust
1           3283278  3283785  8      2          26.3             [85.1,70.7,62.0]  [98.1,98.7,95.2]  n/a           Ust-125047,985340-Samd5  n/a
1           3332817  3332836  5      4          49.0             [47.1,30.0,32.4]  [99.2,79.0,41.3]  n/a           Ust-174586,936289-Samd5  n/a
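If you would rather post-process the output in a script than in a spreadsheet, something like the following sketch may be a starting point. It assumes the output file is tab-delimited with the header shown above (check this against your own output); the function names read_dmrs and filter_by_difference and the threshold of 40 are illustrative choices, not part of Defiant. The sample rows are the three rows from the example output.

```python
import csv
import io


def read_dmrs(handle):
    """Read Defiant-style output into a list of dicts.

    Assumption: the file is tab-delimited with a single header line.
    """
    dmrs = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["Start"] = int(row["Start"])
        row["End"] = int(row["End"])
        row["Mean_Difference"] = float(row["Mean_Difference"])
        dmrs.append(row)
    return dmrs


def filter_by_difference(dmrs, min_abs_diff):
    """Keep DMRs whose absolute mean methylation difference meets the cutoff."""
    return [d for d in dmrs if abs(d["Mean_Difference"]) >= min_abs_diff]


# The three example rows from this tutorial, rebuilt as tab-delimited text.
SAMPLE_LINES = [
    "Chromosome Start End #mCpN #Diff.CpN Mean_Difference control FeDef "
    "Inside_Genes Between_Genes Gene_Promoter_Cutoff_10000_Nucleotides",
    "1 2984116 2984175 5 3 -54.1 [68.2,87.8,94.1] [35.5,3.5,66.7] Ust n/a Ust",
    "1 3283278 3283785 8 2 26.3 [85.1,70.7,62.0] [98.1,98.7,95.2] n/a "
    "Ust-125047,985340-Samd5 n/a",
    "1 3332817 3332836 5 4 49.0 [47.1,30.0,32.4] [99.2,79.0,41.3] n/a "
    "Ust-174586,936289-Samd5 n/a",
]
sample = "\n".join("\t".join(line.split()) for line in SAMPLE_LINES)

dmrs = read_dmrs(io.StringIO(sample))
strong = filter_by_difference(dmrs, 40.0)  # illustrative cutoff
for d in strong:
    print(d["Chromosome"], d["Start"], d["End"], d["Mean_Difference"])
```

To use this on a real run, replace io.StringIO(sample) with an open file handle for your Defiant output file.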
Defiant can also print adjusted p-values for each DMR with the -v option if desired.
That's it! More detail is available in the publication mentioned above, where we compare our new program in detail to existing DMR packages such as Metilene, MethylKit, MethylSig, RnBeads, and RADMeth, and show that Defiant beats all of them in identifying DMRs quickly and easily. Please let me know if you have any questions.