Average Methylation Level for a Genomic Region from Bam File (PileOMeth )
1
4
Entering edit mode
5.7 years ago
Shicheng Guo ★ 8.7k

Hey colleagues,

Is there any convenient method (one step process) to extract average methylation level for a specific genomic region from BAM files of BS-seq or RRBS dataset.

Sure, if the method can extract the information from MethylFreq or some other intermediate files, it would be ok. You know I hope the code or the pipeline should have high speed.

BTW: alignment by Bismark 0.14.0

Thanks.

BS-seq RRBS methylation average level • 2.4k views
4
Entering edit mode
5.7 years ago

This is a two step process.

1) Extract the methylation metrics. PileOMeth can be given a region and then it'll only give results for that (e.g., PileOMeth extract -r chr10:123-456 genome.fa alignments.bam, though the BAM file will need to be sorted).

2) Average. I personally would do a weighted average. So if I used PileOMeth something like this would work on the output file:

grep -v track PileOMeth.output | awk '{cov+=$5+$6;nC+=5}END{print nC/cov}' The grep part is just to remove the track line. This assumes you just want one region. If you want multiple regions at once where each region is summarized separately then you'd need to throw bedtools or bedops at the output of PileOMeth (or whatever else you use) and give PileOMeth a BED file of regions (it's the -l option). ADD COMMENT 0 Entering edit mode Hi Dr Ryan, A little confused on "All coordinates are 0-based half open, which conforms to the bedGraph definition. When paired-end reads are aligned, it can often occur that their alignments overlap. In such cases, PileOMeth will not count both reads of the pair in its output, as doing so would lead to incorrect downstream statistical results" It seems the most popular way is only account for one of the reads? Do you think PileOMeth is also doing like this way or not? ADD REPLY 1 Entering edit mode PileOMeth will only count one of them (if it counts one at all). If mates map and agree on a basecall, then the basecall with the higher phred score is used and the phred score increased slightly (possibly then putting it over the -p threshold). If the reads disagree then the base with the highest phred score is again used but the score is first decreased depending on the phred score in the other read. If you want the full details you can have a look at the function that does this (it's a modified form of what samtools mpileup does): https://github.com/dpryan79/PileOMeth/blob/master/pileup.c#L94 ADD REPLY 0 Entering edit mode Hi Ryan, I am thinking actually I can combine these two script with bash pipe if you can output the extract result to STDOUT. Do you think so? Now the current PileOMeth will output the extract result to file (Default) rather than the STDOUT. I think It would be perfect if we output the result to Screen/STDOUT defaultly. PileOMeth extract -r chr10:123-456 genome.fa alignments.bam | grep -v track PileOMeth.output | awk '{cov+=5+$6;nC+=$5}END{print nC/cov}'

1
Entering edit mode

I've not defaulted to writing to stdout because there can be multiple files. One can always use named pipes for things like this.

0
Entering edit mode

Okay. Deal. Perl, PileOMeth, Bash Again. Bad script, anyway, get the result, Share with all guys.

   #!/usr/bin/perl
use Cwd;
use strict;
my $file="/home/shg047/oasis/DennisLo2015/mhl/Dennis.bed"; open F,$file;
while(<F>){
chomp;
system("PileOMeth extract -p 5 -q 10 --minDepth 1 -r $_ /home/shg047/oasis/db/hg19/hg19.fa BMT1.read1_val_1.fq.gz_bismark_bt2_pe.sort.bam -o Tmp"); my$amf=qx/grep -v track Tmp_CpG.bedGraph | awk '{cov+=\$5+\$6;nC+=\$5}END{print nC\/cov}'/; print "$_\t\$amf";
}

0
Entering edit mode

Hi Dr. Ryan,

I have a number of regions of interest, and have finished the first step using the -l option. Would you give an example code of bedtools or bedops on the output of PileOMeth? I'm very new to this, so I apologize if this is a really naive question! Thank you!