Question: Genomic Coverage - Samtool's undocumented "depth" verses the poorly documented pileup.
3
gravatar for John
3.9 years ago by
John12k
Germany
John12k wrote:

Hey all :]

I use samtools's depth, and occasionally samtool's pileup commands to calculate coverage of my reads to the genome before binning for coverage. I'm pretty sure everyone else does too ;)

One very common problem is that people find "samtools depth" and "samtools mpileup" don't match up - and it's commonly attributed to filtering poor quality reads, duplicates, etc (mpileup doing more filtering).

But there's a ton of other questions I just can't get the answers too from the sametools docs, namely:

Does depth/pileup count the region between a paired reads, or just the reads themselves?
Does samtools depth/pileup count singletons in paired end sequencing? Does it extrapolate based on average read length to fillout the whole fragment?
If a read maps to multiple locations, is it counted multiple times?

I'm working with ChIP-Seq data, so i might have to correct for peak-shift. What do you guys think?

Thanks! :)

ADD COMMENTlink modified 2.4 years ago by Shicheng Guo4.9k • written 3.9 years ago by John12k
5
gravatar for Devon Ryan
3.9 years ago by
Devon Ryan80k
Freiburg, Germany
Devon Ryan80k wrote:

There comes a time in every bioinformatician's life when he/she needs to just read the source code. This is one of those times.

So, depth actually uses pileup internally, just to confuse things a bit more. Depth does not count regions spliced over (i.e., covered by an N operator) or between reads in a pair, which is good because what's actually going on there would be undefined. I don't think that pileup counts regions between reads, though I guess I'd have to look to be 100% certain.

Neither tool cares whether a read is a singleton or a part of a pair unless you tell them to filter according to that. Samtools is a pretty general program, so it won't do ChIP-seq specific things like extrapolating bounds based on average template length.

In general, you'll need to start going through the code to get answers to questions like these.

ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by Devon Ryan80k

A million thanks Devon for setting the record straight! I'm REALLY surprised that it only counts the reads themselves for the pileup - we have always used this for ALL our ChIP-Seq coverage plots, and I guess it's totally wrong! :O I mean, it depends what you're looking for I suppose - but like you say, it's about time I started writing my own scripts so I really know what's going on. I also work a lot with DNAse-Seq data, and there the 5' read is the most important (as that's where the DNAse actually cuts). I think it's time I made a depth tool for my specific applications. Thank you very much for the advise and clarity :)

ADD REPLYlink written 3.9 years ago by John12k
1

Did you fix this problem? I met this problem too. I find the depth of mpileup is far less than samtools depth? I check it again and again and didn't find the answer. And I find PileOMeth might invoke mpileup and count the methylation and non-methylation reads and then calculated the methylation ratio. I don't use C, neither C++. For these question, what I can do is email the authors. LoL

ADD REPLYlink written 2.4 years ago by Shicheng Guo4.9k

Well Devon is the author so you're half way there :P 

ADD REPLYlink written 2.4 years ago by John12k

As you likely noted in another thread, mpileup is doing some filtering that depth isn't doing. Regarding PileOMeth, it does filtering to, but its defaults are somewhat different from the standard "samtools mpileup", though they work the same way and use the same API. You can play with the -q and -p (these are identical to -q and -Q in samtools mpileup and samtools depth) options to tweak things to your liking, though I should warn that I chose those because I find them to be the most useful.

ADD REPLYlink written 2.4 years ago by Devon Ryan80k
3
gravatar for Pierre Lindenbaum
3.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum108k wrote:

> Does depth/pileup count the region between a paired reads, or just the reads themselves?

samtools depth 'just' runs a multi-mpileup and count the number of bases under each base of the reference. smal l'qlen', deletions and base quality below 'baseQ' are ignored. Here is the core code of 'depth':

    if (!(b->core.flag&BAM_FUNMAP)) {
        if ((int)b->core.qual < aux->min_mapQ) b->core.flag |= BAM_FUNMAP;
        else if (aux->min_len && bam_cigar2qlen(&b->core, bam1_cigar(b)) < aux
->min_len) b->core.flag |= BAM_FUNMAP;

(...)

    while (bam_mplp_auto(mplp, &tid, &pos, n_plp, plp) > 0) { 
        if (pos < beg || pos >= end) continue; // out of range; skip
        if (bed && bed_overlap(bed, h->target_name[tid], pos, pos + 1) == 0) continue; 
        fputs(h->target_name[tid], stdout); printf("\t%d", pos+1); 
        for (i = 0; i < n; ++i) { // base level filters have to go here
            int j, m = 0;
            for (j = 0; j < n_plp[i]; ++j) {
                const bam_pileup1_t *p = plp[i] + j;
                if (p->is_del || p->is_refskip) ++m;
                else if (bam1_qual(p->b)[p->qpos] < baseQ) ++m; // low
            }
            printf("\t%d", n_plp[i] - m); // this the depth to output
        }
    }

 

 

ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by Pierre Lindenbaum108k

Thank you Pierre for taking the time to find the bit of code responsible.

I must confess, i'm not used to reading C, and I have no idea what a FUNMAP is, hahah, but I can kind of make sense of it.
There should be a flag to switch this behaviour off IMHO: "if (bed && bed_overlap(bed, h->target_name[tid], pos, pos + 1) == 0) continue; " but on the whole it looks very easy to modify this code to do "full fragment" pileups with/without singleton extension. I'll give it a shot :] Thank you so much!

ADD REPLYlink written 3.9 years ago by John12k
1

FUNMAP "FLAG UNMAPPED"

ADD REPLYlink written 3.9 years ago by Pierre Lindenbaum108k
2
gravatar for Shicheng Guo
2.4 years ago by
Shicheng Guo4.9k
Shicheng Guo4.9k wrote:

OK. Now the question is totally clear. samtools mpileup conducted too much filtering, therefore, the depth from the mpileup usually far less than samtools depth

1, mapping qulity (samtools mpileup -q)

2, base qulity (samtools mpileup -Q)

3, discard anomalous read pairs (samtools mpileup -A)

After remove these filtering, the number of depth from these two method will be exactly same. 

ADD COMMENTlink written 2.4 years ago by Shicheng Guo4.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1647 users visited in the last hour