Question: Rsamtools::pileup gives inconsistent number of reads
2
gravatar for inesdesantiago
5.0 years ago by
United Kingdom
inesdesantiago170 wrote:

I have created a fasta file with 300 simulated reads at a given SNP position (reads are 75mer, and I simulate every position in the + and - strand, hence 75x4=300 reads). Then, I map them back to the ref genome using bowtie and then use pileup to see the alignment at the position in question:

bf <- BamFile(bamfile)
param <- ScanBamParam(which=GRanges("chr10",IRanges(start=100189138, end=100189138)))
     

quickBamFlagSummary shows that there are 300 reads

> quickBamFlagSummary(bf, param=param)
                                group |    nb of |    nb of | mean / max
                                   of |  records |   unique | records per
                              records | in group |   QNAMEs | unique QNAME
All records........................ A |      300 |      300 |    1 / 1
  o template has single segment.... S |      300 |      300 |    1 / 1
  o template has multiple segments. M |        0 |        0 |   NA / NA
      - first segment.............. F |        0 |        0 |   NA / NA
      - last segment............... L |        0 |        0 |   NA / NA
      - other segment.............. O |        0 |        0 |   NA / NA

Note that (S, M) is a partitioning of A, and (F, L, O) is a partitioning of M.
Indentation reflects this.

Details for group S:
  o record is mapped.............. S1 |      300 |      300 |    1 / 1
      - primary alignment......... S2 |      300 |      300 |    1 / 1
      - secondary alignment....... S3 |        0 |        0 |   NA / NA
  o record is unmapped............ S4 |        0 |        0 |   NA / NA

Pileup only shows 262 reads (63+62+75+62)... Any idea why? I dont use any threshold on min_base_quality and min_mapq...

> pileup(bf, scanBamParam=param, min_base_quality=0,min_mapq=0)
  seqnames       pos strand nucleotide count               which_label
1    chr10 100189138      +          A    63 chr10:100189138-100189138
2    chr10 100189138      -          A    62 chr10:100189138-100189138
3    chr10 100189138      +          G    75 chr10:100189138-100189138
4    chr10 100189138      -          G    62 chr10:100189138-100189138

I looked the region up with IGV, and also, IGV reports 300 reads.

I actually simulate several different positions, and the 262 reads (instead of 300) appear every time.. Not sure why. I am sure pileup is filtering the reads somehow..

Also, the same exact script works fine if I simulate 36mer reads (instead of 75...)! Any ideas why?

pileup rsamtools • 1.9k views
ADD COMMENTlink modified 5.0 years ago • written 5.0 years ago by inesdesantiago170
1

Could you post a BAM file just containing this region somewhere for debugging? Also, what's the output of sessionInfo()?

ADD REPLYlink modified 5.0 years ago • written 5.0 years ago by Devon Ryan94k
1

Actually, note that pileup() takes not only a scanBamParam option, but also a pileupParam option. Given that the latter defaults to some filtering, I wouldn't be surprised if that's the source of the problem.

ADD REPLYlink written 5.0 years ago by Devon Ryan94k

Thank you! it is solved by changing max_depth parameter within PileupParam()

> pileup(bf, scanBamParam=param, pileupParam=PileupParam(max_depth=8000, min_mapq=0, min_base_quality=0))
  seqnames       pos strand nucleotide count               which_label
1    chr10 100189138      +          A    75 chr10:100189138-100189138
2    chr10 100189138      -          A    75 chr10:100189138-100189138
3    chr10 100189138      +          G    75 chr10:100189138-100189138
4    chr10 100189138      -          G    75 chr10:100189138-100189138

I was adding my arguments without PileupParam constructor, this is wrong:

> pileup(bf, scanBamParam=param, min_base_quality=0,min_mapq=0, max_depth=8000)
  seqnames       pos strand nucleotide count               which_label
1    chr10 100189138      +          A    63 chr10:100189138-100189138
2    chr10 100189138      -          A    62 chr10:100189138-100189138
3    chr10 100189138      +          G    75 chr10:100189138-100189138
4    chr10 100189138      -          G    62 chr10:100189138-100189138
ADD REPLYlink written 5.0 years ago by inesdesantiago170

this is what samtools pileup command (from samtools) gives:

samtools mpileup -B -sf $genome.fa -r chr10:100189138-100189138 test_aln75.bam
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
chr10    100189138    A    300    G$.$g$,$G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,G.g,^~G^~.^~g^~,    BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

counting in python shows 150 reads in each allele - the reference ("A") and alternative ("G")  (75 reads in each strand):

>>> a.count("G")
75
>>> a.count("g")
75
>>> a.count(".")
75
>>> a.count(",")
75
ADD REPLYlink modified 5.0 years ago • written 5.0 years ago by inesdesantiago170
1
gravatar for inesdesantiago
5.0 years ago by
United Kingdom
inesdesantiago170 wrote:

Thank you! it is solved by changing max_depth parameter within PileupParam()

> pileup(bf, scanBamParam=param, pileupParam=PileupParam(max_depth=8000, min_mapq=0, min_base_quality=0))
  seqnames       pos strand nucleotide count               which_label
1    chr10 100189138      +          A    75 chr10:100189138-100189138
2    chr10 100189138      -          A    75 chr10:100189138-100189138
3    chr10 100189138      +          G    75 chr10:100189138-100189138
4    chr10 100189138      -          G    75 chr10:100189138-100189138

I was adding my arguments without PileupParam constructor, this is wrong:

> pileup(bf, scanBamParam=param, min_base_quality=0,min_mapq=0, max_depth=8000)
  seqnames       pos strand nucleotide count               which_label
1    chr10 100189138      +          A    63 chr10:100189138-100189138
2    chr10 100189138      -          A    62 chr10:100189138-100189138
3    chr10 100189138      +          G    75 chr10:100189138-100189138
4    chr10 100189138      -          G    62 chr10:100189138-100189138
ADD COMMENTlink modified 5.0 years ago • written 5.0 years ago by inesdesantiago170
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1109 users visited in the last hour