Using BBMap for ChIP-seq data
0
0
Entering edit mode
5.3 years ago

Could anyone give me some more information on using BBMap for ChIP-seq datasets? Is it recommended? Specific parameters?

The defaults seem to be suitable for RNA-seq type read mapping, but I can't find absolutely any information on parameter use for ChIP. Should I go back to Bowtie2?

bbmap ChIP-Seq • 1.4k views
0
Entering edit mode

For ChIP-seq, the default parameters are OK, although I recommend reducing maxindel to something much lower and possibly changing the default handling of ambiguously-mapped reads. E.g.

bbmap.sh in=reads.fq out=mapped.sam maxindel=20 ambig=random


0
Entering edit mode

Samples are single-ended, reads are 50bp.

I've currently tried including the following:

bbmap.sh ref=ref in=reads.fq outm=mapped.bam maxindel=1 ambig=random sam=1.3 minid=0.95


I've read somewhere that the default parameters can give spurious reads, and that increasing the minid will likely lead to less false mapping. Does this look okay?

1
Entering edit mode

That looks fine to me. Of course, the more stringent the parameters, the more confident the mappings are. The defaults were originally designed more for RNA-seq and variant-calling, but (apart from adjusting maxindel as needed) I tend to use the defaults for pretty much everything. It's not like allowing lower identity will suddenly make a read move from the correct mapping site to an incorrect one. Rather, it just means that reads that differ more from the reference will get mapped instead of being unmapped. The more a read differs from the reference, the less confidence you can have that the mapping is correct.

0
Entering edit mode

What have you tried so far? Can you post the stats? Defaults should in general be ok for alignments but you could turn maxindel= off if you don't want to look for split alignments.

0
Entering edit mode

Here is the code i've tried so far:

bbmap.sh ref=ref in=reads.fq outm=mapped.bam maxindel=1 ambig=random sam=1.3 minid=0.95


I'm not entirely sure how to read the output statsfile to indicate the percent alignments that mapped to the genome. I have currently set maxindel=1 as I did find that when I was going through the literature.

What do you think about changing minid=? I read somewhere that the default minid leads to spurious mappings.

If it matters, I am attempting to call peaks for both narrow and broad binding domains. (Pol and H3K4me1/K9me3).