Question: BaGFoot TF footprinting
3.2 years ago by
rbronste370 wrote:

Wondering if anyone has been able to successfully run the TF footprinting package outlined in the following paper?

chip-seq bagfootr bagfoot R • 3.9k views
modified 2.3 years ago by Ashley Stephen Doane20 • written 3.2 years ago by rbronste370

Hi Ricardo,

Did you publish the comparison of TF footprinting methods, particularly related to CENTIPEDE's performance? Or do you happen to know of any other published work on this?

Thank you! Duygu

written 2.6 years ago by ucarduygu0

I've managed part of the way using my own data, but I'm stuck on generating the hexamer table for the hotspot regions. The relevant part of the demo code is:

gfootoption_fed_fasted <- 
    GFootOption(biasfile = "example_data/Hexamer_fed_fasted_merged_withMap.csv",  
        hexamer_pattern= "example_data/nuccode/nuccode_mm9_6mer_{chr}.mm9.dat")

Moreover, I don't see anything in the package that would generate this file: "example_data/Hexamer_fed_fasted_merged_withMap.csv".

Perhaps this is just a matter of concatenating the hexamer files, but that would not produce a csv file.

It's a nice paper, and so close to being a useful tool...

As far as other tools, I've found HINT to be useful, and it does bias correction for ATAC-seq data. The package authors are very responsive as well.

written 2.3 years ago by Ashley Stephen Doane20

Hi rbronste,
I managed to get it to work recently after a painful 5 days of debugging. I noticed this post is 15 months old, so if you still need some help, please email me at and I will provide my detailed records :-)
Hope this helps, best wishes!

modified 24 months ago • written 24 months ago by Houyu Zhang20
2.6 years ago by
Ricardo A.90 (University of Michigan, Ann Arbor, MI - USA) wrote:

EDIT: I called victory too soon... It took me two full days to figure out why this package would not run on our data. It was the first time I've ever had to really use the debugging functions in RStudio (thanks, Hadley!). After extensive tinkering and innumerable tracebacks, I found that the problem was that one of the input files generated by macs_output2csv.R had a duplicated column. This fixed the package on my end, but I have a lab mate who is trying to use it and is currently running into a different issue.
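For anyone debugging a similar crash, a quick sanity check for duplicated columns in a CSV can save a lot of traceback-chasing. This is a generic sketch, not part of BaGFoot; the function name is mine:

```python
import csv
from collections import Counter

def find_duplicate_columns(path):
    """Return the header names that appear more than once in a CSV file."""
    with open(path, newline="") as fh:
        header = next(csv.reader(fh))
    return [name for name, count in Counter(header).items() if count > 1]
```

Running this on each intermediate csv before handing it to the run script makes this class of problem visible immediately.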

Bottom line: make absolutely sure that your input matches the example files exactly, down to the smallest details. Pay specific attention to the .meme files that you want to use as input. They must have all the fields in the example files (i.e. alength=, w=, nsites=, and E=), down to the space between the equals sign and the value (this will change depending on your MEME version, so be aware).
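To catch the .meme formatting issue early, one could scan the file for the fields mentioned above before running anything. A minimal sketch — the function name is mine, and whether your BaGFoot version needs exactly these fields is an assumption you should verify against the example files:

```python
def missing_meme_fields(path):
    """Report 'letter-probability matrix' lines in a .meme file that lack
    any of the header fields the BaGFoot examples appear to require."""
    required = ("alength=", "w=", "nsites=", "E=")
    problems = []
    with open(path) as fh:
        for lineno, line in enumerate(fh, 1):
            if line.startswith("letter-probability matrix"):
                missing = [field for field in required if field not in line]
                if missing:
                    problems.append((lineno, missing))
    return problems
```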

I managed to get it working here (v0.9.6). You need to run the steps in bagfoot_prep_example.R using your own BAM files and then use the output in bagfoot_run_example.R, changing the necessary file paths, of course. The main issue I found is that parallel processing in the run() function used in the run script is broken for unknown reasons, so you have to specify mc.cores=1. Other than that, the necessary scripts seem to be working as expected, without having to dig too much into the code.

On a separate note, if you only want to do TF binding predictions on a single sample, you might want to use some other software. My understanding is that BaGFoot is made to provide a framework for comparing footprints between two (or more?) datasets. Gordon Hager's lab already has a footprinting method called DNase2TF, which was written by the same author (Songjoon Baek). Part of my thesis work is comparing TF binding prediction methods, and so far CENTIPEDE is very solid compared to what's out there - that would be my recommendation. We used it for a couple of recent papers (see Varshney et al, 2017 PNAS; Tomonori & Albanus et al, 2018 Sci Reports).

Hope this helps.


modified 2.6 years ago • written 2.6 years ago by Ricardo A.90


How do you get all the meme files for runFIMOandMakeMotifList.R?

written 2.4 years ago by Shuang He0

It looks as though the author posted meme files on sourceforge, but you can also do something like the following.

I used a MotifDb object that contains the HOMER motifs; this is available in the motifbreakR package. Then I exported the motifs in MEME format using MotifDb:

meme.text = MotifDb::export(homer, "", format= 'meme')

Then for bagfoot:

 motifDBhomer_bagfoot = MotifDB(motiflistfile = "") 
# use motifDB=motifDBhomer_bagfoot as argument in GFoot function call.

Good luck!!!

modified 2.3 years ago • written 2.3 years ago by Ashley Stephen Doane20

Hi guys,

thank you very much for the meme motif solution, Ricardo! I am very close to running bagfoot... there is still one step I cannot get through: how do I merge the two hexamer bias frequency tables (which I have already produced)? Do I have to calculate a mean of the "correctionFactor"? Add up some numbers? I am completely new to ATAC-seq. Thanks a lot in advance,


[UPDATE] In case someone is interested, I think I found the answer: I just added up the observed cuts in the two files (2nd column) and recalculated the various ratios (found in the bagfoot.R code).
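For reference, the merge procedure described above could be sketched as follows. This assumes each table is held as a dict mapping hexamer -> (observed cuts, genomic position count), with both tables sharing the same genome-wide position counts; the function name and in-memory representation are mine, not BaGFoot's:

```python
def merge_hexamer_tables(t1, t2):
    """Merge two hexamer bias tables: sum observed cuts per hexamer, then
    recompute the cut ratio, genomic-position ratio, and correction factor.
    The catch-all 'other' row is excluded from the totals and kept as-is."""
    summed = {h: (t1[h][0] + t2[h][0], t1[h][1]) for h in t1}
    cut_total = sum(c for h, (c, g) in summed.items() if h != "other")
    gp_total = sum(g for h, (c, g) in summed.items() if h != "other")
    merged = {}
    for h, (cuts, gpc) in summed.items():
        if h == "other":
            merged[h] = (cuts, gpc, None, None, 1.0)
        else:
            cut_ratio = cuts / cut_total
            gp_ratio = gpc / gp_total
            merged[h] = (cuts, gpc, cut_ratio, gp_ratio, gp_ratio / cut_ratio)
    return merged
```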

modified 2.2 years ago • written 2.2 years ago by mathilde.perez0

Hi Mathilde,

Could you please provide more details on how to concatenate the two Hexamer tables? I'm still confused. What's more, do you find any difference when generating the hexamer bias frequency table with or without mappability? Thanks a lot in advance!


written 2.2 years ago by xuzihan09100

Hi Zihan,

I generated the hexamer tables using the mappability only. The bagfoot output is nice and makes sense with the ATAC-seq data I used. Regarding the merging of the tables: basically, for each hexamer you have to add up the observed cuts (2nd column) and then recalculate the ratios. Be careful not to include the hexamer "other" when you calculate the total of observed cuts and the total genomic position count. Here is a piece of my code to merge the hexamer tables (in Java); let me know if you have questions.

    double genPosSum = 0;
    double cutSum = 0;
    // First pass: totals over all hexamers except the catch-all "other"
    for (String hexamer : hexamerToGenomicPosCount.keySet()) {
        if (!hexamer.contains("other")) {
            cutSum += hexamerToObCuts1.get(hexamer) + hexamerToObCuts2.get(hexamer);
            genPosSum += hexamerToGenomicPosCount.get(hexamer);
        }
    }

    // Second pass: write the merged table with recalculated ratios
    BufferedWriter out = new BufferedWriter(new FileWriter(outFile));
    for (String hexamer : hexamerToGenomicPosCount.keySet()) {
        int sum = hexamerToObCuts1.get(hexamer) + hexamerToObCuts2.get(hexamer);
        out.write(hexamer + " " + sum + " " + hexamerToGenomicPosCount.get(hexamer));
        double obCutRatio = (double) sum / cutSum;
        double gpRatio = (double) hexamerToGenomicPosCount.get(hexamer) / genPosSum;
        double corrFactor = gpRatio / obCutRatio;
        if (!hexamer.contains("other")) {
            out.write(" " + String.format("%.12f", obCutRatio) + " " + String.format("%.12f", gpRatio) + " " + String.format("%.12f", corrFactor) + "\n");
        } else {
            out.write(" NA NA 1\n");
        }
    }
    out.close();

best, Mathilde

written 2.2 years ago by mathilde.perez0

Hi Mathilde,

Thanks a lot for the quick reply and the code. One more question: I'm not sure where to download the mappability file with the .out extension mentioned in bagfoot_prep_example.R. The only website I found is this:, but it comes from a different package called PeakSeq. Should I generate the .out files myself? If so, how? Thank you so much!


written 2.2 years ago by xuzihan09100

Hi Zihan,

yes, you have to make them; it took me one entire week on my server... I did hg19. Here are some tips for making them:

MAPPABILITY files:
- download all .fa files for each chr (hg19 genome):
- download the mappability scripts:
- make the three executables chr2hash, oligoFindPLFFile, mergeOligoCounts, using the Makefile
- in the directory containing the .fa files, run: python 35 (35 for 35mer) ** VERY LONG: 1 WEEK! **

I have one question for you: I would now like to use a more recent motif database. I got the meme files, but I don't know how to get/generate the related meme_index file. Would you know by chance, or did you only use the meme files provided by BaGFoot? Thanks! Mathilde

written 2.2 years ago by mathilde.perez0

Hi Mathilde,

Thanks for the tips, very useful. For the meme files, I downloaded them from JASPAR: This is the example for mouse; you can do the same for the human genome. At the bottom left of this page, you can see a csv button; I downloaded this csv file as my index file. I guess you just need to change the row names of this csv file to match the meme_index files that the author provided. Hope this helps.


written 2.2 years ago by xuzihan09100

Thanks a lot, Zihan! Since I want to use other DBs, I will have to write my own script to make this file. That's OK; I will just skip the description fields for now. All the best, Mathilde

written 2.2 years ago by mathilde.perez0

Hi Mathilde,
Does your index file function normally if you just skip the description field?

written 24 months ago by Houyu Zhang20

Hi Hughie, yes it does!

written 24 months ago by mathilde.perez0

Thanks for the reply, Mathilde.
I exported a meme file from MotifDb (JASPAR2018) and downloaded the index file from the JASPAR website, but the motif number and order are a little different, and some of the motifs failed when running runFIMOandMakeMotifList.R (I did get some FIMO results). So I want to ask whether you re-ranked your index file to match your meme file? Thank you!

modified 24 months ago • written 24 months ago by Houyu Zhang20

I re-ranked my index file to match the .meme file and got a normal result. Thank you!

written 24 months ago by Houyu Zhang20
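For anyone hitting the same mismatch, re-ranking an index to follow a .meme file's motif order can be sketched like this. MEME motif headers start with "MOTIF <name>"; the function name and the row layout of the index are my own assumptions, not BaGFoot's:

```python
def reorder_index_rows(meme_path, index_rows, name_col=0):
    """Reorder index rows to match the motif order in a .meme file.
    index_rows is a list of row-lists; name_col is the field holding the
    motif name. Rows whose motif is absent from the .meme file are dropped."""
    order = []
    with open(meme_path) as fh:
        for line in fh:
            if line.startswith("MOTIF"):
                order.append(line.split()[1])  # header format: 'MOTIF name [alt-name]'
    rank = {name: i for i, name in enumerate(order)}
    kept = [row for row in index_rows if row[name_col] in rank]
    return sorted(kept, key=lambda row: rank[row[name_col]])
```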

First of all many thanks to Mathilde, Zihan and other contributors to this really helpful discussion, which I essentially used as a manual for using BaGFoot.

I have a follow-up question. Did anyone figure out how to calculate p- and q-values for each TF as the authors did in Fig 3D (and other figures)? I see text on that in the supplementary material, but the complex statistics they used are hard for me to understand. I wonder if a piece of code for calculating p- and q-values and plotting them on the bagplot already exists, and if anyone is willing to share it!

I would highly appreciate any help. Thank you.

Best wishes, Ravi

written 19 months ago by Ravi Patel0

Hi Ravi,
I have successfully used this package before, and I may be able to give you some suggestions.
The default bagfoot result comes without p-values / q-values; however, the code for the statistical analysis exists in the original bagfoot code. Just use gen_bagplot_chisq() instead of gen_bagplot().

modified 19 months ago • written 19 months ago by Houyu Zhang20

Perfect! Thanks Hughie!

written 18 months ago by Ravi Patel0
3.2 years ago by
Ian5.7k (University of Manchester, UK) wrote:

I spent about a week (off and on) and gave up. A colleague of mine has apparently managed to get usable output after tweaking the R code. I don't like saying bad things about people's efforts, but the documentation was atrocious.

EDITED (March 2019):

Possible alternative:


modified 22 months ago • written 3.2 years ago by Ian5.7k
3.2 years ago by
simon.vanheeringen200 wrote:

Not sure how well it compares, but I found to be very straightforward to run.

written 3.2 years ago by simon.vanheeringen200

Yes, I've seen this one before; however, I think it's more tuned towards DNase-seq data, and I am working with ATAC-seq.

written 3.2 years ago by rbronste370

It does have support for ATAC-seq, last I checked. Maybe not in the official release; it might be in a branch in the GitHub repo.

written 3.2 years ago by simon.vanheeringen200

As Simon said it does have support for ATAC-seq.

written 3.2 years ago by Ian5.7k
Powered by Biostar version 2.3.0