Question: BaGFoot TF footprinting
Asked 13 months ago by rbronste:

Wondering if anyone has been able to successfully run the TF footprinting package outlined in the following paper?

http://www.cell.com/cell-reports/fulltext/S2211-1247(17)30609-5

Tags: chip-seq, bagfootr, bagfoot, R • 1.4k views
modified 9 weeks ago by Ashley Stephen Doane • written 13 months ago by rbronste

Hi Ricardo,

Did you publish the comparison of TF footprinting methods, particularly related to CENTIPEDE's performance? Or do you happen to know of any other published work on this?

Thank you! Duygu

written 5 months ago by ucarduygu

I've managed part of the way using my own data, but I'm stuck on generating the hexamer table for the hotspot regions. The relevant part of the demo code is:

gfootoption_fed_fasted <- 
    GFootOption(biasfile = "example_data/Hexamer_fed_fasted_merged_withMap.csv",  
        hexamer_pattern= "example_data/nuccode/nuccode_mm9_6mer_{chr}.mm9.dat")

Moreover, I don't see anything in the package that would generate this file: "example_data/Hexamer_fed_fasted_merged_withMap.csv".

Perhaps this is just concatenating the hexamer files, but that would not produce a csv file.

It's a nice paper, and so close to being a useful tool...
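For intuition, here is an illustrative Python sketch (not BaGFoot's actual code) of how a hexamer bias table like this could be computed: tally the 6-mers observed at cut sites against the genomic background, and take the ratio of the two frequencies as a correction factor. The column names are guesses based on the table layout discussed later in this thread, and the input is assumed to be sequences where each cut-site sequence begins with the hexamer spanning the cut.

```python
from collections import Counter
import csv

def hexamer_bias_table(cut_site_seqs, genome_seqs, out_csv):
    """Toy sketch: tally 6-mers observed at cut sites vs. the genomic
    background and derive a correction factor per hexamer."""
    # Observed cuts: take the hexamer at the start of each cut-site sequence.
    obs = Counter(s[:6].upper() for s in cut_site_seqs if len(s) >= 6)
    # Genomic background: count every 6-mer window in the reference sequences.
    bg = Counter()
    for seq in genome_seqs:
        seq = seq.upper()
        for i in range(len(seq) - 5):
            bg[seq[i:i + 6]] += 1
    obs_total = sum(obs.values())
    bg_total = sum(bg.values())
    with open(out_csv, "w", newline="") as fh:
        w = csv.writer(fh)
        w.writerow(["hexamer", "observed_cuts", "genomic_count",
                    "cut_ratio", "genomic_ratio", "correction_factor"])
        for hexamer, n_bg in sorted(bg.items()):
            n_obs = obs.get(hexamer, 0)
            cut_ratio = n_obs / obs_total if obs_total else 0.0
            gen_ratio = n_bg / bg_total
            # Correction factor: background frequency over observed frequency.
            corr = gen_ratio / cut_ratio if cut_ratio else float("nan")
            w.writerow([hexamer, n_obs, n_bg, cut_ratio, gen_ratio, corr])
```

This is only meant to show the shape of the table; the real package computes it genome-wide with mappability taken into account.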

As far as other tools, I've found HINT to be useful, and it does bias correction for ATAC-seq data. The package authors are very responsive as well.

written 9 weeks ago by Ashley Stephen Doane
Answer by Ricardo A. (University of Michigan, Ann Arbor, MI, USA), 5 months ago, score 3:

EDIT: I called victory too soon... It took me two full days to figure out why this package would not run on our data. It was the first time I've ever had to really use the debugging functions in RStudio (thanks, Hadley!). After extensive tinkering and innumerable tracebacks, I found that the problem was that one of the input files generated by macs_output2csv.R had a duplicated column. This fixed the package on my end, but a lab mate of mine who is trying to use it is currently running into a different issue.

Bottom line: make absolutely sure that your input matches the example files exactly, down to the smallest details. Pay specific attention to the .meme files that you want to use as input. They must have all the fields in the example files (i.e. alength=, w=, nsites=, and E=), down to the space between the equals sign and the value (this will change depending on your MEME version, so be aware).
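Since these header fields are easy to get wrong, a quick sanity check before running the package can save a lot of debugging time. The following is an illustrative Python script (not part of BaGFoot) that flags "letter-probability matrix" header lines missing any of the fields Ricardo mentions:

```python
# Fields the example .meme files reportedly contain (per the post above).
REQUIRED = ("alength=", "w=", "nsites=", "E=")

def check_meme_headers(path):
    """Return (line_number, missing_fields) for each letter-probability
    matrix header line that lacks one of the required fields."""
    problems = []
    with open(path) as fh:
        for lineno, line in enumerate(fh, 1):
            if line.startswith("letter-probability matrix"):
                missing = [f for f in REQUIRED if f not in line]
                if missing:
                    problems.append((lineno, missing))
    return problems
```

An empty return value means every motif header carries all four fields; it does not check the spacing around the equals signs, which you would still need to eyeball.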


I managed to get it working here (v0.9.6). You need to run the steps in bagfoot_prep_example.R using your own BAM files and then use the output in bagfoot_run_example.R, changing the necessary file paths, of course. The main issue I found is that parallel processing in the run() function used in the run script is broken for unknown reasons, so you have to specify mc.cores=1. Other than that, the necessary scripts seem to be working as expected, without my having to dig too much into the code.

On a separate note, if you only want to do TF binding predictions on a single sample, you might want to use some other software. My understanding is that BaGFoot is meant to provide a framework for comparing footprints between two (or more?) datasets. Gordon Hager's lab already has a footprinting method called DNase2TF, written by the same person (Songjoon Baek). Part of my thesis work is comparing TF binding prediction methods, and so far CENTIPEDE is very solid compared to what's out there; that would be my recommendation. We used it in a couple of recent papers (see Varshney et al. 2017, PNAS; Tomonori & Albanus et al. 2018, Sci Reports).

Hope this helps.

Cheers

modified 5 months ago • written 5 months ago by Ricardo A.

Hi Ricardo,

How do you get all the meme files for runFIMOandMakeMotifList.R?

written 3 months ago by Shuang He

It looks as though the author posted meme files on SourceForge, but you can also do something like the following.

I used a MotifDb object that contains the HOMER motifs; it is available in the motifbreakR package. Then I export the motifs in meme format using MotifDb.

library(motifbreakR)
data(homer)
meme.text <- MotifDb::export(homer, "homer.meme", format = "meme")

Then for bagfoot:

motifDBhomer_bagfoot <- MotifDB(motiflistfile = "homer.meme")
# use motifDB = motifDBhomer_bagfoot as an argument in the GFoot function call.

Good luck!!!

modified 9 weeks ago • written 9 weeks ago by Ashley Stephen Doane

Hi guys,

Thank you very much for the meme motif solution, Ricardo! I am very close to running bagfoot... there is still one step I cannot get through: how do I merge the two hexamer bias frequency tables (which I have already produced)? Do I have to calculate a mean of "correctionFactor"? Add up some numbers? I am a complete newbie to ATAC-seq. Thanks a lot in advance,

Mathilde

[UPDATE] In case someone is interested, I think I found the answer: I just added up the observed cuts in the two files (2nd column) and recalculated the different ratios (found in the bagfoot.R code).
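Here is a Python sketch of that recipe: per hexamer, sum the observed cuts from the two tables, reuse the (shared) genomic position counts, and recompute the ratios and correction factor, excluding the special "other" row from the totals. The column layout (hexamer, observed cuts, genomic count, two ratios, correction factor) is an assumption based on this thread, so adjust it to match your actual tables.

```python
import csv

def merge_hexamer_tables(file_a, file_b, out_file):
    """Merge two hexamer bias tables: add observed cuts, recompute ratios.
    Assumed columns: hexamer, observed_cuts, genomic_count, cut_ratio,
    genomic_ratio, correction_factor; 'other' is excluded from totals."""
    def load(path):
        with open(path) as fh:
            rows = list(csv.reader(fh))
        return rows[0], {r[0]: r for r in rows[1:]}

    header, a = load(file_a)
    _, b = load(file_b)
    cuts = {h: int(a[h][1]) + int(b[h][1]) for h in a}   # summed observed cuts
    genomic = {h: int(a[h][2]) for h in a}               # same genome, same counts
    cut_total = sum(c for h, c in cuts.items() if h != "other")
    gen_total = sum(g for h, g in genomic.items() if h != "other")
    with open(out_file, "w", newline="") as fh:
        w = csv.writer(fh)
        w.writerow(header)
        for h in a:
            if h == "other":
                w.writerow([h, cuts[h], genomic[h], "NA", "NA", 1])
            else:
                cr = cuts[h] / cut_total
                gr = genomic[h] / gen_total
                w.writerow([h, cuts[h], genomic[h],
                            f"{cr:.12f}", f"{gr:.12f}", f"{gr / cr:.12f}"])
```

This mirrors the Java that Mathilde posted further down in the thread, including the NA/1 handling for the "other" row.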

modified 24 days ago • written 26 days ago by mathilde.perez

Hi Mathilde,

Could you please provide more details on how to concatenate the two Hexamer tables? I'm still confused. What's more, do you find any difference when generating the hexamer bias frequency table with or without mappability? Thanks a lot in advance!

Zihan

written 14 days ago by xuzihan0910

Hi Zihan,

I generated the hexamer tables using the mappability only. The output of bagfoot is nice and makes sense with the ATAC-seq data I used. Regarding the merging of the tables: basically, for each hexamer you have to add up the observed cuts (2nd column) and then recalculate the ratio. Be careful not to include the hexamer "other" when you calculate the total observed cut count and the total genomic position count. Here is a piece of my code to merge the hexamer tables (in Java); let me know if you have questions.

// Assumes the maps below were populated while parsing the two input tables:
//   hexamerToObCuts1 / hexamerToObCuts2 : observed cuts per hexamer (table 1, table 2)
//   hexamerToGenomicPosCount            : genomic position count per hexamer
// Needs: import java.io.BufferedWriter; import java.io.FileWriter;
// Consts.ls is our line-separator constant (e.g. System.lineSeparator()).

// Totals exclude the special "other" hexamer.
double genPosSum = 0;
double cutSum = 0;
for (String hexamer : hexamerToGenomicPosCount.keySet()) {
    if (!hexamer.contains("other")) {
        cutSum += hexamerToObCuts1.get(hexamer) + hexamerToObCuts2.get(hexamer);
        genPosSum += hexamerToGenomicPosCount.get(hexamer);
    }
}

BufferedWriter out = new BufferedWriter(new FileWriter(outFile));
out.write(header + Consts.ls);
for (String hexamer : hexamerToGenomicPosCount.keySet()) {
    out.write(hexamer);
    int sum = hexamerToObCuts1.get(hexamer) + hexamerToObCuts2.get(hexamer);
    out.write(" " + sum + " " + hexamerToGenomicPosCount.get(hexamer));
    double obCutRatio = (double) sum / cutSum;
    double gpRatio = (double) hexamerToGenomicPosCount.get(hexamer) / genPosSum;
    double corrFactor = gpRatio / obCutRatio;
    if (!hexamer.contains("other")) {
        out.write(" " + String.format("%.12f", obCutRatio)
                + " " + String.format("%.12f", gpRatio)
                + " " + String.format("%.12f", corrFactor) + Consts.ls);
    } else {
        out.write(" NA NA 1" + Consts.ls);
    }
}
out.close();

best, Mathilde

written 11 days ago by mathilde.perez

Hi Mathilde,

Thanks a lot for the quick reply and the code. One more question: I'm not sure where to download the mappability files that end in .out, as mentioned in bagfoot_prep_example.R. The only website I found is this: http://archive.gersteinlab.org/proj/PeakSeq/Mappability_Map/M.musculus/, but it comes from a different package called PeakSeq. Should I generate the .out files by myself? If so, how? Thank you so much!

Zihan.

written 11 days ago by xuzihan0910

Hi Zihan,

Yes, you have to make them; it took me one entire week on my server... I did hg19. Here are some tips for making them:

MAPPABILITY files:
- download all .fa files for each chr (hg19 genome): http://hgdownload.cse.ucsc.edu/goldenpath/hg19/chromosomes/
- download the mappability scripts: http://archive.gersteinlab.org/proj/PeakSeq/Mappability_Map/Code/
- build the three executables chr2hash, oligoFindPLFFile, and mergeOligoCounts using the Makefile
- in the directory containing the .fa files, run: python compile.py 35 (35 for a 35-mer)
(VERY LONG: one week!)

I have one question for you: I would now like to use a more recent motif database. I got the meme files, but I don't know how to get/generate the related meme_index file. Would you know by chance, or did you use only the meme files provided by BaGFoot? Thanks! Mathilde

written 11 days ago by mathilde.perez

Hi Mathilde,

Thanks for the tips, very useful. For the meme files, I downloaded them from JASPAR: http://jaspar.genereg.net/search?q=Mus+musculus&collection=CORE&tax_group=vertebrates&tax_id=all&type=all&class=all&family=all&version=all (this is the example for mouse; you can do the same for the human genome). At the bottom left of that page there is a CSV button; I downloaded that CSV file as my index file. I guess you just need to change the row names of this CSV file to match the meme_index files that the author provided. Hope this helps.

Zihan

written 11 days ago by xuzihan0910

Thanks a lot, Zihan! Since I want to use other DBs, I will have to write my own script to make this file. That's OK, I will just skip the description fields for now. All the best, Mathilde
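If it helps anyone writing such a script: MEME-format files mark each motif with a line starting with `MOTIF`, followed by an identifier and an optional alternate name, so a minimal index can be scraped from those lines. This Python sketch is illustrative only; the exact columns BaGFoot's meme_index file expects should be copied from the example files shipped with the package.

```python
import csv

def meme_index(meme_path, index_csv):
    """Build a simple (id, name) index CSV by scanning MOTIF lines
    in a MEME-format file. The column layout here is a guess; adapt
    it to BaGFoot's example meme_index files."""
    with open(meme_path) as fh, open(index_csv, "w", newline="") as out:
        w = csv.writer(out)
        w.writerow(["id", "name"])
        for line in fh:
            if line.startswith("MOTIF"):
                parts = line.split()
                motif_id = parts[1]
                # Fall back to the id when no alternate name is present.
                name = parts[2] if len(parts) > 2 else motif_id
                w.writerow([motif_id, name])
```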

written 10 days ago by mathilde.perez
Answer by simon.vanheeringen, 12 months ago, score 2:

Not sure how well it compares, but I found http://pythonhosted.org/pyDNase/ to be very straightforward to run.

written 12 months ago by simon.vanheeringen

Yes, I've seen this one before; however, I think it's more tuned towards DNase-seq data, and I am working with ATAC-seq.

written 12 months ago by rbronste

It does have support for ATAC-seq, last I checked. It may not be in the official release; it might be in a branch of the GitHub repo.

written 12 months ago by simon.vanheeringen

As Simon said, it does have support for ATAC-seq.

written 12 months ago by Ian
Answer by Ian (University of Manchester, UK), 12 months ago, score 1:

I spent about a week (off and on) and gave up. A colleague of mine has apparently managed to get usable output after tweaking the R code. I don't like saying bad things about people's efforts, but the documentation was atrocious.

written 12 months ago by Ian