Question

Can I use HOMER to create a tag dir from a 400 Gb BAM file?

8.7 years ago

biostart ▴ 370

Hello,

Has anyone worked with very large files in HOMER? I am using a 400 Gb file as input to create a HOMER tag directory, and I expect more than 1 Tb of intermediate files to be created in the process. I just wanted to clarify how HOMER reads the input: does it load the whole input file into memory, or does it process it in small portions? Please tell me before I kill the university's computer cluster :)

Thanks

RNA-Seq ChIP-Seq • 3.2k views

Updated 8.7 years ago by Sukhi Singh 11k • written 8.7 years ago by biostart ▴ 370

Ram · Answer 1 · 2016-02-09

makeTagDirectory parses the alignment file and splits the tags into separate files by chromosome. As a result, several *.tags.tsv files are created in the output directory. This layout makes it very efficient to return to the data during downstream analysis, and it helps speed up the analysis of very large data sets without running out of memory.

I think it's already taken care of, but in any case Chris (cbenner@salk.edu) would be the best person to answer this question, since he wrote the program. I reckon it is already designed to work with large files and reads them line by line. HOMER is written in Perl and C++.
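To make the idea of reading line by line and splitting by chromosome concrete, here is a minimal sketch. It is not HOMER's code and does not produce HOMER's tag file format; the function name split_by_chrom, the output columns and the toy input are all made up for illustration. It only shows that a stream of SAM-like records can be split into per-chromosome files while holding one line in memory at a time.

```shell
#!/bin/sh
# Illustration only: NOT HOMER's implementation and NOT its tag file format.
set -eu

# split_by_chrom OUTDIR (hypothetical helper)
# Reads tab-separated SAM-like records on stdin, one line at a time, and
# appends chromosome, position and flag to OUTDIR/<chromosome>.txt.
split_by_chrom() {
  outdir="$1"
  mkdir -p "$outdir"
  awk -F '\t' -v dir="$outdir" '
    /^@/      { next }   # skip SAM header lines
    $3 == "*" { next }   # skip unmapped reads
    { print $3 "\t" $4 "\t" $2 >> (dir "/" $3 ".txt") }
  '
}

# Tiny made-up input: read name, flag, chromosome, position.
demo="$(mktemp -d)"
printf 'r1\t0\tchr1\t100\nr2\t16\tchr2\t250\nr3\t0\tchr1\t300\nr4\t4\t*\t0\n' \
  | split_by_chrom "$demo"
ls "$demo"
```

With a real file, something like `samtools view input.bam | split_by_chrom outdir` would feed the same function (input.bam and outdir are placeholder names). Memory use stays flat regardless of input size, because nothing is accumulated between lines.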

Otherwise, you can always split the BAM file per chromosome, run makeTagDirectory on the pieces in parallel (depending on which cluster you are using), and later combine the tag directories into one.

To combine tag directories, for example when combining two separate experiments into one, do the following:

makeTagDirectory Combined-PU.1-ChIP-Seq/ -d Exp1-ChIP-Seq/ Exp2-ChIP-Seq/ Exp3-ChIP-Seq/
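The split, run and combine approach suggested above could be planned roughly as follows. This is a dry-run sketch that only prints the commands so they can be reviewed or handed to a scheduler; input.bam, the chromosome list and the tagdirs/ and Combined-tags/ directories are placeholder names, and the region query in samtools view assumes the BAM is coordinate-sorted and indexed.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
set -eu

BAM="input.bam"           # placeholder: coordinate-sorted, indexed BAM
CHROMS="chr1 chr2 chr3"   # placeholder: use the names from your BAM header
WORK="tagdirs"            # placeholder scratch directory

# One extraction and one makeTagDirectory call per chromosome.
# Each pair is independent, so pairs can be submitted as separate cluster jobs.
plan_per_chrom() {
  for chr in $CHROMS; do
    echo "samtools view -b -o $WORK/$chr.bam $BAM $chr"
    echo "makeTagDirectory $WORK/$chr-tags/ $WORK/$chr.bam"
  done
}

# Merge the per-chromosome tag directories with -d, as in the command above.
plan_combine() {
  dirs=""
  for chr in $CHROMS; do
    dirs="$dirs $WORK/$chr-tags/"
  done
  echo "makeTagDirectory Combined-tags/ -d$dirs"
}

plan_per_chrom
plan_combine
```

The combine step has to wait until every per-chromosome job has finished, so on a cluster it would typically be submitted with a dependency on the earlier jobs.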