4.0 years ago by
Seattle, WA USA
If your BED files are sorted, you could use bedops to union all the TFBS to standard output, and pipe that result to bedmap to do the mapping of TFs to master regions.
(This technique assumes that the TFBS files are minimally BED4. That is, the fourth column in each TFBS file contains the ID of the TF. If that is not the case, describe your format in more detail and I'll suggest a quick one-liner with awk to fix up files into the correct form.)
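As a rough sketch of that kind of fix-up, and purely as an assumption about your layout: suppose each file is three-column BED named after its factor (a hypothetical `CTCF.bed3`, say). Then awk can append the name as a fourth column. The `.bed3` extension, the `tfbs.<TF>.bed` output names, and the toy intervals are all made up for illustration.

```shell
# Hypothetical layout: BED3 files named <TF>.bed3, e.g. CTCF.bed3.
# Work in a scratch directory on toy data so nothing real is touched.
workdir=$(mktemp -d)
printf 'chr1\t100\t200\nchr1\t300\t400\n' > "$workdir/CTCF.bed3"

for fn in "$workdir"/*.bed3; do
    # Derive the TF name from the file name (an assumption about naming).
    tf=$(basename "$fn" .bed3)
    # Append the TF name as the fourth (ID) column, tab-delimited.
    awk -v tf="$tf" 'BEGIN { OFS = "\t" } { print $1, $2, $3, tf }' "$fn" \
        > "$workdir/tfbs.$tf.bed"
done

cat "$workdir/tfbs.CTCF.bed"
```

If your IDs live somewhere else (another column, a header line), the awk body would need to change accordingly.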
Here's the one-liner that unions and maps:
$ bedops --everything tfbs*.bed | bedmap --echo --echo-map-id-uniq --delim '\t' master.bed - > answer.bed
Piping through standard output avoids writing an intermediate file to disk, which would otherwise cost time. So this should be very fast.
Assuming that the ID fields in each of the TFBS files (through tfbs400.bed) contain the desired TF names or other identifiers of choice, the file answer.bed contains the results as you expect, except that it uses a semicolon as the ID delimiter instead of a comma. If a comma is required, add --multidelim ',' to the bedmap statement.
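To see what the output looks like before running this on 400 files, here is a small toy run of the same pipeline with the comma delimiter added. The region names, TF names, and coordinates are invented for the demo, and the block skips itself if BEDOPS is not on your PATH.

```shell
# Toy demo with made-up, already-sorted inputs in a scratch directory.
demo=$(mktemp -d)
printf 'chr1\t0\t1000\tregionA\nchr1\t2000\t3000\tregionB\n' > "$demo/master.bed"
printf 'chr1\t100\t200\tCTCF\n' > "$demo/tfbs1.bed"
printf 'chr1\t150\t250\tGATA1\nchr1\t2100\t2200\tCTCF\n' > "$demo/tfbs2.bed"

if command -v bedops >/dev/null 2>&1 && command -v bedmap >/dev/null 2>&1; then
    # Union all TFBS files, then map the unique TF IDs onto each master region.
    # --multidelim ',' switches the ID separator from semicolon to comma.
    bedops --everything "$demo"/tfbs*.bed \
        | bedmap --echo --echo-map-id-uniq --delim '\t' --multidelim ',' \
            "$demo/master.bed" - \
        > "$demo/answer.bed"
    cat "$demo/answer.bed"
else
    echo "BEDOPS not found; skipping demo" >&2
fi
```

Each output line should be a master region followed by a tab and the list of TF IDs that overlap it.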
If your BED files are not sorted, you could first prepare them with BEDOPS sort-bed, which is faster at sorting BED files than GNU sort.
$ for tfbs_fn in tfbs*.bed; do sort-bed "$tfbs_fn" > "sorted.$tfbs_fn"; done
$ sort-bed master.bed > sorted.master.bed
Then use the sorted files (sorted.tfbs*.bed and sorted.master.bed) in downstream BEDOPS operations. You only need to sort once.