Hi everyone, I am trying to understand a problem I have when annotating the peaks from my ChIP-seq data that I called with MACS. I am using the "operate on genomic intervals" tool to join the BED file produced by MACS with the TAIR10 reference annotation (Arabidopsis) in GFF3 format, which Galaxy automatically treats as BED. However, I cannot run this step because the message "All datasets must belong to same genomic build, this dataset is linked to build '?' " is shown. What does it mean? Does it mean the reference genome is somehow incompatible with the MACS reads? I used the built-in Galaxy TAIR10 genome, and for the annotation a separate file that I downloaded from TAIR, but there should not be any difference between them, should there? Please help!
You can use BEDOPS bedmap together with gff2bed to map TAIR10 annotations (or annotations from any reference genome you have a GFF file for) to ChIP-seq peaks, e.g.:
$ bedmap --echo --echo-map peaks.bed <(gff2bed < annotations.gff) > answer.bed
If you don't need the entire annotation, you can use --echo-map-id or other --echo-map-* options to get a subset of the annotation data. See the documentation or bedmap --help for more detail.
You can also annotate your peaks with a simple BEDTools call:
closestBed -a peaks.bed -b genes_with_coordinates_annotation.bed
This lets you annotate your peaks with any BED file of annotated coordinates (genes, CTCF sites, etc.).