Hello community. I am working on my first Galaxy workflow. I have a question about the "Join, Subtract and Group: Compare" tool.
My goal is to generate a "fetch closest" style output: for each transcription start site (TSS), the overlapping feature or, if none overlaps, the closest one. The features are elements of interest from the ENCODE ChromHMM chromatin state data. The TSSs come from my own list of 23,245 genes, each represented as a 1 bp interval at its TSS.
Here is a summary of my workflow:
- Input 1: Single interval file - TSSs
- Input 2: Dataset collection of interval files - 15 files total: chromatin states and their coordinates (e.g., 1_Active_Promoter, 2_Weak_Promoter, 3_Poised_Promoter, ...15_Repetitive/CNV)
- Join TSS intervals (1 bp long) with chromatin state intervals collection (overlapping by 1 bp or more) -- output is 15 files
- Fetch closest non-overlapping chromatin state to TSS -- output is 15 files
Goal for next step: TSSs that have an overlapping feature (Join output) also appear in the Fetch output, but with the nearest non-overlapping feature instead of the overlapping one. I want to replace that subset of genes in the Fetch output with the corresponding rows from the Join output.
I think the best next step is to run "Join, Subtract and Group: Compare" (on column 1, gene names) to retrieve the rows of the Fetch output that do not match any row of the Join output. I would then Concatenate this "cleaned" Fetch output with the Join output.
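To make the intended logic concrete, here is a minimal Python sketch of what I expect Compare followed by Concatenate to do for one pair of files. It is not how Galaxy implements these tools. The function names, the five-column layout, and the example rows are made up for illustration; the only thing taken from my data is that column 1 holds the gene name.

```python
import csv
import io


def read_rows(text):
    """Parse tab-separated text into a list of rows (lists of strings)."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]


def merge_join_and_fetch(join_rows, fetch_rows, key_col=0):
    """Keep every Join row, plus Fetch rows whose gene is absent from Join."""
    joined_genes = {row[key_col] for row in join_rows}
    # "Compare" step: Fetch rows whose key does NOT match any Join row.
    cleaned_fetch = [row for row in fetch_rows if row[key_col] not in joined_genes]
    # "Concatenate" step: overlapping hits first, then nearest-only hits.
    return join_rows + cleaned_fetch


# Hypothetical rows: gene, chrom, TSS start, TSS end, feature label
join_text = "GENE_A\tchr1\t100\t101\toverlapping_feature\n"
fetch_text = (
    "GENE_A\tchr1\t100\t101\tnearest_feature\n"
    "GENE_B\tchr1\t500\t501\tnearest_feature\n"
)

merged = merge_join_and_fetch(read_rows(join_text), read_rows(fetch_text))
for row in merged:
    print("\t".join(row))
```

In this toy case GENE_A keeps its overlapping feature from the Join output, and GENE_B, which has no overlap, keeps its nearest feature from the Fetch output. I would need the same operation applied to each of the 15 Join/Fetch file pairs.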
Will Compare work on outputs that have multiple files? Can this function pair up the appropriate files, or should I pair up the output files manually for the Compare and Concatenate steps?