Question

Join, Subtract and Group: Compare using dataset lists

7.7 years ago

khaynes ▴ 50

Hello community. I am working on my first Galaxy workflow. I have a question about the "Join, Subtract and Group: Compare" tool.

My goal is to generate "fetch closest" output: for each transcription start site (TSS), the overlapping or closest feature from the ENCODE ChromHMM chromatin state data. The TSSs come from my own list of 23,245 genes, each represented as a 1 bp interval at its TSS.

Here is a summary of my workflow:

Input 1: Single interval file - TSS sites

Input 2: Dataset collection of interval files - 15 files total, one per chromatin state, each listing that state's coordinates (e.g., 1_Active_Promoter, 2_Weak_Promoter, 3_Poised_Promoter, ...15_Repetitive/CNV)

Step 1: Join TSS intervals (1 bp long) with chromatin state intervals collection (overlapping by 1 bp or more) -- output is 15 files

Step 2: Fetch closest non-overlapping chromatin state to TSS -- output is 15 files
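To make the difference between the two steps concrete, here is a minimal plain-Python sketch of what "join" and "fetch closest non-overlapping" compute for a single TSS. It is not the Galaxy tools' implementation: the function names, the half-open coordinate convention, and the example coordinates are all assumptions made for illustration.

```python
# Illustrative only: intervals are (start, end) tuples on one chromosome,
# using half-open coordinates [start, end). All values are made up.

def overlaps(a, b):
    """True if intervals a and b share at least 1 bp."""
    return a[0] < b[1] and b[0] < a[1]

def gap(a, b):
    """Number of bases separating two non-overlapping intervals."""
    return max(a[0] - b[1], b[0] - a[1])

def join_overlapping(tss, features):
    """All features that overlap the TSS (the 'join' step)."""
    return [f for f in features if overlaps(tss, f)]

def fetch_closest_non_overlapping(tss, features):
    """The nearest feature that does not overlap the TSS, or None."""
    candidates = [f for f in features if not overlaps(tss, f)]
    return min(candidates, key=lambda f: gap(tss, f), default=None)

tss = (1000, 1001)  # a 1 bp TSS interval
features = [(200, 600), (900, 1200), (1500, 1800)]

joined = join_overlapping(tss, features)
closest = fetch_closest_non_overlapping(tss, features)
print(joined, closest)
```

In this toy case the same TSS gets one row from each step: an overlapping feature from the join and a different, nearby feature from the fetch.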

Goal for the next step: any TSS that has an overlapping feature (in the Join output) also appears in the Fetch output, paired with its nearest non-overlapping feature rather than the overlapping one. I want to replace the rows for this subset of genes in the Fetch output with the corresponding rows from the Join output.

I think the best next step is to run "Join, Subtract and Group: Compare" (based on column 1, gene names) to keep only the rows of the Fetch output whose gene names do not appear in the Join output. I would then Concatenate this "cleaned" Fetch output with the Join output.
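The logic of that Compare-then-Concatenate plan for one pair of files can be sketched in plain Python as follows. This is only an illustration of the intended result, not of how the Galaxy tools work internally; `replace_with_join`, the gene names, and the row layout (gene name in column 1, then coordinates) are hypothetical.

```python
# Illustrative only: rows are lists of strings with the gene name in column 1
# (index 0). Gene names and coordinates are made up.

def replace_with_join(fetch_rows, join_rows, key_col=0):
    """Drop Fetch rows whose gene appears in Join, then append to Join rows."""
    joined_genes = {row[key_col] for row in join_rows}
    # "Compare" step: keep Fetch rows with no matching gene in the Join output.
    cleaned_fetch = [row for row in fetch_rows if row[key_col] not in joined_genes]
    # "Concatenate" step: Join rows first, then the remaining Fetch rows.
    return join_rows + cleaned_fetch

fetch_rows = [
    ["GENE_A", "chr1", "200", "600"],    # nearest feature, but GENE_A also overlaps one
    ["GENE_B", "chr1", "5000", "5400"],  # no overlap, so nearest feature is kept
]
join_rows = [["GENE_A", "chr1", "900", "1200"]]

merged = replace_with_join(fetch_rows, join_rows)
for row in merged:
    print("\t".join(row))
```

Each gene ends up with exactly one source: the overlapping feature where one exists, otherwise the nearest one.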

Will Compare work on dataset collections, i.e., outputs that contain multiple files? Can the tool pair up the corresponding files from the two collections automatically, or should I pair up the output files manually for the Compare and Concatenate steps?
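For clarity, the pairing I have in mind is by chromatin state name, so that the Join file for a state is only ever compared with the Fetch file for the same state. Outside Galaxy, that could be sketched as below; the dictionaries, file names, and `pair_by_name` helper are hypothetical, and this says nothing about how Galaxy itself matches collection elements.

```python
# Illustrative only: each dict maps a chromatin state name to a made-up file name.

def pair_by_name(join_files, fetch_files):
    """Return (state, join_file, fetch_file) tuples for states in both dicts."""
    unmatched = sorted(set(join_files) ^ set(fetch_files))
    if unmatched:
        # Refuse to pair silently if one side is missing a state.
        raise ValueError(f"states present in only one collection: {unmatched}")
    return [(name, join_files[name], fetch_files[name]) for name in sorted(join_files)]

join_files = {"1_Active_Promoter": "join_1.tsv", "2_Weak_Promoter": "join_2.tsv"}
fetch_files = {"2_Weak_Promoter": "fetch_2.tsv", "1_Active_Promoter": "fetch_1.tsv"}

pairs = pair_by_name(join_files, fetch_files)
for state, join_path, fetch_path in pairs:
    print(state, join_path, fetch_path)
```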

Thank you.

Compare workflow dataset list • 1.8k views

ADD COMMENT • link updated 7.7 years ago by GouthamAtla 12k • written 7.7 years ago by khaynes ▴ 50

You'll want to post this on the Galaxy site.

ADD REPLY • link 7.7 years ago by Devon Ryan 104k