Question

Help with running ATAC using Encode pipeline

0

10 months ago

Chris ▴ 260

Hi all, I am trying to run the ENCODE ATAC-seq pipeline on an HPC cluster, but after reading their instructions I am not sure what the correct command is. https://github.com/ENCODE-DCC/atac-seq-pipeline

If you want to run your own data, what do you put in INPUT_JSON?

INPUT_JSON="https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled.json"
caper hpc submit atac.wdl -i "${INPUT_JSON}" --singularity --leader-job-name ANY_GOOD_LEADER_JOB_NAME

Thank you so much!

encode ATAC-seq • 2.4k views

ADD COMMENT • link updated 10 months ago by rfran010 ▴ 900 • written 10 months ago by Chris ▴ 260

score 2 · Answer 1 · 2023-06-01

2

10 months ago

rfran010 ▴ 900

Read all of the instructions carefully.

Specifically, this is covered in the "Input JSON file specification" section of the instructions. The first link below has the details and the second is an example template.

https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/docs/input_short.md

https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/example_input_json/template.json

Overall, you define your own JSON file that includes the paths specific to your data and the relevant genome, then set INPUT_JSON to the path of that JSON file:

INPUT_JSON="/my/specific/file/sample.json"
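
As a rough illustration of that workflow, the sketch below writes a minimal JSON and points INPUT_JSON at it. The working directory, title, and FASTQ paths are hypothetical placeholders, and the key names reflect my reading of the pipeline's template.json, so check them against the current documentation. The submit command is left commented out because it needs caper and the pipeline to be installed.

```shell
# Hypothetical working directory; replace with your own project path.
workdir="$(mktemp -d)"
INPUT_JSON="${workdir}/sample.json"

# Write a minimal input JSON. The FASTQ paths are placeholders, not real files.
cat > "${INPUT_JSON}" <<'EOF'
{
    "atac.title" : "my_experiment",
    "atac.pipeline_type" : "atac",
    "atac.genome_tsv" : "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v4/hg38.tsv",
    "atac.paired_end" : true,
    "atac.fastqs_rep1_R1" : [ "/path/to/rep1_R1.fastq.gz" ],
    "atac.fastqs_rep1_R2" : [ "/path/to/rep1_R2.fastq.gz" ],
    "atac.auto_detect_adapter" : true
}
EOF

echo "Input JSON written to ${INPUT_JSON}"
# caper hpc submit atac.wdl -i "${INPUT_JSON}" --singularity --leader-job-name ANY_GOOD_LEADER_JOB_NAME
```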

ADD COMMENT • link 10 months ago by rfran010 ▴ 900

0

Thank you for the instructions! I see that besides the JSON file there is also an "input files" section, which confuses me, because we have already defined the fastq.gz files in the JSON file. [screenshot] Would you explain what value I should put for the adapter if it is Illumina? I don't understand why they use that value.

ADD REPLY • link 10 months ago by Chris ▴ 260

2

The "Input files" section gives details on how you can specify your FASTQs in the JSON. (You specify all pipeline parameters in the JSON.) Essentially, the pipeline aims to be flexible and accept a wide range of input files, so there are many ways to specify your inputs WITHIN the JSON.

For the "adapters" section, you can read the instructions and follow them (either specify adapters manually or use the auto-detect feature). I suggest just turning on auto-detection with "atac.auto_detect_adapter": true and leaving all other atac.adapters keys out of the JSON file, unless of course you have custom or different adapters.

ADD REPLY • link 10 months ago by rfran010 ▴ 900

0

So is it redundant in this case to specify the path to the fastq.gz files in both the input file and the JSON file? The JSON file is .json; what is the file format of the input file? Thank you for your help!

ADD REPLY • link 10 months ago by Chris ▴ 260

2

Curious, can you give specific examples of what you refer to as "input file" and "json file"?

In the screenshot you shared, "input files" refers to input FASTQ files, input BAMs, or other types of input sequencing/mapping files. The JSON file contains information specific to your experiment and tells the pipeline where everything is. The JSON file is the "input" to the pipeline submission command, but it contains the locations of your actual "input files" and other relevant files.

ADD REPLY • link 10 months ago by rfran010 ▴ 900

0

Sure, here is my json file, I have 8 fastq files:

{
    "atac.title" : "atac)",
    "atac.description" : "Encode",

    "atac.pipeline_type" : "atac",
    "atac.align_only" : false,
    "atac.true_rep_only" : false,

    "atac.genome_tsv" : "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v4/hg38.tsv",

    "atac.paired_end" : true,

    "atac.fastqs_wt_rep1_R1" : [ "rep1_R1_L1.fastq.gz" ],
    "atac.fastqs_wt_rep1_R2" : [ "rep1_R2_L1.fastq.gz"],
    "atac.fastqs_wt_rep2_R1" : [ "rep2_R1_L1.fastq.gz" ],
    "atac.fastqs_wt_rep2_R2" : [ "rep2_R2_L1.fastq.gz" ],
    "atac.fastqs_di_rep1_R1" : [ "rep1_R1_L1.fastq.gz" ],
    "atac.fastqs_di_rep1_R2" : [ "rep1_R2_L1.fastq.gz"],
    "atac.fastqs_di_rep2_R1" : [ "rep2_R1_L1.fastq.gz" ],
    "atac.fastqs_di_rep2_R2" : [ "rep2_R2_L1.fastq.gz" ],

    "atac.auto_detect_adapter" : true,


    "atac.multimapping" : 4
}

I am not sure what to put in the input file. Is it something like this?

    {

path/to/fastq.gz

    }

ADD REPLY • link 10 months ago by Chris ▴ 260

0

"I am not sure what to put in the input file. Is it something like this?"

What input file are you referring to? Your fastqs are given in the json?

ADD REPLY • link 10 months ago by rfran010 ▴ 900

0

Yes, I think that is what the pipeline wants. I didn't find tutorials or videos that are easy for new users to follow, so just reading the instructions is still difficult for me.

ADD REPLY • link 10 months ago by Chris ▴ 260

1

Yes, it is a lot to take in.

In your input json, is "atac.fastqs_wt_rep2_R2" valid? This key should match exactly what is given in the examples.

Once your JSON is ready, use it to start the pipeline. The FASTQs you specified are the only input files you need to worry about. If your JSON is in the current directory and named "my_json.json", the command could look like this:

caper hpc submit atac.wdl -i "./my_json.json" --singularity --leader-job-name ANY_GOOD_LEADER_JOB_NAME
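
Before submitting, it can help to confirm that every FASTQ listed in the JSON actually exists where the JSON says it does. The function below is only a sketch: `missing_fastqs` is a made-up name, it assumes local paths ending in .fastq.gz or .fq.gz, and it resolves relative paths against the current directory. URLs would be reported as missing, so it is only meaningful for local files.

```shell
# Hypothetical helper: print every FASTQ path in the JSON that is not on disk.
# Assumes local paths that end in .fastq.gz or .fq.gz.
missing_fastqs() {
    json_file="$1"
    # Extract quoted strings ending in .fastq.gz / .fq.gz and strip the quotes,
    # then report each path that does not exist.
    grep -oE '"[^"]+\.(fastq|fq)\.gz"' "${json_file}" | tr -d '"' |
    while IFS= read -r fq; do
        [ -e "${fq}" ] || echo "${fq}"
    done
}
```

Calling `missing_fastqs ./my_json.json` prints nothing when all listed files are found.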

ADD REPLY • link 10 months ago by rfran010 ▴ 900

0

Hi @rfran010. What do you mean by valid? Could you give an example of what a matching key looks like?

ADD REPLY • link 10 months ago by Chris ▴ 260

0

Please see the screenshot you sent of input files.

ADD REPLY • link 10 months ago by rfran010 ▴ 900

0

Do you mean it has to be like:

 "atac.fastqs_wt_rep2_R2" : ["BIOREP2_TECHREP1.R2.fq.gz", "BIOREP2_TECHREP2.R2.fq.gz"],

ADD REPLY • link 10 months ago by Chris ▴ 260

0

I believe this part is incorrect:

"atac.fastqs_wt_rep2_R2" :

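As far as I can tell from the pipeline's example JSONs, FASTQ keys follow the pattern atac.fastqs_repN_R1 / atac.fastqs_repN_R2, with no condition label such as wt or di in the name, so the two conditions would presumably go into separate JSON files and separate runs. A rough shell check for keys that deviate from that pattern could look like the following; the function name is made up, and the expected pattern is my assumption about the naming scheme.

```shell
# Hypothetical helper: print FASTQ keys that do not look like
# atac.fastqs_repN_R1 or atac.fastqs_repN_R2.
# The expected pattern is an assumption based on the pipeline's example JSONs.
unexpected_fastq_keys() {
    json_file="$1"
    # First grep collects every quoted key starting with atac.fastqs_;
    # second grep keeps only those that deviate from the expected pattern.
    grep -oE '"atac\.fastqs_[^"]*"' "${json_file}" |
    grep -vE '^"atac\.fastqs_rep[0-9]+_R[12]"$' |
    tr -d '"'
}
```

Run as `unexpected_fastq_keys ./my_json.json`, it should list keys such as atac.fastqs_wt_rep2_R2 and print nothing once the keys match the pattern.
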
ADD REPLY • link 10 months ago by rfran010 ▴ 900