Question

ENCODE terminology question

3.2 years ago

ccc ▴ 30

I'm looking at the Association Graph on this ENCODE page: https://www.encodeproject.org/experiments/ENCSR687JCD/

The first pipeline the data goes through, at the top, lists the steps "fastq concatenation", "read trimming", "alignment", and "pooling". What exactly is the purpose of these steps? Is the idea of fastq concatenation that you have multiple sequencing runs of the same sample, so you concatenate them to get a more robust sample? Read trimming (according to the Biostar Handbook) seems to mean trimming off both the adapters and the "low quality sequences". Alignment is then done against the reference, which is why we get a BAM file.

Am I mistaken about any of this so far?

But then what is pooling?

And what exactly is meant by filtering (in the next pipeline step)? It seems filtering has already been done by "read trimming". I can see how there might be some differences, and I'm wondering what they are.

sequencing alignment • 616 views

updated 3.2 years ago by i.sudbery 19k • written 3.2 years ago by ccc ▴ 30

score 1 · Answer 1 · 2021-02-21

You can see the details for any pipeline step by selecting the item in the graph, which should open a pop-up with the details. You can then select the script used to do the analysis if you like.

I couldn't see where "pooling" was described either. The script attached to the step (https://github.com/ENCODE-DCC/dnase_pipeline/blob/v2.0/dnanexus/dnase-align-bwa-pe/resources/usr/bin/dnase_align_bwa_pe.sh) just removes UMIs, trims adaptors and aligns using BWA. But if I had to guess, I'd say that it means merging together alignments from different input files for the peak calling step (we do this with our ATAC-seq pipeline). Since there is only one replicate for this experiment, it wouldn't actually happen here.
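To make that guess about pooling concrete, here is a minimal Python sketch of what merging alignments from several input files could look like. It is not taken from the ENCODE scripts: the `Alignment` record, the `pool_alignments` function and the toy replicates are all hypothetical, and chromosomes are ordered as plain strings rather than by the order in a BAM header. In practice a tool such as `samtools merge` would normally do this on real BAM files.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical, heavily simplified stand-in for one BAM record."""
    chrom: str
    pos: int
    read_name: str


def pool_alignments(*replicates: Iterable[Alignment]) -> Iterator[Alignment]:
    """Merge coordinate-sorted alignment streams into one sorted stream.

    Each input must already be sorted by (chrom, pos), as a
    coordinate-sorted alignment file would be.
    """
    return heapq.merge(*replicates, key=lambda a: (a.chrom, a.pos))


# Two toy "replicates", each already sorted by coordinate.
rep1 = [Alignment("chr1", 100, "r1_a"), Alignment("chr1", 500, "r1_b")]
rep2 = [Alignment("chr1", 250, "r2_a"), Alignment("chr2", 10, "r2_b")]

pooled = list(pool_alignments(rep1, rep2))

# With a single replicate, pooling changes nothing.
single = list(pool_alignments(rep1))
```

Under this reading, pooling adds no new information to any read. It only combines reads from several inputs so that the downstream step sees one larger set.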

Filtering is described as:

Merges and filters paired-end aligned reads for quality and UMI duplicates for DNase ENCODE uniform processing pipeline.

And the script is here: https://github.com/ENCODE-DCC/dnase_pipeline/blob/v2.0/dnanexus/dnase-filter-pe/resources/usr/bin/dnase_filter_pe.sh
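On the difference from read trimming, a rough way to picture it: trimming shortens individual reads before alignment, while filtering throws away whole alignments afterwards. The Python sketch below illustrates that distinction only. It is not the logic of the linked script; the `Aligned` record, both function names, the example adapter string and the `min_mapq=10` threshold are assumptions made for illustration.

```python
from typing import Iterable, List, NamedTuple


class Aligned(NamedTuple):
    """Hypothetical, simplified aligned read."""
    name: str
    chrom: str
    pos: int
    strand: str
    mapq: int   # mapping quality reported by the aligner
    umi: str    # unique molecular identifier attached to the read


def trim_adapter(seq: str, adapter: str) -> str:
    """Trimming: cut the read at the first exact adapter match; the read is kept."""
    idx = seq.find(adapter)
    return seq if idx == -1 else seq[:idx]


def filter_alignments(records: Iterable[Aligned], min_mapq: int = 10) -> List[Aligned]:
    """Filtering: drop whole alignments that map poorly or are UMI duplicates."""
    seen = set()
    kept = []
    for rec in records:
        if rec.mapq < min_mapq:        # assumed threshold, for illustration only
            continue
        key = (rec.chrom, rec.pos, rec.strand, rec.umi)
        if key in seen:                # same position, strand and UMI: duplicate
            continue
        seen.add(key)
        kept.append(rec)
    return kept


trimmed = trim_adapter("ACGTACGTAGATCGGAAGAGC", "AGATCGGAAGAGC")

records = [
    Aligned("r1", "chr1", 100, "+", 60, "ACGT"),
    Aligned("r2", "chr1", 100, "+", 60, "ACGT"),  # duplicate of r1
    Aligned("r3", "chr1", 100, "+", 60, "TTGA"),  # same position, different UMI
    Aligned("r4", "chr1", 300, "-", 3, "GGCA"),   # low mapping quality
]
kept = filter_alignments(records)
```

Here the trimmed read survives in shortened form, whereas `r2` and `r4` are removed entirely. That is why the two steps are not redundant: mapping quality and duplicate status are only known once the reads have been aligned.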