ENCODE terminology question
3.2 years ago
ccc ▴ 30

I'm looking at the Association Graph on this ENCODE page: https://www.encodeproject.org/experiments/ENCSR687JCD/

and the first pipeline the data goes through, at the top, lists the steps "fastq concatenation", "read trimming", "alignment", and "pooling". What exactly is the purpose of these steps? Is the idea of fastq concatenation that you have multiple sequencing runs of the same sample, so you concatenate them to get a more robust sample? Read trimming (according to the Biostar Handbook) seems to mean trimming off both the adapters and the "low quality sequences". Alignment is then done against the reference, which is why we get a BAM file.

Am I mistaken about any of this so far?

But then what is pooling?

Then what exactly is meant by filtering (in the next pipeline step)? It seems filtering has already been done by "read trimming". I can see how there might be some differences, and I'm wondering what they are.

sequencing alignment • 616 views
3.2 years ago

You can see the details for any pipeline step by selecting the item in the graph, which should open a pop-up with the details. You can then select the script used to do the analysis if you like.

I couldn't see where "pooling" was described either - the script attached to the step (https://github.com/ENCODE-DCC/dnase_pipeline/blob/v2.0/dnanexus/dnase-align-bwa-pe/resources/usr/bin/dnase_align_bwa_pe.sh) just removes UMIs, trims adaptors and aligns using BWA. But if I had to guess, I'd say that it means merging together alignments from different input files for the peak calling step (we do this with our ATAC-seq pipeline). Since there is only one replicate for this experiment, it wouldn't actually happen here.
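
To make the "fastq concatenation" idea concrete, here is a minimal Python sketch of combining several gzipped FASTQ files from the same library into one input before alignment. This is not the ENCODE code; the function names `concatenate_fastq_gz` and `count_reads` are made up for illustration. It relies on the fact that gzip files joined end to end still form a valid gzip stream.

```python
import gzip
import shutil


def concatenate_fastq_gz(inputs, output):
    """Byte-concatenate gzipped FASTQ files into one gzipped file.

    Gzip members placed back to back are still a valid gzip stream,
    so the inputs do not need to be decompressed first.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def count_reads(fastq_gz):
    """Count FASTQ records, assuming the standard 4 lines per read."""
    with gzip.open(fastq_gz, "rt") as handle:
        return sum(1 for _ in handle) // 4
```

Pooling, if my guess above is right, would be the same kind of "combine the inputs" operation, only applied to alignments rather than to raw reads.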

Filtering is described as:

Merges and filters paired-end aligned reads for quality and UMI duplicates for DNase ENCODE uniform processing pipeline.

And the script is here: https://github.com/ENCODE-DCC/dnase_pipeline/blob/v2.0/dnanexus/dnase-filter-pe/resources/usr/bin/dnase_filter_pe.sh
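
On the trimming vs. filtering difference: going by the description above, trimming shortens raw reads before alignment, while filtering discards whole reads after alignment based on quality and UMI duplicates. A toy Python sketch of that distinction follows. Everything in it is an assumption for illustration (the `AlignedRead` record, the `min_mapq=10` threshold, and the duplicate key of chromosome, position and UMI); the real pipeline script linked above will differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedRead:
    """Hypothetical, simplified stand-in for one aligned read."""
    name: str
    chrom: str
    pos: int
    mapq: int
    umi: str


def trim_adapter(sequence, adapter):
    """Trimming: shorten a raw read by cutting at the first adapter match."""
    index = sequence.find(adapter)
    return sequence if index == -1 else sequence[:index]


def filter_alignments(reads, min_mapq=10):
    """Filtering: drop whole reads after alignment.

    Skips reads with mapping quality below min_mapq, then keeps only the
    first read seen for each (chrom, pos, umi) combination as a crude
    UMI-duplicate filter.
    """
    seen = set()
    kept = []
    for read in reads:
        if read.mapq < min_mapq:
            continue
        key = (read.chrom, read.pos, read.umi)
        if key in seen:
            continue
        seen.add(key)
        kept.append(read)
    return kept
```

So the two steps look at different things: trimming only sees the read sequence, while filtering needs the alignment (where the read mapped and how confidently) to decide what to throw away.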


Thank you for your response! Sorry if this is a dumb question, but in the link there are multiple FASTQ files, yet there is a single replicate. Does this mean they had a single mouse embryo, and took multiple samples of that single embryo, and those correspond to the different FASTQ files?


There is no real way to tell, but that would be my guess.
