I'm looking at this ENCODE page Association Graph: https://www.encodeproject.org/experiments/ENCSR687JCD/
and the first pipeline the data goes through, at the top, lists the steps "fastq concatenation", "read trimming", "alignment", and "pooling". What exactly is the purpose of these steps? Is the idea of fastq concatenation that you have multiple sequencing runs of the same sample, so you concatenate them to get a more robust dataset? Then read trimming (judging from the Biostar Handbook) removes both the adapters and the low-quality sequence. Then alignment is performed against the reference genome, which is why we get a BAM file.
Am I mistaken about any of this so far?
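For concreteness, here is how I picture the concatenation step. This is a minimal sketch with toy, uncompressed FASTQ files and made-up read names; I'm assuming the real (gzipped) lane files concatenate the same way, since gzip streams can also be appended with `cat`:

```shell
# Two tiny single-record FASTQ files standing in for two sequencing runs
# of the same sample (names and sequences are invented for illustration).
printf '@read1\nACGT\n+\nIIII\n' > lane1.fastq   # one 4-line FASTQ record
printf '@read2\nTGCA\n+\nIIII\n' > lane2.fastq   # a record from another run

# "fastq concatenation" as I understand it: a simple file append,
# since FASTQ has no header that would need merging.
cat lane1.fastq lane2.fastq > combined.fastq

wc -l < combined.fastq   # 8 lines = 2 records
```

If that is all the step does, then its only purpose is to hand the trimmer/aligner one input file per replicate instead of one per run.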
But then what is pooling?
Then what exactly is meant by filtering (in the next pipeline step)? It seems some filtering has already been done by "read trimming". I can see how there might be differences between the two, and I'm wondering what they are.
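My guess at the distinction, for concreteness: trimming edits the raw reads (FASTQ, before alignment), while filtering drops whole alignments (BAM/SAM, after alignment), e.g. by mapping quality. This is my assumption, not something stated on the page; real pipelines would use something like `samtools view -q 30`, but here a toy SAM fragment and `awk` make the idea visible:

```shell
# Two made-up SAM alignment lines (tab-separated; column 5 is MAPQ).
printf 'read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n'  > toy.sam
printf 'read2\t0\tchr1\t200\t5\t4M\t*\t0\t0\tTGCA\tIIII\n'  >> toy.sam

# Filtering as I picture it: discard entire alignments below a MAPQ
# threshold (the awk line mimics `samtools view -q 30`); nothing here
# touches the read sequences themselves, unlike trimming.
awk '$5 >= 30' toy.sam > filtered.sam

wc -l < filtered.sam   # 1 line: only read1 survives
```

If that's roughly right, then trimming and filtering never overlap: one operates on bases within reads, the other on aligned reads as units.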
Thank you for your response! Sorry if this is a dumb question, but on the linked page there are multiple FASTQ files, yet only a single replicate. Does this mean they had a single mouse embryo and took multiple samples from that one embryo, with each sample corresponding to a different FASTQ file?
There is no real way to tell, but that would be my guess.