I came across a dataset I was going to use for some ChIP-seq (chromatin) analysis; however, the SRA entry for one sample has several runs:
example: https://www.ncbi.nlm.nih.gov/sra?term=SRX5827105
(from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE131140)
The runs are 800M bases each, which makes me believe that I should concatenate them all since they belong to one sample. It's just not clear to me why the data is split.
For alignment and peak calling: can I concatenate the files before alignment, or should I concatenate only before peak calling (the input files are split in the same way)? Or, since the reads are split at random, can I just pick a few runs as a test?
Kind regards! I couldn't really find any documentation on this. Thanks in advance!
Seems like a reasonable assumption. They seem to have split the data into 16M-read chunks. There are other samples in this experiment that seem to follow the same trend and have the same biosample accession.
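If you do merge the runs before alignment, something like the following might work. It is only a minimal sketch, assuming the runs have already been downloaded as gzip-compressed FASTQ files; the function names and file paths are hypothetical, not anything from the dataset. It relies on the fact that gzip files can be appended byte for byte and still form a valid gzip stream, and it counts reads afterwards as a sanity check.

```python
import gzip
import shutil
from pathlib import Path


def concatenate_fastq_gz(run_files, merged_path):
    """Append gzip-compressed FASTQ files into one file, in the order given.

    Concatenated gzip members form a valid gzip stream, so the files are
    copied as raw bytes without decompressing them.
    """
    merged_path = Path(merged_path)
    with open(merged_path, "wb") as merged:
        for run_file in run_files:
            with open(run_file, "rb") as handle:
                shutil.copyfileobj(handle, merged)
    return merged_path


def count_reads(fastq_gz):
    """Count FASTQ records (4 lines per read) as a sanity check."""
    with gzip.open(fastq_gz, "rt") as handle:
        return sum(1 for _ in handle) // 4
```

For example, `concatenate_fastq_gz(sorted(Path("fastq").glob("*.fastq.gz")), "sample_merged.fastq.gz")` would merge every run in a hypothetical `fastq/` directory. The read count of the merged file should equal the sum over the individual runs. If the data turn out to be paired-end, merge the R1 and R2 files separately and in the same run order so the mates stay in sync.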