I have some ChIP-seq data. The three replicates have widely differing read lengths: 40, 50 and 100 bp. I intend to call peaks on each replicate, then merge them into a pooled data set and call peaks on that as well.
If I pool them, I thought it might be a good idea to either artificially shorten all reads in all samples to 40 bp, or lengthen them all to 100 bp. It seems a safe bet that the fragment length for each sample is at least 100 bp, so I thought it shouldn't be a problem to just assume a read length of 100 bp for all samples.
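To be concrete about what I mean by shortening or lengthening, here is a minimal Python sketch. It assumes the reads are already aligned and represented as 0-based, half-open intervals (BED-style), and that resizing is anchored at the 5' end of each read so the strand matters. The function name `resize_read` and its parameters are made up for illustration and are not part of any existing tool.

```python
def resize_read(start, end, strand, target_len, chrom_len=None):
    """Resize an aligned read to target_len, keeping its 5' end fixed.

    Coordinates are 0-based, half-open (BED-style). This is an
    illustrative helper, not a function from any ChIP-seq package.
    """
    if strand == "+":
        # 5' end is the leftmost coordinate; grow or shrink to the right.
        new_start, new_end = start, start + target_len
    elif strand == "-":
        # 5' end is the rightmost coordinate; grow or shrink to the left.
        new_start, new_end = end - target_len, end
    else:
        raise ValueError(f"unknown strand: {strand!r}")

    # Clip to the chromosome so extended reads never run off either end.
    new_start = max(0, new_start)
    if chrom_len is not None:
        new_end = min(new_end, chrom_len)
    return new_start, new_end


# Lengthen a 40 bp forward read to 100 bp.
print(resize_read(1000, 1040, "+", 100))
# Shorten a 100 bp reverse read to 40 bp.
print(resize_read(1000, 1100, "-", 40))
```

Note that this only changes the aligned interval. Shortening could instead be done by trimming the raw reads before alignment, which might change where some reads map.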
I have also considered leaving the read lengths unaltered and pooling the samples just as they are. The local average read length in any sufficiently large portion of the genome should be relatively constant.
Would you lengthen all three samples to 100 bp, shorten them all to 40 bp, or neither?
Let's say I lengthen the sample with 40 bp reads to 100 bp. Would you do the peak calling for that individual sample on the original 40 bp reads or on the new 100 bp reads?
Does it make sense to merge these at all? (I am planning to assess reproducibility using IDR, and to merge the replicates if they seem reproducible.)