I would like to know how you address batch effects when the same samples are re-sequenced (FASTQ files).
Our client targeted 20 million reads for all of her samples. However, in the first run we generated fewer than 20 million reads for a few samples (sample_2, sample_3 and sample_7), so we re-sequenced those samples.
For the 1st run (reads in millions):

sample_id    #_obtained_reads
sample_1     21.4
sample_2     11
sample_3     12
sample_4     35.5
sample_5     23.8
sample_6     29.4
sample_7     10
sample_8     23.8
sample_9     24.3
sample_10    18.6
For the 2nd run (reads in millions):

sample_id    #_obtained_reads
sample_2     9
sample_3     8
sample_7     10
When it comes to downstream analysis, how would you handle those samples (sample_2, sample_3 and sample_7)? Would you just merge them? e.g.
cat run1/sample_2.fastq.gz run2/sample_2.fastq.gz > sample_2.merged.fastq.gz

(concatenated gzip streams are themselves valid gzip, so plain cat works here)
Or would you first run PCA or hierarchical clustering to see whether the re-sequenced libraries cluster with their first-run counterparts, and only then decide whether to merge or drop the second-run data?
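To make the second option concrete, here is a minimal sketch of the kind of check I have in mind, using only NumPy and entirely synthetic data (the sample names, count matrix, and profile model are made up for illustration; in practice you would use your real gene-by-sample count matrix from quantification). Each run of a re-sequenced sample is kept as a separate column, counts are log-CPM transformed, PCA is computed via SVD, and you then look at whether the two runs of each sample land near each other:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 genes, 4 biological samples.
# sample_2 and sample_3 each have two sequencing runs kept separate.
libs = ["s2_run1", "s2_run2", "s3_run1", "s3_run2", "s4_run1", "s5_run1"]

# One expression profile per biological sample; runs of the same
# sample reuse that profile, so they differ only by counting noise.
profiles = rng.gamma(shape=2.0, scale=25.0, size=(200, 4))
lam = profiles[:, [0, 0, 1, 1, 2, 3]]
counts = rng.poisson(lam)

# log-CPM to tame library-size differences between runs
cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6
logcpm = np.log2(cpm + 1)

# PCA via SVD on mean-centred data (libraries as rows)
X = logcpm.T
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pcs = U * S  # library coordinates in PC space

for name, (pc1, pc2) in zip(libs, pcs[:, :2]):
    print(f"{name}: PC1={pc1:7.2f}  PC2={pc2:7.2f}")
```

If the two runs of a sample sit close together on PC1/PC2 (relative to the spread between biological samples), a simple merge is usually defensible; if a second run drifts toward its run-mates instead of its sample, that is the batch-effect signature you would want to model or investigate before merging.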