Question: Data size for ChIPseq and RNAseq data
I would like an estimate of the disk space occupied by all the files of a ChIPseq/RNAseq project after the different steps of mapping and peak calling (fastq ==> bam ==> bed/bam/bigwig, etc.). For example, what if I have a ChIPseq experiment with 1 IP + Input in duplicate and 2 RNAseq samples in duplicate? If someone has an idea of the different files and their sizes, that would be very helpful. Thank you in advance for your reply.

No precise estimate can be given for this, since it depends entirely on the read depth, the read length, and which files are actually kept. In the simplest case, guesstimate ~5GB for a gzipped fastq file and the same for a sorted BAM file of that data. A bigWig file is normally <1GB unless your data is really sparse and you're doing base-pair resolution.

Thank you very much for your quick answer. I understand that this depends on many parameters, but the question comes from my boss, who needs an idea of the storage volume we will occupy. In my case, for both ChIPseq and RNAseq, we will have 25 million reads of 75bp. With the numbers you gave me I will be able to do the calculation. Could you just tell me whether I have forgotten any files for RNAseq and ChIPseq: RNAseq: fastq.gz ==> bam; ChIPseq: fastq.gz ==> bam ==> bam or bed, bigwig

Any other files will be of insignificant size (e.g., read counts from featureCounts).
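
To turn the rough figures from earlier in this thread (~5GB per gzipped fastq, ~5GB per sorted BAM, <1GB per bigWig) into a project total, a back-of-the-envelope sketch in Python could look like the following. Several things here are assumptions rather than facts from the thread: one fastq.gz and one BAM per sample, 4 ChIPseq samples (IP + Input, each in duplicate), 4 RNAseq samples (2 conditions in duplicate), bigWig files only for ChIPseq, and 1GB used as an upper bound for each bigWig. The function name `project_size_gb` is purely illustrative.

```python
# Rough per-file sizes in GB, taken from the guesstimates in this thread.
# Real sizes depend on read depth, read length, and compression.
FASTQ_GZ_GB = 5.0  # gzipped fastq
BAM_GB = 5.0       # sorted BAM
BIGWIG_GB = 1.0    # upper bound ("normally <1GB")


def project_size_gb(n_samples, with_bigwig):
    """Estimate storage for n_samples, assuming one fastq.gz and one BAM each."""
    per_sample = FASTQ_GZ_GB + BAM_GB
    if with_bigwig:
        per_sample += BIGWIG_GB
    return n_samples * per_sample


# Assumed sample counts (one reading of the experimental design in the question).
chip_samples = 4  # (1 IP + 1 Input) x 2 replicates
rna_samples = 4   # 2 RNAseq conditions x 2 replicates

chip_total = project_size_gb(chip_samples, with_bigwig=True)
rna_total = project_size_gb(rna_samples, with_bigwig=False)
total = chip_total + rna_total

print(f"ChIPseq: ~{chip_total:.0f} GB")
print(f"RNAseq:  ~{rna_total:.0f} GB")
print(f"Total:   ~{total:.0f} GB")
```

If a filtered BAM is kept alongside the original one, or the data are paired-end with two fastq files per sample, the per-sample figure grows accordingly, so it is probably safer to treat the result as an order-of-magnitude number and add some headroom.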

Thank you for your reply and your time. Have a nice day.

