Due to limited storage space, I have to remove data after my sequencing analysis. I think the FASTQ files and some of the BAMs have to be kept, but I don't know which ones to keep, especially among the BAMs.
These are the files I have from targeted WES. Is it right to remove all files except the .bam, .recal.sorted.bam and *.recal.sorted.bam.bed.pileup files? If there is a recommended way, please advise me.
I agree with using CRAM. You can convert the original FASTQ files to unaligned CRAM (is it possible to convert FASTQ directly to CRAM?), which will save you some disk space (to roughly 60% of the original FASTQ size, I think). You can either convert the intermediate BAM files to CRAM as well, or simply delete them if you do not need them for the analysis.

In fact, people often use Unix pipes to avoid producing lots of intermediate files: the raw alignments are piped through each processing command, and only the final BAM, with all filtering and other operations applied, is written to disk and then used for the actual analysis (SNP calling, or whatever else you want to do).

I would also suggest getting an external backup drive, if your budget allows. This can be a spinning hard disk or even a tape drive, as long as it is reliable and large enough to store data and backups long-term in case the main drive or the workstation holding the data becomes unavailable (through damage, technical failure, theft, or anything else). Disk space is very cost-effective these days, so it would be a good investment. Your institution may also offer such a solution; in my experience there are often local services people are not aware of. If you are affiliated with an institute, university or company, contact its IT department and ask about the available services, e.g. cloud storage or backup servers.
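A minimal sketch of the CRAM conversion and the piping idea, written in Python around the samtools and bwa command-line tools. It assumes both tools are on your PATH, that reads were aligned with `bwa mem`, and that `ref.fa` is the same reference FASTA used for the alignment (reference-based CRAMs need it for decoding, so keep it alongside them). The helper names and file names are hypothetical placeholders, not part of any existing pipeline. FASTQ to unaligned CRAM is not shown; newer samtools releases provide `samtools import` for that, so check its documentation for your version.

```python
from __future__ import annotations

import shlex
import subprocess

# Hypothetical helpers: they only build argument lists for the real CLI tools.


def bam_to_cram_cmd(bam: str, reference: str, cram: str) -> list[str]:
    """`samtools view -C` writes CRAM; -T is the FASTA the reads were aligned to."""
    return ["samtools", "view", "-C", "-T", reference, "-o", cram, bam]


def align_and_sort_cmds(
    reference: str, r1: str, r2: str, out_bam: str, threads: int = 4
) -> tuple[list[str], list[str]]:
    """bwa mem output is streamed into samtools sort ("-" = read stdin),
    so no unsorted intermediate SAM/BAM is written to disk."""
    align = ["bwa", "mem", "-t", str(threads), reference, r1, r2]
    sort = ["samtools", "sort", "-o", out_bam, "-"]
    return align, sort


def run_piped(producer: list[str], consumer: list[str]) -> None:
    """Python equivalent of the shell pipeline `producer | consumer`."""
    p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(consumer, stdin=p1.stdout)
    p1.stdout.close()  # lets the producer receive SIGPIPE if the consumer exits early
    rc2 = p2.wait()
    rc1 = p1.wait()
    if rc1 or rc2:
        failed = producer if rc1 else consumer
        raise subprocess.CalledProcessError(rc1 or rc2, failed)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    align, sort = align_and_sort_cmds(
        "ref.fa", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample.sorted.bam"
    )
    print(shlex.join(align), "|", shlex.join(sort))
    print(shlex.join(bam_to_cram_cmd("sample.sorted.bam", "ref.fa", "sample.sorted.cram")))
```

Run as-is, the script only prints the commands. To execute them, call `run_piped(align, sort)` for the alignment step and `subprocess.run(bam_to_cram_cmd(...), check=True)` for the conversion, then verify the CRAM (e.g. by reading it back with samtools) before deleting the BAM.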