Which NGS (WES or WGS) files should be kept for backup, and which can be removed?
27 days ago

Because of limited storage, I have to remove data after the sequencing analysis. I think the FASTQ files and some BAMs have to be kept, but I don't know which ones to keep, especially among the BAMs.

These are the files I have for targeted WES. Is it right to remove all files except .bam, .recal.sorted.bam, and *.recal.sorted.bam.bed.pileup? If there is a recommended approach, please advise me.

27 days ago
Zhenyu Zhang ▴ 370

Pileup files can easily be regenerated from BAMs, so there is no need to keep them.
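As a rough illustration of regenerating a pileup on demand, here is a minimal sketch that only builds a `samtools mpileup` command line. It assumes the pileup was originally produced by `samtools mpileup` restricted to a BED file of target regions; the file names `ref.fa` and `targets.bed` are hypothetical placeholders, and the original pipeline may have used other options.

```python
import shlex


def mpileup_command(bam, reference_fasta, regions_bed, out_pileup):
    """Build (but do not run) a samtools mpileup command for one BAM."""
    return [
        "samtools", "mpileup",
        "-f", reference_fasta,  # reference FASTA; assumed to match the alignment
        "-l", regions_bed,      # limit output to the target regions (BED)
        "-o", out_pileup,       # write the pileup to this file
        bam,
    ]


# Hypothetical file names, for illustration only.
cmd = mpileup_command(
    "sample.recal.sorted.bam",
    "ref.fa",
    "targets.bed",
    "sample.recal.sorted.bam.bed.pileup",
)
print(shlex.join(cmd))
# To execute it for real: subprocess.run(cmd, check=True)
```

Keeping the command (or the script that produced it) alongside the BAM is usually enough to recreate the pileup later.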

As for FASTQ versus BAM: if, in your alignment workflow, you

  • did not trim reads
  • did not remove any reads (or only removed chastity-failed reads)
  • maintained the original quality scores after BQSR

then the BAM is what you want to keep.

Otherwise, you need to keep the original data in FASTQ (and maybe the BAM as well, depending on whether you are willing to realign).
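The rule of thumb above can be written down as a small decision helper. This is only a sketch of the criteria listed in this answer; the class and field names are made up for illustration, and you have to fill them in from what your own pipeline actually did.

```python
from dataclasses import dataclass


@dataclass
class AlignmentWorkflow:
    trimmed_reads: bool             # were reads trimmed before or during alignment?
    removed_reads: bool             # were reads dropped (beyond chastity-failed ones)?
    kept_original_qualities: bool   # are the original base qualities still in the BAM after BQSR?


def files_to_keep(wf, avoid_realignment=False):
    """Return which raw-data files to archive, following the criteria above."""
    bam_is_lossless = (
        not wf.trimmed_reads
        and not wf.removed_reads
        and wf.kept_original_qualities
    )
    if bam_is_lossless:
        # The BAM still contains everything that was in the FASTQ.
        return ["BAM"]
    # Information was lost, so the FASTQ is the only complete copy.
    keep = ["FASTQ"]
    if avoid_realignment:
        keep.append("BAM")
    return keep


print(files_to_keep(AlignmentWorkflow(False, False, True)))
print(files_to_keep(AlignmentWorkflow(True, False, True), avoid_realignment=True))
```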

You can further compress the BAM into CRAM to save space.
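For the BAM-to-CRAM step, a minimal sketch that builds a `samtools view` command is shown below. The file names are hypothetical, and it assumes a reference-based CRAM, which means the same reference FASTA has to stay available to decode the file later.

```python
import shlex


def bam_to_cram_command(bam, reference_fasta, out_cram):
    """Build (but do not run) a samtools command converting a BAM to CRAM."""
    return [
        "samtools", "view",
        "-C",                   # write CRAM output
        "-T", reference_fasta,  # reference FASTA used for the alignment
        "-o", out_cram,         # output file
        bam,
    ]


# Hypothetical file names, for illustration only.
cram_cmd = bam_to_cram_command("sample.recal.sorted.bam", "ref.fa", "sample.recal.sorted.cram")
print(shlex.join(cram_cmd))
# To execute it for real: subprocess.run(cram_cmd, check=True)
```

Before deleting the BAM, it is worth checking that the CRAM reads back correctly, for example by comparing `samtools flagstat` output for both files.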

I agree with CRAMs. You can convert the original FASTQ files to unaligned CRAM (is it possible to convert FASTQ to CRAM directly?), which will save you some disk space (60% of the original FASTQ, I think). The intermediate BAM files you can either convert to CRAM as well, or just delete if you do not need them for the analysis. After all, people often use Unix pipes: they start with the raw alignments and pipe them through the subsequent commands to avoid having lots of intermediate files, writing out only the final BAM with all the filtering and operations applied, which is then used for the actual analysis (SNP calling, or whatever you want to do).

I would also suggest, if your budget allows, getting an external backup drive. This can be a spinning disk or even a tape drive, as long as it is reliable and large enough to store data and backups long term in case the main drive or workstation holding the data becomes unavailable (through damage, technical failure, theft, or anything else). Disk space is very cost-effective these days, so it would be a good investment.

Maybe your institution offers such a solution. In my experience there are often local services people are not aware of, so if you are affiliated with an institute, university, or company, contact them and ask about available IT services, e.g. cloud storage or backup servers.


