Which NGS data (WES or WGS) should be kept and which can be removed when backing up?
13 months ago

Due to limited storage, I have to remove data after the sequencing analysis. I think the FASTQ files and some BAMs have to be kept, but I don't know which ones to keep, especially among the BAMs.

These are the files I have from targeted WES. Is it right to remove all files except .bam, .recal.sorted.bam and *.recal.sorted.bam.bed.pileup? If there is a recommended approach, please advise me.

13 months ago
Zhenyu Zhang ▴ 540

Pileup files can easily be regenerated from BAMs, so there is no need to keep them.
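As a rough illustration of regenerating a pileup from a kept BAM, here is a minimal sketch that only builds a `samtools mpileup` command. The file names (`reference.fa`, `targets.bed`, and the sample names) are placeholders, and the options used to create the original pileup are not stated in this thread, so the regenerated file will match the old one only if the same reference, regions and options are used.

```python
import shlex


def mpileup_command(bam, reference_fasta, out_pileup, regions_bed=None):
    """Build a `samtools mpileup` command that writes a pileup for one BAM.

    -f gives the reference FASTA, -l optionally restricts output to the
    regions in a BED file, and -o names the output file.
    """
    cmd = ["samtools", "mpileup", "-f", reference_fasta]
    if regions_bed is not None:
        cmd += ["-l", regions_bed]
    cmd += ["-o", out_pileup, bam]
    return cmd


# Placeholder file names, for illustration only.
cmd = mpileup_command(
    "sample.recal.sorted.bam",
    "reference.fa",
    "sample.recal.sorted.bam.bed.pileup",
    regions_bed="targets.bed",
)
print(shlex.join(cmd))
```

The command is only printed here; with samtools installed it could be run with `subprocess.run(cmd, check=True)`.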

As for FASTQ versus BAM: if, in your alignment workflow, you

  • did not trim reads
  • did not remove any reads (or removed only chastity-failed reads)
  • kept the original base qualities after BQSR

then the BAM is what you want to keep.

Otherwise, you need to keep the original data as FASTQ (and perhaps the BAM as well, depending on whether you are willing to realign).
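The rule of thumb above can be written down as a small check. This is only a sketch of the reasoning in this answer; the function and parameter names are made up for illustration and are not part of any tool.

```python
def files_to_keep(trimmed_reads, removed_reads, original_qualities_kept,
                  only_chastity_failed_removed=False, willing_to_realign=True):
    """Return which files to keep, following the rule of thumb in this answer.

    The BAM alone is enough only when it still holds everything that was in
    the FASTQ: untrimmed reads, no reads dropped (apart from chastity-failed
    ones), and the original base qualities retained after BQSR.
    """
    reads_intact = (not removed_reads) or only_chastity_failed_removed
    if not trimmed_reads and reads_intact and original_qualities_kept:
        return ["final BAM"]
    # Information was lost in the BAM, so the raw reads must be kept.
    keep = ["original FASTQ"]
    if not willing_to_realign:
        keep.append("final BAM")
    return keep


lossless = files_to_keep(trimmed_reads=False, removed_reads=False,
                         original_qualities_kept=True)
lossy = files_to_keep(trimmed_reads=True, removed_reads=False,
                      original_qualities_kept=True, willing_to_realign=False)
print(lossless)
print(lossy)
```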

You can further compress BAM into CRAM to save space
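A minimal sketch of the BAM to CRAM conversion, again only building the command. It assumes samtools is installed and that the reference FASTA used for alignment is available; the file names are placeholders.

```python
import shlex


def bam_to_cram_command(bam, reference_fasta, cram):
    """Build a `samtools view` command that converts a BAM to CRAM.

    -C selects CRAM output, -T gives the reference FASTA that the CRAM
    is compressed against, and -o names the output file.
    """
    return ["samtools", "view", "-C", "-T", reference_fasta, "-o", cram, bam]


# Placeholder file names, for illustration only.
cram_cmd = bam_to_cram_command(
    "sample.recal.sorted.bam", "reference.fa", "sample.recal.sorted.cram"
)
print(shlex.join(cram_cmd))
```

Reference-based CRAM needs the same reference sequence to be decoded later, so the reference FASTA should be archived together with the CRAM files. It is also worth checking the CRAM (for example by comparing read counts with the BAM) before deleting the BAM.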


I agree with CRAM. You can convert the original FASTQ files to unaligned CRAM (is it possible to convert FASTQ directly to CRAM?), which will save you some disk space (about 60% of the original FASTQ size, I think). The intermediate BAM files you can either convert to CRAM as well, or simply delete if you do not need them for the analysis. After all, people often use Unix pipes: they start with the raw alignments and pipe them through the subsequent commands to avoid creating lots of intermediate files, writing out only the final BAM with all the filtering and other operations applied. That final BAM is then used for the actual analysis (SNP calling, or whatever you want to do).

I would also suggest, if the budget allows, getting an external backup drive. This can be a spinning disk or even a tape drive, as long as it is reliable and large enough to store data and backups long-term in case the main drive or workstation holding the data becomes unavailable (through damage, technical failure, theft, or anything else). Disk space is very cost-effective these days, so it would be a good investment.

Your institution may already offer such a solution. In my experience there are often local services that people are not aware of, so if you are affiliated with an institute, university or company, contact them and ask which IT services are available, e.g. cloud storage or backup servers.

