Sizes of Nanopore files and how to store them?
4.0 years ago
Floris Brenk ★ 1.0k

Hi all,

Had some practical questions about the output of the Nanopore PromethION and was hoping someone could help out.

  • What is the estimated data size per flow cell? The best figure I could find ranges from 2 TB to 3 TB.
  • For human DNA, the best I could find was that this corresponds to ~40X coverage. Is this correct?
  • Is there a more efficient way to store this raw data, since long-term storage is pretty expensive?
  • Mapped to hg38, what would be the size of the BAM files and the VCF file?

Thanks!

Nanopore WGS Long-Read • 3.2k views
4.0 years ago

In our hands, a good-enough PromethION run generates about 80 gigabases of sequencing data, corresponding to ~25x coverage of a human genome. The raw data of such an experiment is about 1.4 terabytes of fast5 files. If you don't really care about nucleotide modifications or about benefiting from improved basecalling later, you can just delete those fast5 files and work with the fastq data (about 80 gigabytes). The corresponding BAM (aligned with minimap2) is about 100 gigabytes, and it's worth converting it to CRAM to make it smaller.
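
As a rough sanity check on those numbers, here is a minimal sketch of the arithmetic. It assumes a human genome size of about 3.1 gigabases (my assumption, not a measured value), and the function names are made up for illustration. The file sizes are simply the ones quoted above.

```python
# Assumed approximate human genome size in gigabases (illustrative value).
HUMAN_GENOME_GB = 3.1

# File sizes in gigabytes, taken from the figures quoted in this answer.
SIZES_GBYTE = {"fast5": 1400, "fastq": 80, "bam": 100}


def coverage(yield_gigabases, genome_gigabases=HUMAN_GENOME_GB):
    """Mean sequencing depth = total sequenced bases / genome size."""
    return yield_gigabases / genome_gigabases


def fraction_saved(sizes, dropped="fast5"):
    """Share of the total footprint freed by deleting one file type."""
    return sizes[dropped] / sum(sizes.values())


if __name__ == "__main__":
    print(f"coverage for 80 Gb: {coverage(80):.1f}x")
    print(f"storage freed by dropping fast5: {fraction_saved(SIZES_GBYTE):.0%}")
```

With these inputs the depth comes out close to the ~25x mentioned above, and the fast5 files account for the large majority of the total footprint, which is why deleting them is the biggest single saving.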

I can't conclusively answer the VCF question without knowing which variants you are going to call (structural variants, small variants, or both), and the size will also depend on your variant caller. Regardless, the VCF will be really small compared with the rest: SVs called with Sniffles take about 6 megabytes, and SNVs called with Longshot about 220 megabytes.


Great, thanks a lot Wouter, much appreciated! I think we will just delete the fast5 files, which will save a lot on storage costs.

