How are people efficiently storing methylation data derived from bisulfite sequencing?
The results spit out by BSmap's associated tools include a value for each cytosine in the genome, in something like the format below. There's clearly some redundancy here, which makes it highly compressible, but we're still looking at gigabytes of files in yet another poorly documented format.
Has the community coalesced around a format that has decent toolchains attached? The sequence context, in particular, seems useful, but not easily slammed into a BED file. VCF, as always, seems flexible enough to handle this, but might be overkill. Is there something else (maybe even binary) that I'm not finding with a search? Thoughts appreciated.
Example output here:

chr  pos       strand  context  ratio  eff_CT_count  C_count  CT_count  rev_G_count  rev_GA_count  CI_lower  CI_upper
22   16050004  +       ATCTG    0.000  1.00          0        1         0            0             0.000     0.793
22   16050013  +       GTCCC    0.000  3.00          0        3         0            0             0.000     0.562
22   16050014  +       TCCCA    0.000  3.00          0        3         0            0             0.000     0.562
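To make the BED question concrete, here is a rough sketch of one way the rows above could be squeezed into BED-like lines, with the context stuffed into the name column and the ratio scaled into the 0-1000 score column. The function names and the two extra count columns are my own invention, not anything BSmap or a standard defines, and I'm assuming `pos` is 1-based.

```python
def bsmap_row_to_bed(line):
    """Turn one whitespace-separated row of the example output into a BED-like line.

    Hypothetical helper: the column layout is taken from the example header,
    and the two trailing count columns are an ad hoc extension, not standard BED.
    """
    f = line.split()
    chrom, pos, strand, context = f[0], int(f[1]), f[2], f[3]
    ratio = float(f[4])
    c_count, ct_count = int(f[6]), int(f[7])
    start = pos - 1  # assumption: pos is 1-based, BED starts are 0-based
    score = round(ratio * 1000)  # BED scores are integers in 0..1000
    fields = [chrom, start, pos, context, score, strand, c_count, ct_count]
    return "\t".join(str(x) for x in fields)


def rows_to_bed(lines):
    """Convert an iterable of rows, skipping blanks and the header line."""
    out = []
    for line in lines:
        parts = line.split()
        # The header has "pos" where data rows have an integer position.
        if len(parts) < 8 or not parts[1].isdigit():
            continue
        out.append(bsmap_row_to_bed(line))
    return out


example = [
    "chr pos strand context ratio eff_CT_count C_count CT_count rev_G_count rev_GA_count CI_lower CI_upper",
    "22 16050004 + ATCTG 0.000 1.00 0 1 0 0 0.000 0.793",
    "22 16050013 + GTCCC 0.000 3.00 0 3 0 0 0.000 0.562",
]
for bed_line in rows_to_bed(example):
    print(bed_line)
```

This drops the effective counts, reverse-strand counts, and confidence intervals, which is exactly the kind of lossy shoehorning I'd like to avoid if a proper format exists.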
Update: So far the consensus on Twitter seems to be that there is no standard (or even well-defined) format. Would love to hear if anyone knows differently.