Do you know of any initiatives for compressing NGS alignments?
The BAM format offers compression, but it still stores every aligned sequence and its qualities. Do you know of any reference-based compression? I think people at ENA are working on that. Have a look at CRAM.
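To make "reference-based" concrete, here is a minimal sketch of the general idea: store only where a read maps and where it differs from the reference, then rebuild it on demand. This is not how CRAM actually lays out its data; `encode_read` and `decode_read` are hypothetical names, and the sketch assumes ungapped alignments (substitutions only, no indels or clipping).

```python
def encode_read(reference, pos, read):
    """Encode an ungapped read as (pos, length, substitutions).

    Hypothetical helper: only positions that differ from the reference
    are stored, as (offset_in_read, read_base) pairs.
    """
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    subs = [(i, b) for i, (r, b) in enumerate(zip(window, read)) if r != b]
    return pos, len(read), subs


def decode_read(reference, record):
    """Rebuild the read from the reference and the stored differences."""
    pos, length, subs = record
    bases = list(reference[pos:pos + length])
    for offset, base in subs:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTACGT"
read = "GTAGGT"  # maps at position 2 with one mismatch
record = encode_read(reference, 2, read)
print(record)
print(decode_read(reference, record) == read)
```

A read that matches the reference perfectly costs only a position and a length, which is where most of the saving would come from.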
What is your opinion about keeping qualities? Maybe using some quality thresholds is reasonable? Or storing qualities only for mismatches (and maybe for the ±3 bases around them)?
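The last idea, keeping qualities only at mismatches and a few flanking bases, could look roughly like the sketch below. `mask_qualities` is a hypothetical helper, the alignment is assumed to be ungapped, and using `!` (Phred+33 quality 0) as the placeholder for dropped qualities is my own assumption. Note that this is lossy: the discarded qualities cannot be recovered.

```python
def mask_qualities(read, ref_window, quals, flank=3, placeholder="!"):
    """Keep qualities only within `flank` bases of a mismatch.

    Hypothetical helper: `read`, `ref_window` and `quals` must have the
    same length (ungapped alignment). Every other quality is replaced
    by `placeholder`, so long runs compress well.
    """
    if not (len(read) == len(ref_window) == len(quals)):
        raise ValueError("read, reference window and qualities must align")
    keep = [False] * len(read)
    for i, (base, ref_base) in enumerate(zip(read, ref_window)):
        if base != ref_base:
            # Mark the mismatch and its neighbours on both sides.
            for j in range(max(0, i - flank), min(len(read), i + flank + 1)):
                keep[j] = True
    return "".join(q if k else placeholder for q, k in zip(quals, keep))


read = "ACGTACGTAC"
ref_window = "ACGTTCGTAC"  # mismatch at index 4
quals = "IIIIIIIIII"
print(mask_qualities(read, ref_window, quals))
```

With `flank=3` the qualities at indices 1 to 7 survive and the rest collapse to the placeholder.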
And what about sequence headers? Do we need to keep them at all in the alignment? Storing the paired-end information should be enough, in my opinion.
I'm really interested in your opinions :)