Question: NGS Reads Alignment
Leszek (4.0k), IIMCB, Poland, wrote 7.3 years ago:

Do you know of any initiatives for compressing NGS alignments?

The BAM format offers compression, but it still stores every aligned sequence and its qualities. Do you know of any reference-based compression? I think people at ENA are working on this; have a look at CRAM.
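
To make the reference-based idea concrete, here is a minimal Python sketch. It is not CRAM's actual format, only an illustration under simplifying assumptions: reads are aligned without gaps or clipping, and the function names `encode_read` and `decode_read` are made up for this example. A read is stored as its position, its length, and the bases that differ from the reference.

```python
def encode_read(reference, pos, read):
    """Encode an ungapped read as (pos, length, [(offset, base), ...]).

    Only bases that differ from the reference are stored explicitly.
    """
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    diffs = [(i, b) for i, (r, b) in enumerate(zip(window, read)) if r != b]
    return pos, len(read), diffs


def decode_read(reference, encoded):
    """Rebuild the read sequence from the reference and the stored differences."""
    pos, length, diffs = encoded
    bases = list(reference[pos:pos + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


# Toy data (hypothetical): one read aligned at position 5 with one mismatch.
reference = "ACGTACGTTAGCCGATAGGCTA"
read = "CGTTAGACGA"

encoded = encode_read(reference, 5, read)
restored = decode_read(reference, encoded)
```

Decoding needs exactly the same reference that was used for encoding, so the reference has to be archived or reliably identifiable alongside the compressed alignments.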

What is your opinion about keeping qualities? Maybe using some quality thresholds is reasonable? Or storing qualities only for mismatches (and maybe for the ±3 bases around them)?
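
As a rough sketch of the "qualities only around mismatches" idea, assuming ungapped alignments and Phred+33 quality strings: the function name `mask_qualities`, the `flank` parameter, and the placeholder character are all hypothetical choices for illustration, not something an existing tool prescribes. Qualities outside the kept windows are replaced by one constant character, which a general-purpose compressor then squeezes very well. This is lossy.

```python
def mask_qualities(read, ref_window, quals, flank=3, placeholder="I"):
    """Keep qualities only at mismatches and `flank` bases on either side.

    All other positions get `placeholder` (an arbitrary constant; "I" is
    Q40 in Phred+33). The original qualities cannot be recovered afterwards.
    """
    if not (len(read) == len(ref_window) == len(quals)):
        raise ValueError("read, reference window and qualities must have equal length")
    keep = set()
    for i, (base, ref_base) in enumerate(zip(read, ref_window)):
        if base != ref_base:
            start = max(0, i - flank)
            end = min(len(read), i + flank + 1)
            keep.update(range(start, end))
    return "".join(q if i in keep else placeholder for i, q in enumerate(quals))


# Toy data (hypothetical): a single mismatch at offset 6.
masked = mask_qualities("CGTTAGACGA", "CGTTAGCCGA", "ABCDEFGHIJ")
```

Whether a constant placeholder is acceptable depends on the downstream tools, since variant callers that weight evidence by base quality will see the placeholder as a real score.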

And what about sequence headers? Do we need to keep these at all in the alignment? Storing paired-end information should be enough in my opinion.

I'm really interested in your opinions:)


If it were just about saving space, you could get rid of the FASTQ input data once all sequences and qualities are stored in a BAM file; the FASTQ can then be regenerated from the BAM file. Discarding data is always a compromise: in this case, discarding qualities and sequences compromises re-analysis. Btw, your question is about compression but also about discarding data, and discarding data != compression.

Reply written 7.3 years ago by Michael Dondrup (46k)

Yes, I'm curious what your opinions are about lossless compression vs. compression that discards some data (like sequence headers, some quals, etc.).

Reply written 7.3 years ago by Leszek (4.0k)
Madelaine Gogol (5.0k), Kansas City, wrote 7.3 years ago:

There's an interesting discussion here: https://plus.google.com/107526144078068918726/posts/ZCBc8DH3yKK

Matt Shirley (8.9k), Cambridge, MA, wrote 7.3 years ago:

Here is a paper describing a scheme for comparative compression of genomes from the same species. You might want to figure out first why you want to compress your data. If you can generate some of these metrics such as quality scores on the fly, then try compressing data after the last step that is computationally intensive.

fac2003 (0) wrote 6.9 years ago:

Goby 2.0 is a major milestone for the Goby project, bringing state-of-the-art NGS alignment compression as well as very robust SAM/BAM import and export. See the new tutorial ‘What’s new in Goby 2.0’ for more information.

We created a summary table to compare features of Goby 1.x, 2.0, BAM, CRAM and FASTQ. Click here to see the full table.
