Question: SeqAn: reading a compressed stream
pmarijon30 wrote (4 weeks ago):

Hi,

I want to read a sequence file (FASTA, FASTQ, BAM, etc.), so I read the SeqAn tutorial. But to know my current position in the file (to drive a progress bar), I need to use a std::ifstream. That's not a problem, so I wrote this test code:

#include <iostream>
#include <fstream>

#include <seqan/seq_io.h>


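int main (int argc, char ** argv) {
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <sequence file>" << std::endl;
        return 1;
    }

    std::streampos begin, end;
    std::ifstream myfile (argv[1], std::ios::in | std::ios::binary);
    if (!myfile) {
        std::cerr << "cannot open " << argv[1] << std::endl;
        return 1;
    }

    begin = myfile.tellg();

    // Let SeqAn read records from the already-opened stream so that
    // myfile.tellg() can report how far into the file we are.
    seqan::SeqFileIn seq_file(myfile);
    seqan::CharString id;
    seqan::Dna5String seq;
    seqan::CharString qual;

    while (!seqan::atEnd(seq_file))
    {
        seqan::readRecord(id, seq, qual, seq_file);
        std::cout << "pos: " << myfile.tellg() << " id " << id << std::endl;
    }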

    end = myfile.tellg();

    myfile.close();

    std::cout << "begin: "<< begin << " end: "<< end << std::endl;
    std::cout << "size is: " << (end-begin) << " bytes.\n"<<std::endl;
    return 0;
}

But when I run this code on a compressed FASTQ file, SeqAn throws an exception: terminate called after throwing an instance of 'seqan::ParseError'

My questions:

  • Is using std::ifstream the only way to get the current position in the file?
  • How can I tell SeqAn that this stream is a compressed stream?
  • Can I generate an uncompressed stream from my compressed stream (with SeqAn or zlib)?

Thanks.


Why would you want to know the position of a FASTQ record in a compressed file? Unless you're using BGZF, there is no way to fseek() in a gzip file...
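For context on the BGZF remark: BGZF (the block-gzip format behind BAM and bgzip) is seekable because any position can be expressed as a virtual file offset. Per the SAM/BAM specification, the upper 48 bits hold the byte offset at which a compressed block starts, and the lower 16 bits hold the offset inside that block once it is decompressed. A minimal sketch of packing and unpacking such offsets; the helper names are hypothetical and not part of SeqAn:

```cpp
#include <cstdint>

// Hypothetical helpers (not SeqAn API) for BGZF virtual file offsets:
// upper 48 bits = start of the compressed block in the file,
// lower 16 bits = offset inside the decompressed block (always < 65536).
std::uint64_t makeVirtualOffset(std::uint64_t blockStart, std::uint16_t withinBlock)
{
    return (blockStart << 16) | withinBlock;
}

// Byte offset of the BGZF block in the compressed file.
std::uint64_t compressedBlockOffset(std::uint64_t virtualOffset)
{
    return virtualOffset >> 16;
}

// Offset inside the block after decompression.
std::uint16_t uncompressedBlockOffset(std::uint64_t virtualOffset)
{
    return static_cast<std::uint16_t>(virtualOffset & 0xFFFF);
}
```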

Reply written 4 weeks ago by Pierre Lindenbaum

I want to generate a progress bar; I've edited the post to make that clear. For a compressed file, the current position in the compressed file divided by the size of the compressed file gives a good approximation of progress.
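A minimal sketch of that approximation, assuming the position comes from tellg() on the raw (still compressed) std::ifstream, as in the test code in the question. The helper names are hypothetical. The guard for negative positions is there because tellg() returns -1 once the stream is in a failed state.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Size of a file in bytes, found by opening it positioned at the end (ate);
// returns -1 if the file cannot be opened.
long long fileSizeInBytes(const std::string & path)
{
    std::ifstream in(path, std::ios::in | std::ios::binary | std::ios::ate);
    if (!in)
        return -1;
    std::streamoff size = in.tellg();
    return static_cast<long long>(size);
}

// Percentage of the (compressed) file consumed so far, given a position
// reported by tellg() on the raw input stream.
int progressPercent(long long position, long long totalSize)
{
    if (totalSize <= 0 || position < 0)
        return 0;      // unknown size or failed stream: report no progress
    if (position >= totalSize)
        return 100;
    return static_cast<int>(position * 100 / totalSize);
}
```

Because compression ratios vary along a file, this tracks bytes read from disk rather than records parsed, which is usually close enough for a progress bar.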

Reply written 4 weeks ago by pmarijon30

Then I would create a custom std::streambuf that counts the number of bytes read, e.g.: https://artofcode.wordpress.com/2010/12/12/deriving-from-stdstreambuf/

Reply written 4 weeks ago by Pierre Lindenbaum

I use a std::ifstream to get the current position in the file while SeqAn parses it; that part is easy. But when I run my code on a compressed file, SeqAn parsing fails. So either SeqAn doesn't detect that my stream contains compressed data, or SeqAn can't work on compressed streams, and that isn't documented.

Reply written 4 weeks ago by pmarijon30

So seqan didn't detect my stream contain compressed data or seqan can't work on compressed stream, but isn't documented.

Usually it's the other way round: things don't work on compressed data unless documented.

Reply written 4 weeks ago by kloetzl

It is documented:

These classes provide an API for accessing sequence files in different file formats, either compressed or uncompressed.

Source: https://seqan.readthedocs.io/en/master/Tutorial/InputOutput/SequenceIO.html

Reply written 4 weeks ago by pmarijon30

Well, there is compressed as in .bam, and there is compressed as in .gz.
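The two kinds can be told apart from the first bytes of a file. Plain gzip starts with the magic bytes 0x1f 0x8b; BGZF, the block-gzip container used by BAM and bgzip, is valid gzip as well, but sets the FEXTRA flag and carries a "BC" subfield in the header. A minimal sketch, assuming the BC subfield is the first extra subfield (as bgzip/htslib write it); detectCompression and Compression are illustrative names, not SeqAn API:

```cpp
#include <fstream>
#include <ios>
#include <string>

enum class Compression { None, Gzip, Bgzf };

// Classify a file by its header bytes. Assumes the BGZF "BC" subfield is the
// first extra subfield, i.e. located at byte offsets 12 and 13.
Compression detectCompression(const std::string & path)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    unsigned char header[14] = {};
    in.read(reinterpret_cast<char *>(header), sizeof(header));
    std::streamsize n = in.gcount();   // 0 if the file could not be opened

    // gzip magic: ID1 = 0x1f, ID2 = 0x8b
    if (n < 2 || header[0] != 0x1f || header[1] != 0x8b)
        return Compression::None;

    // FLG byte (offset 3), bit 2 = FEXTRA: an extra field follows the header
    bool hasExtra = n >= 14 && (header[3] & 0x04) != 0;
    if (hasExtra && header[12] == 'B' && header[13] == 'C')
        return Compression::Bgzf;
    return Compression::Gzip;
}
```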

Reply written 4 weeks ago by kloetzl