Question: Seqan read compressed stream
6 months ago, pmarijon wrote:

Hi,

I want to read a sequence file (FASTA, FASTQ, BAM, etc.), so I read the SeqAn tutorial. To know my position in the file (to generate a progress bar) I need to use std::ifstream. That is not a problem; I wrote this test code:

#include <iostream>
#include <fstream>

#include <seqan/seq_io.h>


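int main (int argc, char ** argv) {
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <sequence file>" << std::endl;
        return 1;
    }

    // Open the raw file ourselves so that tellg() gives the byte offset.
    std::ifstream myfile(argv[1], std::ios::in | std::ios::binary);
    if (!myfile) {
        std::cerr << "cannot open " << argv[1] << std::endl;
        return 1;
    }

    std::streampos begin = myfile.tellg();

    // Let SeqAn parse records from the already opened stream.
    seqan::SeqFileIn seq_file(myfile);
    seqan::CharString id;
    seqan::Dna5String seq;
    seqan::CharString qual;

    while (!seqan::atEnd(seq_file))
    {
        seqan::readRecord(id, seq, qual, seq_file);
        std::cout << "pos: " << myfile.tellg() << " id " << id << std::endl;
    }

    std::streampos end = myfile.tellg();

    myfile.close();

    std::cout << "begin: " << begin << " end: " << end << std::endl;
    std::cout << "size is: " << (end - begin) << " bytes.\n" << std::endl;
    return 0;
}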

But when I run this code on a compressed FASTQ file, SeqAn throws an exception: terminate called after throwing an instance of 'seqan::ParseError'

My questions:

  • Is using std::ifstream the only way to get the current position in the file?
  • How can I tell SeqAn that this stream is compressed?
  • Can I generate an uncompressed stream from my compressed stream (with SeqAn or zlib)?

Thanks.


Why would you want to know the position of a FASTQ record in a compressed file? Unless you're using BGZF (the block format written by bgzip), there is no way to 'fseek' in a gzip file...

written 6 months ago by Pierre Lindenbaum

I want to generate a progress bar; I have edited the post to say so. For a compressed file, we can get a good approximation from the size of the compressed file and the current position within it.

written 6 months ago by pmarijon
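
The approximation described above can be sketched as follows. This is only an illustration: progress_fraction and render_bar are hypothetical helper names, not part of SeqAn, and the position is assumed to come from tellg() on the raw (still compressed) file while the total is the compressed file size. Because parsers read ahead in buffers, the result is an estimate, not an exact record offset.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <string>

// Hypothetical helper: fraction of the raw file consumed so far.
// pos   = value of tellg() on the raw std::ifstream (may be -1 if the stream failed)
// total = size of the file on disk in bytes
double progress_fraction(std::streamoff pos, std::streamoff total) {
    if (total <= 0 || pos < 0) return 0.0;  // unknown size or position: report no progress
    if (pos >= total) return 1.0;           // read-ahead can reach the end early
    return static_cast<double>(pos) / static_cast<double>(total);
}

// Hypothetical helper: draw a fixed-width text bar for a fraction in [0, 1].
std::string render_bar(double fraction, std::size_t width) {
    assert(width > 0);
    if (fraction < 0.0) fraction = 0.0;
    if (fraction > 1.0) fraction = 1.0;
    const std::size_t filled =
        static_cast<std::size_t>(fraction * static_cast<double>(width));
    return "[" + std::string(filled, '#') + std::string(width - filled, '-') + "]";
}
```

In the reading loop one would call render_bar(progress_fraction(myfile.tellg(), file_size), 40) after each record, where file_size is obtained once up front, for example by seeking to the end of the file and calling tellg().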

Then I would create a custom std::streambuf to count the number of bytes, e.g.: https://artofcode.wordpress.com/2010/12/12/deriving-from-stdstreambuf/

written 6 months ago by Pierre Lindenbaum
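
A minimal sketch of the counting std::streambuf idea, using only the standard library. CountingBuf and count_bytes_in are hypothetical names chosen for illustration. The counter reports bytes pulled from the wrapped buffer, which includes read-ahead that the parser has not consumed yet. Whether a std::istream built on such a buffer can be handed to seqan::SeqFileIn in place of the std::ifstream from the question is an assumption that would need testing.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Read-only filter that forwards to another streambuf and counts the bytes it fetches.
class CountingBuf : public std::streambuf {
public:
    explicit CountingBuf(std::streambuf* source) : source_(source) {
        assert(source != nullptr);
        setg(buffer_, buffer_, buffer_);  // empty get area: the first read calls underflow()
    }

    // Bytes fetched from the wrapped buffer so far (includes unread read-ahead).
    std::size_t bytes_fetched() const { return fetched_; }

protected:
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        const std::streamsize n =
            source_->sgetn(buffer_, static_cast<std::streamsize>(sizeof buffer_));
        if (n <= 0) return traits_type::eof();
        fetched_ += static_cast<std::size_t>(n);
        setg(buffer_, buffer_, buffer_ + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    std::streambuf* source_;
    std::size_t fetched_ = 0;
    char buffer_[4096];
};

// Demo on in-memory data: read every line through the counter, return the byte count.
std::size_t count_bytes_in(const std::string& data) {
    std::istringstream raw(data);
    CountingBuf counter(raw.rdbuf());
    std::istream in(&counter);
    std::string line;
    while (std::getline(in, line)) { /* a parser would consume the record here */ }
    return counter.bytes_fetched();
}
```

For a real file, raw would be a std::ifstream opened in binary mode, and bytes_fetched() divided by the file size gives the progress estimate.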

I use a std::ifstream to get the current position in the file during SeqAn parsing; that part is easy. But when I try my code on a compressed file, SeqAn's parsing fails. So either SeqAn didn't detect that my stream contains compressed data, or SeqAn can't work on a compressed stream, but that isn't documented.

written 6 months ago by pmarijon

"So either SeqAn didn't detect that my stream contains compressed data, or SeqAn can't work on a compressed stream, but that isn't documented."

Usually it is the other way round: Things don't work on compressed data, unless documented.

written 6 months ago by kloetzl

It is documented:

These classes provide an API for accessing sequence files in different file formats, either compressed or uncompressed.

Source : https://seqan.readthedocs.io/en/master/Tutorial/InputOutput/SequenceIO.html

written 6 months ago by pmarijon

Well, there is compressed .bam and compressed .gz.

written 6 months ago by kloetzl

And bz2 according to http://docs.seqan.de/seqan/master/group_FileCompressionTags.html#FileCompressionTags%23BgzFile

written 6 months ago by pmarijon
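
Since the thread distinguishes plain gzip, BGZF (as used by BAM and bgzip) and bz2, here is a hedged sketch of telling them apart from the first bytes of a file. sniff_compression and the Compression enum are hypothetical names, not SeqAn API. The magic numbers are the standard ones (gzip starts with 0x1f 0x8b, bzip2 with "BZh"); the BGZF check assumes the 'B','C' extra subfield is the first one in the gzip header, which is how such files are normally written.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class Compression { None, Gzip, Bgzf, Bzip2 };

// Guess the container format from the leading bytes of a file (14 bytes are enough).
Compression sniff_compression(const std::string& head) {
    auto byte = [&head](std::size_t i) {
        assert(i < head.size());
        return static_cast<unsigned char>(head[i]);
    };

    if (head.size() >= 3 && head.compare(0, 3, "BZh") == 0) return Compression::Bzip2;

    if (head.size() >= 2 && byte(0) == 0x1f && byte(1) == 0x8b) {
        // BGZF is a gzip member with the FEXTRA flag (0x04) set and a "BC" extra subfield.
        const bool has_extra = head.size() >= 14 && (byte(3) & 0x04) != 0;
        if (has_extra && head[12] == 'B' && head[13] == 'C') return Compression::Bgzf;
        return Compression::Gzip;
    }

    return Compression::None;  // no known magic number: treat as uncompressed text
}
```

To use it, read the first 14 bytes of the std::ifstream into a std::string, call sniff_compression, then seekg(0) before handing the stream to a parser. If the result is Gzip or Bgzf and SeqAn still throws a ParseError, it is worth checking that SeqAn was built with zlib support (the SEQAN_HAS_ZLIB define and linking against zlib).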