5.3 years ago
pmarijon ▴ 140

Hi,

I want to read a sequence file (FASTA, FASTQ, BAM, etc.), so I read the SeqAn tutorial. I also want to know my current position in the file in order to drive a progress bar, so I open the file with std::ifstream and hand that stream to SeqAn. That part is not a problem; I wrote this test code:

#include <iostream>
#include <fstream>

#include <seqan/seq_io.h>

int main (int argc, char ** argv) {
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <sequence file>" << std::endl;
        return 1;
    }

    std::streampos begin, end;
    std::ifstream myfile(argv[1], std::ios::in | std::ios::binary);

    begin = myfile.tellg();

    seqan::SeqFileIn seq_file(myfile);
    seqan::CharString id;
    seqan::Dna5String seq;
    seqan::CharString qual;

    while (!seqan::atEnd(seq_file))
    {
        // Read the next record; without this call the loop never advances.
        seqan::readRecord(id, seq, qual, seq_file);
        std::cout << "pos: " << myfile.tellg() << " id " << id << std::endl;
    }

    end = myfile.tellg();

    myfile.close();

    std::cout << "begin: " << begin << " end: " << end << std::endl;
    std::cout << "size is: " << (end - begin) << " bytes.\n" << std::endl;
    return 0;
}


But when I try this code on a compressed FASTQ file, SeqAn throws an exception: terminate called after throwing an instance of 'seqan::ParseError'

My questions:

• Is using std::ifstream the only way to get the current position in the file?
• How can I tell SeqAn that this stream is a compressed stream?
• Can I generate an uncompressed stream from my compressed stream (with SeqAn or zlib)?

Thanks.

seqan • 1.5k views

Why would you want to know the position of a FASTQ record in a compressed file? Unless you're using BGZF, there is no way to 'fseek' in a gzip file...

I want to generate a progress bar; I have edited the post to say so. For a compressed file, we can get a good approximation from the size of the compressed file and the current position within it.

Then I would create a custom std::streambuf that counts the number of bytes read, e.g.: https://artofcode.wordpress.com/2010/12/12/deriving-from-stdstreambuf/

I use a std::ifstream to get the current position in the file while SeqAn is parsing; that part is easy. But when I try my code on a compressed file, SeqAn's parsing fails. So either SeqAn does not detect that my stream contains compressed data, or SeqAn cannot work on a compressed stream, and that is not documented.

"So either SeqAn does not detect that my stream contains compressed data, or SeqAn cannot work on a compressed stream, and that is not documented."

Usually it is the other way round: Things don't work on compressed data, unless documented.

It is documented:

These classes provide an API for accessing sequence files in different file formats, either compressed or uncompressed.

Well, there is compressed .bam and compressed .gz.
