Question: Losing all data after filter step using SGA
jolespin • 130 wrote:
My pipeline so far: raw FASTQ > preprocess > index > correct > index, which produced processed_reads.ec.fa (6.5 GB).
I then ran the following command on processed_reads.ec.fa:
sga filter -k 31 processed_reads.ec.fa
stderr:
sga: QCProcess.cpp:233: DuplicateCheckResult QCProcess::performDuplicateCheck(const SequenceWorkItem&): Assertion `fwdIntervals.interval[0].isValid() || rcIntervals.interval[0].isValid()' failed.
stdout:
...
[sga] Processed 8000000 sequences (912.782961s elapsed)
[sga] Processed 8050000 sequences (918.024303s elapsed)
[sga] Processed 8100000 sequences (925.718005s elapsed)
[sga] Processed 8150000 sequences (932.858792s elapsed)
Abort (core dumped)
The output files were:
processed_reads.ec.filter.pass.fa (1.4 MB)
processed_reads.ec.discard.fa (2.0 GB)
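To put numbers on the loss, the reads in the input and in the two output files can be counted and compared against the last "Processed N sequences" line in the log. Below is a minimal Python sketch for that. It is not part of SGA, it assumes plain uncompressed FASTA with one `>` header line per read, and the helper names `count_fasta_records` and `summarize` are made up for illustration.

```python
import os
import sys


def count_fasta_records(path):
    """Count FASTA records by counting lines that start with '>'."""
    count = 0
    # Read as bytes so a multi-gigabyte file is streamed without decoding.
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b">"):
                count += 1
    return count


def summarize(paths):
    """Map each path to its record count, or to None if the file is missing."""
    return {
        path: count_fasta_records(path) if os.path.exists(path) else None
        for path in paths
    }


if __name__ == "__main__":
    # Default file names are the ones listed in this post;
    # other paths can be given on the command line instead.
    default_paths = [
        "processed_reads.ec.fa",
        "processed_reads.ec.filter.pass.fa",
        "processed_reads.ec.discard.fa",
    ]
    for path, n_records in summarize(sys.argv[1:] or default_paths).items():
        shown = "missing" if n_records is None else f"{n_records} records"
        print(f"{path}: {shown}")
```

Because the run aborted partway through, the pass and discard counts together would only cover the reads handled before the crash, so they may add up to less than the input count.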
Does anyone know what's happening? What does this assertion failure mean? I've tried different thread counts and different k-mer sizes and have not seen any significant improvement.