Hi. I am a biologist and I needed a good graphical, fast FastQ editor. Starting from a Biostars thread, I implemented my own SFF/FastQ editor a few days ago. I hope this is the most complete SFF/FastQ editor available. If you want a specific feature implemented, just let me know.
- SFF, FastQ, FQ, Fasta (soon)
- Cut reads with average QV under specified threshold
- Cut reads if they contain N bases (the user can specify how many)
- Cut low complexity reads
- Cut reads that are too short
- Cut reads that are too long
- Cut low quality ends. Automatically detect and cut low quality bases at the end of each read
- Cut poly-A/T tails
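
To make the filters above concrete, here is a minimal Python sketch of how average-QV filtering, N-count filtering, length filtering and low-quality end trimming could work. It is not the program's actual code. The function names, the Phred+33 offset and the default thresholds are all assumptions chosen for illustration.

```python
def phred_scores(quality, offset=33):
    """Convert a FastQ quality string to a list of integer quality values."""
    return [ord(ch) - offset for ch in quality]


def mean_quality(quality, offset=33):
    """Average quality value of a read; 0.0 for an empty read."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores) if scores else 0.0


def trim_low_quality_end(seq, quality, min_q=20, offset=33):
    """Drop trailing bases until the last remaining base has quality >= min_q."""
    end = len(seq)
    while end > 0 and ord(quality[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], quality[:end]


def keep_read(seq, quality, min_mean_q=20, max_n=0,
              min_len=1, max_len=None, offset=33):
    """Return True if the read passes the length, N-count and mean-quality filters.

    All thresholds are hypothetical defaults, not the program's settings.
    """
    if len(seq) < min_len:
        return False
    if max_len is not None and len(seq) > max_len:
        return False
    if seq.upper().count("N") > max_n:
        return False
    return mean_quality(quality, offset) >= min_mean_q
```

For example, `trim_low_quality_end("ACGTAC", "IIII##")` drops the last two bases, because `#` corresponds to a quality value of 2 under the assumed Phred+33 offset.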
Tools and converters
- Dereplicate sequences (to be released soon!)
- Split multiplexed files (MID/barcode splitter)
- Remove contaminants (search overrepresented sequences against a contaminant database)
- File splitter: Split a huge FastQ/SFF file into chunks of x reads
- File splitter: Cut all sequences in the specified range
- Compact FastQ files
- Convert SFF to FastQ
- Convert SFF to Fasta
- Convert FastQ to Fasta (multiFasta)
- Convert FastQ file to a different encoding (under development)
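
As an illustration of the FastQ to Fasta conversion listed above, here is a short Python sketch. It assumes plain 4-line FastQ records (no wrapped sequence lines) and is not how the program itself is implemented. `read_fastq` and `fastq_to_fasta` are hypothetical helper names.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FastQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return                      # end of file
        if not header.strip():
            continue                    # skip stray blank lines
        seq = handle.readline().rstrip("\n")
        handle.readline()               # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def fastq_to_fasta(src, dst):
    """Write every FastQ record from src to dst as a Fasta record; return the count."""
    count = 0
    for name, seq, _qual in read_fastq(src):
        dst.write(f">{name}\n{seq}\n")
        count += 1
    return count


# Usage with in-memory text; real files opened in text mode work the same way.
src = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n")
dst = io.StringIO()
fastq_to_fasta(src, dst)
```

Because the reader is a generator, only one record is held in memory at a time, so the same loop could also drive a chunked file splitter.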
Graphs and data analysis
- Sequence viewer - Show all reads: Read name, Base sequence, average quality, sequence length
- Sequence length distribution graph
- Per base sequence quality graph
- Per base GC content graph
- Per base sequence content graph
- Per base N content graph (integrated in the 'Per Base Content' graph)
- Per sequence quality scores graph
- Graphs can be expanded to full screen
- All graphs are updated in real time as the file is processed
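
For readers curious about what sits behind a per base GC content graph, here is a rough Python sketch of the underlying statistic. It is only an assumption about how such a graph could be computed, not the program's code. In this sketch N bases count toward the total at a position but not toward GC.

```python
def per_base_gc(sequences):
    """Return the GC fraction at each read position across all reads."""
    gc = []      # number of G/C bases seen at each position
    total = []   # number of reads covering each position
    for seq in sequences:
        for i, base in enumerate(seq.upper()):
            if i == len(total):      # first read long enough to reach position i
                gc.append(0)
                total.append(0)
            total[i] += 1
            if base in "GC":
                gc[i] += 1
    return [g / t for g, t in zip(gc, total)]
```

The counters can be updated one read at a time, which is what makes it possible to refresh a graph while the file is still being processed.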
Version 3.2.3 (released August 2015) can be downloaded here. The size of this program is about 4 MB. No installer needed.
Dereplication is now also available (app). Statistical data about clusters is included in Dereplicator.
'Follow' this post to stay up to date.
- <3MB of disk space
- no installation
- no Java
- no .Net
- no admin permissions
- no money :)
Speed & mem footprint:
On an old Toshiba laptop (i5, 2.2GHz) it loads a 0.5GB file in under 11sec (if no processing is applied). This also includes the time for determining the file encoding (Solexa, Illumina, Sanger). The memory footprint should not exceed 15-30MB. I am thinking about doing the file decoding and the data processing in separate threads.
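
For those wondering how the file encoding can be determined, a common heuristic looks at the lowest quality character in the file. The Python sketch below shows that heuristic. It is an assumption about the general approach, not a description of what the program does internally, and `guess_fastq_encoding` is a hypothetical name.

```python
def guess_fastq_encoding(quality_strings):
    """Guess the quality encoding from the lowest ASCII code seen.

    Sanger (Phred+33) starts at '!' (33), Solexa (+64) can go down to ';' (59),
    and Illumina 1.3+ (Phred+64) starts at '@' (64).
    """
    lowest = min((ord(ch) for q in quality_strings for ch in q), default=None)
    if lowest is None:
        return "unknown"
    if lowest < 59:
        return "Sanger (Phred+33)"
    if lowest < 64:
        return "Solexa (Solexa+64)"
    return "Illumina 1.3+ (Phred+64)"
```

The guess is only as good as the reads it sees: a Sanger file in which every base has very high quality contains no characters below 64 and would be misclassified, so scanning more reads makes the result more reliable.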
The program was built on feedback from users. So, please comment on things such as:
- Feature requests
- Platform you are interested in (Windows, Mac, Linux) - This is very important!
- Statistics about your files (file type, how many, file size) and your workstation (CPU/RAM)
- Which of the existing modules you are interested in (so we can improve them)
- New request from users: Allow program resizing so it can fit on very small laptop screens
This tool integrates with Avalanche Workbench.