What Raw Sequence File Formats Do You Work With?
13.2 years ago
Science_Robot ★ 1.1k

I'd like to build a parser that loads the output of various sequencing technologies into a database. There are many different file formats; I am mostly familiar with FASTQ (produced by Illumina) and FASTA/QUAL (produced by 454). What raw sequence file formats do you work with?
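A minimal Python sketch of the FASTQ side of such a parser, assuming the common four-line layout (sequence and quality not wrapped) and Phred+33 qualities. `Read` and `parse_fastq` are hypothetical names, not part of any library; FASTA/QUAL input could be normalized into the same record type.

```python
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    """One sequencing read: identifier, bases, and Phred quality scores."""
    name: str
    seq: str
    qual: list[int]


def parse_fastq(lines: Iterable[str], offset: int = 33) -> Iterator[Read]:
    """Yield reads from FASTQ text (a file handle or any iterable of lines).

    Assumes unwrapped 4-line records and Phred+33 encoding;
    pass offset=64 for older Phred+64 files.
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it, "").rstrip()
        plus = next(it, "")
        qual = next(it, "").rstrip()
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header.strip()!r}")
        name = header[1:].split()[0]  # read ID is the first word after '@'
        yield Read(name, seq, [ord(c) - offset for c in qual])
```

Usage would be along the lines of `with open("reads.fastq") as fh: reads = parse_fastq(fh)`; because it is a generator, a large file is never held in memory at once.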

sequence format • 2.9k views

You want to put FASTQs (or whatever) in a database? Why? What kind of data do you need to index? What information do you need to find quickly that won't be available using samtools or the Samtools API?

Forget about the database. I really just want to know what kind of raw sequence file formats are common.

The reason is that I need random access.
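If random access is the main requirement, one option that needs no database is a byte-offset index over an uncompressed FASTQ file: scan it once, remember where each record starts, then `seek()` straight to it. This is a minimal sketch assuming unwrapped four-line records; `index_fastq` and `fetch_fastq` are hypothetical helpers, and the index here lives only in memory.

```python
import io
from typing import BinaryIO


def index_fastq(handle: BinaryIO) -> dict[str, int]:
    """Map each read name to the byte offset of its '@' header line.

    Assumes an uncompressed file with unwrapped 4-line records.
    """
    index: dict[str, int] = {}
    while True:
        offset = handle.tell()
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip blank lines
        if not header.startswith(b"@"):
            raise ValueError(f"expected '@' header at byte {offset}")
        name = header[1:].split()[0].decode()
        index[name] = offset  # a duplicate name overwrites the earlier offset
        for _ in range(3):  # skip the sequence, '+' and quality lines
            handle.readline()
    return index


def fetch_fastq(handle: BinaryIO, index: dict[str, int], name: str) -> bytes:
    """Return the raw 4-line record for `name` by seeking directly to it."""
    handle.seek(index[name])
    return b"".join(handle.readline() for _ in range(4))


if __name__ == "__main__":
    # In-memory stand-in for open("reads.fastq", "rb")
    data = io.BytesIO(b"@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nHHHH\n")
    idx = index_fastq(data)
    print(idx)  # {'r1': 0, 'r2': 16}
    print(fetch_fastq(data, idx, "r2"))
```

For a real service the index would need to be saved to disk alongside the file; for FASTA, `samtools faidx` already builds this kind of offset index (the `.fai` file).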

13.2 years ago

Your best strategy might be to convert everything to FASTQ and build your service around the FASTQ format. Each platform has utilities that convert to FASTQ.

Check out Screed as well, a simple read-only sequence database designed for short reads:

https://github.com/ctb/screed

+1. Screed is interesting. It would be even better if it supported gzip compression; pretty much everything larger than 1 GB on my disk is compressed.
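On the compression point, a reader can detect gzip from its two magic bytes (`1f 8b`) instead of trusting the file extension, and decompress on the fly. A minimal sketch using only Python's standard library; `text_reader` is a hypothetical helper, not part of Screed. Plain gzip is not built for random access, though: seeking in a gzip stream means decompressing everything before the target, which is why htslib's `bgzip` writes the blocked BGZF variant instead.

```python
import gzip
import io
from typing import BinaryIO, TextIO

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def text_reader(raw: BinaryIO) -> TextIO:
    """Wrap a seekable binary stream as text, gunzipping it if needed.

    Detection looks at the content, not the file name, so a compressed
    file without a .gz extension is still handled.
    """
    magic = raw.read(2)
    raw.seek(0)  # rewind so the reader sees the whole stream
    stream = gzip.GzipFile(fileobj=raw, mode="rb") if magic == GZIP_MAGIC else raw
    return io.TextIOWrapper(stream, encoding="utf-8")
```

Typical use: `with open("reads.fastq.gz", "rb") as fh:` then iterate over `text_reader(fh)` line by line, exactly as with an uncompressed file.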

I'm trying to move away from flat files. Screed looks interesting. Wouldn't be hard to add compression.
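For moving away from flat files, one low-effort option is an SQLite table keyed by read name, which gives indexed lookups without running a database server. This is a minimal sketch with a hypothetical three-column schema (`name`, `seq`, `qual`); it is not Screed's actual storage layout.

```python
import sqlite3
from typing import Iterable, Optional


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) a read store; ':memory:' gives a throwaway one."""
    conn = sqlite3.connect(path)
    # PRIMARY KEY on name creates an index, so lookups by name avoid a table scan
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reads ("
        "name TEXT PRIMARY KEY, seq TEXT NOT NULL, qual TEXT NOT NULL)"
    )
    return conn


def load_reads(conn: sqlite3.Connection,
               reads: Iterable[tuple[str, str, str]]) -> None:
    """Bulk-insert (name, seq, qual) tuples in a single transaction."""
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO reads (name, seq, qual) VALUES (?, ?, ?)", reads
        )


def get_read(conn: sqlite3.Connection, name: str) -> Optional[tuple[str, str]]:
    """Return (seq, qual) for one read, or None if it is not stored."""
    return conn.execute(
        "SELECT seq, qual FROM reads WHERE name = ?", (name,)
    ).fetchone()
```

Loading everything in one transaction matters here: committing after every insert would be far slower for millions of reads.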
