FastQ format (\n)
6.8 years ago

Hi, I'm trying to concatenate FastQ files. My doubt is: is there an end-of-line character at the end of every FastQ file? Is it even necessary, or do all programs just use regular expressions to tell which read is which (matching on the @ character)?

RNA-Seq FastQ • 2.5k views
ADD COMMENT

Files made in Windows and ancient Mac versions pose unique challenges. Linux uses \n for end of line. Windows uses \r\n because hey, why not bloat everything? And the ancient Mac format was \r only, because hey, why bother with standards? Modern Mac computers use the Linux convention. Windows... well, it will never be compliant with modern uses, so software just has to support it as a special case, which makes parsing everything slower.

So, as long as your files were all generated on the same platform, just concatenate them. If some of them were generated on Windows or ancient Mac versions, I suggest you reformat them to Unix standard prior to concatenation.
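If you would rather do the reformatting in a script, here is a minimal sketch of that conversion. The function names (to_unix_newlines, normalize_file) are made up for illustration, and it assumes each file fits in memory; for very large files you would want to process it in chunks instead.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Convert Windows (CR LF) and classic Mac (CR) line endings to Unix (LF)."""
    # Handle CR LF first; replacing lone CR first would turn each
    # Windows line break into two Unix line breaks.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalize_file(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: write a Unix-newline copy of src_path to dst_path."""
    # Binary mode keeps Python from translating newlines on its own.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(to_unix_newlines(src.read()))
```

Existing command-line tools such as dos2unix do the same job for Windows files, if one is installed on your system.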

ADD REPLY

What is the actual problem you're having? You can concatenate FastQ files like any other text file with cat *.fastq > all_fastqs.fastq

ADD REPLY

I know how to concatenate; my doubts are about aligners and other tools that read these FastQ files. Some FastQ files end with a newline character and some do not. When you use cat, if the first file has no trailing \n, the first line of the next file is appended to the last line of the first. How do programs read these files: do they rely on that delimiter, or do they just use regular expressions?
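To make the concern concrete, this is roughly how the missing-newline case could be guarded against when concatenating. It is only a sketch: concat_fastq is a made-up name, and it assumes uncompressed files with Unix line endings.

```python
def concat_fastq(paths, out_path, chunk_size=1 << 20):
    """Concatenate files, adding a newline after any file that lacks a trailing one."""
    with open(out_path, "wb") as out:
        for path in paths:
            last_byte = b""
            with open(path, "rb") as src:
                # Copy in chunks so large FastQ files are never held in memory.
                for chunk in iter(lambda: src.read(chunk_size), b""):
                    out.write(chunk)
                    last_byte = chunk[-1:]
            # An empty file leaves last_byte as b"", so nothing is added for it.
            if last_byte and last_byte != b"\n":
                out.write(b"\n")
```

With this guard, the header of the next file always starts on its own line, whether or not the previous file ended with \n.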

ADD REPLY

I've never encountered aligners/assemblers etc that had issues with concatenated fastqs. I'm pretty sure they just ignore the EOF character. Your mileage may vary though. If you're using programs which are less robust, you may find you need to get your hands dirty removing those kinds of invisibles.

ADD REPLY

Thanks. I believe the people who need these files concatenated are going to use the STAR aligner (not sure). I won't be doing that myself; I just wanted to make sure the data was not being corrupted by concatenating the files.

This probably just means a typical aligner treats the "@" character as both the start of a record and the end of the previous one, except for the last record in the file, which ends where the file ends.

ADD REPLY

I am afraid you need to brush up on the following concepts before moving any further: 1) FastQ is a plain-text file: each line is terminated by a newline, and the end of the file (EOF) tells the reader there is nothing left to read. 2) Regular expressions: http://www.regular-expressions.info/

ADD REPLY

You really need to read and understand the gory details of FastQ format https://en.wikipedia.org/wiki/FASTQ_format

Is there an end_line character at the end of every FastQ

FastQ is a plain-text format, and as such each line of a record is delimited by a newline.

ADD REPLY

Is it even necessary, do all programs just use regular expressions for knowing which read is which (map using @ char)?

Using just @ (at the beginning of a line) is not enough. It is safer to use part of the machine serial number (e.g. @M0123) since a quality score line can also start with @, a valid Q score.
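Another way around the ambiguous @ is to not search for it at all and instead read records by position. The sketch below assumes the common four-line layout (header, sequence, "+" line, quality) with no wrapped sequence lines; read_fastq is a made-up name, and I am not claiming this is how any particular aligner does it.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a four-line-per-record FastQ."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file reached
        if not header.strip():
            continue  # tolerate blank lines, e.g. at the end of the file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        # Position decides what each line is, so a quality line that
        # happens to start with "@" is never mistaken for a header.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed record near: " + header.rstrip("\r\n"))
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch in: " + header.rstrip("\r\n"))
        yield header[1:].rstrip("\r\n"), seq, qual


# The first record's quality string starts with "@", and the last line
# has no trailing newline; both records are still read correctly.
example = "@r1\nACGT\n+\n@III\n@r2\nGG\n+\nII"
records = list(read_fastq(io.StringIO(example)))
```

A side effect of counting lines is that the last record is read correctly even when the file does not end with a newline, because readline() simply returns the remaining text.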

ADD REPLY
6.8 years ago

If you are asking if it's okay to just concatenate fastq files together, yes, you can do that. You can also just 'cat' fastq.gz files together. It all works fine. No need to strip an EOF character off prior to concatenating.
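For the fastq.gz case, this works because a gzip stream may consist of several compressed members placed back to back, and standard decompressors read through all of them. A small check with Python's gzip module, using two made-up reads as input:

```python
import gzip

# Two separately compressed FastQ snippets (illustrative data only).
part1 = gzip.compress(b"@r1\nACGT\n+\nIIII\n")
part2 = gzip.compress(b"@r2\nGGCC\n+\nIIII\n")

# Byte-level concatenation, which is what `cat a.fastq.gz b.fastq.gz` does.
combined = part1 + part2

# gzip.decompress reads every member in the stream, not just the first.
restored = gzip.decompress(combined)
```

Here restored holds the two records one after the other, the same as if the uncompressed files had been concatenated first and compressed afterwards.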
