Question: merge a large number of fastq files into a single one
7
4.7 years ago by
United States
catherine12243 (130) wrote:

I have 30 small fastq files from the same sample, and I want to merge them into one file. I know the command is

cat file1.fastq file2.fastq > bigfile.fastq

but is there a shortcut for doing it? It just looks silly to type 30 file names one by one...

Thank you for any idea!

chip-seq fastq • 39k views
modified 4.7 years ago by Pierre Lindenbaum (124k) • written 4.7 years ago by catherine12243 (130)

Those on Windows can use this GUI tool (it also works on Linux via Wine): http://www.dnabaser.com/download/Merge%20Fasta/index.html

modified 3.5 years ago • written 3.5 years ago by BioApps (740)
8
4.7 years ago by
Deutschland
David Langenberger (9.1k) wrote:
cat file*.fastq > bigfile.fastq
written 4.7 years ago by David Langenberger (9.1k)

Oh yeah! I was so stupid!

written 4.7 years ago by catherine12243 (130)
8

Be cautious about this approach!  Depending on your system, you can enter an endless loop of concatenating the new file to itself.  I strictly do:

cat *.fq > merged.fastq

or

cat *.fastq > merged.fq

...or whatever is needed to ensure the pattern does not match the new file being created.

written 4.7 years ago by Brian Bushnell (17k)

Does this happen? My understanding is that shell first parses "*.fq" and at that time "merged.fq" has not been generated yet. I bet a lot of people must have typed "cat *.txt > out.txt". Shell developers should have been aware of such an issue for many years. I could be wrong, though.

written 4.7 years ago by lh3 (31k)
1

Actually, it happened to me once. That's why I put the 'file' as prefix for the input and 'bigfile' for the output. But I didn't know that it is system dependent. Thanks for mentioning it, Brian.

modified 4.7 years ago • written 4.7 years ago by David Langenberger (9.1k)
3

I was wrong. You and Brian are right. I can reproduce this endless loop.

written 4.7 years ago by lh3 (31k)
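
To make the caution above concrete, here is a minimal sketch of one way to keep the output out of reach of the input pattern: write the merged file into a separate directory. As far as I understand, a POSIX shell expands the pattern before it opens the redirection target, so the risk mainly comes from an output file that already exists from an earlier run and matches the pattern. The scratch directory, the two tiny FASTQ files and the `merged/` directory are made up for illustration.

```shell
set -e

# Work in a scratch directory so the sketch does not touch real data.
workdir=$(mktemp -d)
cd "$workdir"

# Two tiny hypothetical FASTQ files standing in for the real inputs.
printf '@read1\nACGT\n+\nIIII\n' > file1.fastq
printf '@read2\nTTGA\n+\nIIII\n' > file2.fastq

# Put the output in its own directory so that file*.fastq can never
# match it, even when the command is run a second time.
mkdir -p merged
cat file*.fastq > merged/bigfile.fastq

# A FASTQ record is 4 lines, so 2 records should give 8 lines.
merged_lines=$(wc -l < merged/bigfile.fastq | tr -d ' ')
echo "$merged_lines"
```

Using a different extension or prefix for the output, as suggested above, achieves the same thing; the separate directory is just one more option.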
14
4.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum (124k) wrote:

"It just looks silly to type 30 file names one by one..."

Use file globbing: http://en.wikipedia.org/wiki/Glob_%28programming%29

cat file*.fastq > bigfile.fastq

Note: this also works with fastq.gz files (http://stackoverflow.com/questions/8005114):

cat file*.fastq.gz > bigfile.fastq.gz

 

written 4.7 years ago by Pierre Lindenbaum (124k)
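
A quick way to convince yourself that concatenating gzip files is safe is to try it on toy data. The sketch below assumes `gzip` is installed; the file names and read contents are invented for the example. It compresses two tiny FASTQ files, concatenates the compressed files with `cat`, and checks that decompressing the result yields all records in order.

```shell
set -e

# Scratch directory with two tiny, made-up gzipped FASTQ files.
workdir=$(mktemp -d)
cd "$workdir"
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > file1.fastq.gz
printf '@read2\nTTGA\n+\nIIII\n' | gzip -c > file2.fastq.gz

# Concatenate the compressed files directly; the output goes to a
# separate directory so the pattern cannot match it.
mkdir -p merged
cat file*.fastq.gz > merged/bigfile.fastq.gz

# gzip -dc decompresses every member of the concatenated file in turn.
gz_lines=$(gzip -dc merged/bigfile.fastq.gz | wc -l | tr -d ' ')
first_header=$(gzip -dc merged/bigfile.fastq.gz | head -n 1)
echo "$gz_lines $first_header"
```

The merged file is usually a little larger than what you would get by decompressing everything and recompressing it in one go, but it avoids the decompression step entirely.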

I get an error when using: cat *.R1_unmapped.fq > unmapped_R1.fq

The files in the directory are:

216_7W_Ca1_R1_unmapped.fq
216_9W_Co2_R1_unmapped.fq
218_5W_Pa1_R1_unmapped.fq
218_7W_Pa2_R1_unmapped.fq

[root@psgl unmapped]# cat *.R1_unmapped.fq > unmapped_R1.fq

cat: *.R1_unmapped.fq: No such file or directory

written 2.9 years ago by Bioinfonext (170)
1

You have an extra dot in the pattern: the file names contain _R1, not .R1.

cat *_R1_unmapped.fq > unmapped_R1.fq
written 2.9 years ago by vmicrobio (240)
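
A cheap way to catch this kind of typo is to `echo` the pattern before handing it to `cat`. The sketch below assumes bash or a POSIX sh with default settings, where a pattern that matches nothing is left as literal text (zsh behaves differently by default). The two file names are taken from the listing above and are created empty in a scratch directory just for the demonstration.

```shell
set -e

# Scratch directory with two empty files named like the ones in the listing.
workdir=$(mktemp -d)
cd "$workdir"
touch 216_7W_Ca1_R1_unmapped.fq 218_5W_Pa1_R1_unmapped.fq

# The pattern with the stray dot matches nothing, so the shell passes
# it through unchanged and cat would look for a file literally named that.
bad=$(echo *.R1_unmapped.fq)

# The corrected pattern expands to the real file names, sorted.
good=$(echo *_R1_unmapped.fq)

echo "bad:  $bad"
echo "good: $good"
```

If the `echo` prints the pattern back unchanged, nothing matched, and `cat` will report "No such file or directory".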

Nice solution. Yes. Basically you need to do a 'dumb' file merge.

written 2.8 years ago by BioApps (740)