MinION data - how to deal with thousands of separate read files?
7.5 years ago
jdaniels33 • 0

Hi,

A colleague has graciously provided me with some sequence data in the form of 4339 reads from a MinION. These reads comprise forward, complement, and 2D reads, and they were provided as 4339 separate fastq files.

I have only previously worked with MiSeq data, where a sequencing run generates only a pair of files.

I am planning to try a hybrid assembly with these MinION reads (together with MiSeq reads) in SPAdes. Is there a tool that can concatenate all of the data from the 4339 MinION reads into one single fastq file?

Thank you,

JD

Assembly minion next-gen SPAdes • 5.9k views
7.5 years ago

If this is your only copy of the data, make sure you have a backup before making any changes. Separate fastq files are rather uncommon: the MinION produces .fast5 files (HDF5), which are most commonly converted to a single fastq file (e.g. with poretools). But okay.

I am assuming you use some flavor of Linux. In theory, you would do something like cat *.fastq > combined.fastq. But since you have so many files, this might run into problems: after your shell expands *.fastq, the command line becomes very long and may exceed the system's limit on argument length. So we'll have to split it up a bit.
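Whether the expanded command would really be too long can be estimated beforehand. The following is only a rough sketch: fits_arg_max is a hypothetical helper name, and comparing against half of ARG_MAX is an arbitrary safety margin chosen here to leave room for the environment and per-argument overhead.

```shell
# Rough check: would "cat *.fastq" fit within the system's argument limit?
# fits_arg_max is a hypothetical helper, not an existing tool.
fits_arg_max() {
    local dir=${1:-.} limit used
    # Maximum combined size of arguments and environment for a new process.
    limit=$(getconf ARG_MAX)
    # Bytes needed for all matching names, each followed by a NUL terminator.
    # printf is a shell builtin, so it is not subject to the limit itself.
    used=$(cd "$dir" && printf '%s\0' *.fastq | wc -c)
    # Assumption: keep half of the limit free as a safety margin.
    [ "$used" -lt $(( limit / 2 )) ]
}
```

For example, fits_arg_max . && cat *.fastq > combined.fastq runs the simple command only when the estimate says it should fit.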

I would start with something like the following:

# Make lists of fastq file names, 100 names per list (--additional-suffix needs GNU split)
printf '%s\n' *.fastq | split -l 100 --additional-suffix=.fastq.list

for f in *.fastq.list
do
    # Concatenate the files named in this list (assumes file names without whitespace)
    cat $(cat "$f") > "$f.combinedpart.fastq"
done

followed by:

cat *.combinedpart.fastq > combinedAll.fastq

Note that I don't have as many files as you do, so I could not test the approach above. Feel free to let me know if something doesn't work.
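As a possible alternative to the list-splitting steps above, find and xargs can hand the file names to cat in batches, so no single command line grows too long. This is a minimal sketch, assuming GNU findutils and coreutils; combine_fastq and count_fastq_records are hypothetical helper names, and the record count assumes the usual four-lines-per-read fastq layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate every *.fastq file in a directory into one file.
combine_fastq() {
    local indir=$1 outfile=$2
    # find streams the names NUL-separated, sort -z makes the order reproducible,
    # and xargs calls cat in batches that each stay below the argument limit.
    # The output file name is excluded in case it lives in the same directory.
    find "$indir" -maxdepth 1 -type f -name '*.fastq' \
        ! -name "$(basename -- "$outfile")" -print0 \
        | sort -z \
        | xargs -0 -r cat > "$outfile"
}

# Hypothetical sanity check: number of reads in a fastq file.
# Assumes 4 lines per record (no wrapped sequence or quality lines).
count_fastq_records() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}
```

Calling combine_fastq . ../combinedAll.fastq followed by count_fastq_records ../combinedAll.fastq should report the same number of reads as there were input files, if each file holds exactly one read.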


Hi Wouter,

The first command worked! I actually tried the second command first and it said that the additional suffix was too long.

Many thanks

JD


If the first command worked then we don't have to worry about the rest :-) (just .list as suffix would have been good enough but I like to be as informative as possible in my filenames).

Enjoy your analysis!
