Question: Automating Flash Sequence Joins
espop23 (60), London, wrote 4.5 years ago:

Hello,

I have around 100 FASTQ files of forward and reverse reads that I want to join with FLASH. I would like to write a script that goes through a folder and joins the reads for me automatically.

Does anyone know if this is possible?

Best

bash script automate flash • 1.5k views
modified 4.5 years ago by c.v.oflynn (100) • written 4.5 years ago by espop23 (60)

Do you mean something like cat *.fastq > onebig.fastq?

written 4.5 years ago by Michael Dondrup (47k)

Doesn't that just create one big file? I want to use the program FLASH. Do you think putting all the reads in one file and then running FLASH on it is the way to go?

written 4.5 years ago by espop23 (60)

Not necessarily. You could write a for loop and go through the file set (I guess 50 pairs). If you have access to a cluster you could submit all 50 jobs at the same time.

written 4.5 years ago by genomax (91k)

What is FLASH? Could you link to it?

written 4.5 years ago by Michael Dondrup (47k)

FLASH is a read joiner (like BBMerge from BBMap).

written 4.5 years ago by genomax (91k)
c.v.oflynn (100), United Kingdom, wrote 4.5 years ago:

A for loop would do the trick, assuming the paired reads are named like this: reads1_1.fq reads1_2.fq

for i in $(ls *fq | grep  "_1" | cut -f 1 -d "_"); do flash ${i}_1.fq ${i}_2.fq; done
modified 4.5 years ago • written 4.5 years ago by c.v.oflynn (100)

You may want to post an additional version with R1/R2 nomenclature, since those are the default filenames with Illumina pipelines.
@espop23: add or delete FLASH options as needed.

modified 4.5 years ago • written 4.5 years ago by genomax (91k)
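
For the R1/R2 nomenclature mentioned above, something along these lines might work. This is only a sketch: the function name join_r1_r2_pairs and the example file name in the comments are made up for illustration, and it assumes every forward file has _R1_ in its name and a mate that differs only by _R2_. The -o option should set FLASH's output prefix so that each pair gets its own output files; check flash --help on your version.

```bash
#!/usr/bin/env bash
# Sketch only: join every R1/R2 pair in a directory with FLASH.
# Assumes names like sampleA_S1_L001_R1_001.fastq (hypothetical example)
# whose mate is the same name with _R2_ in place of _R1_.

join_r1_r2_pairs() {
    local dir="${1:-.}"
    local r1 base prefix rest r2
    for r1 in "$dir"/*_R1_*.fastq; do
        [ -e "$r1" ] || continue           # glob matched nothing
        base=${r1##*/}                     # file name without the directory
        prefix=${base%%_R1_*}              # text before the first _R1_
        rest=${base#*_R1_}                 # text after the first _R1_
        r2="$dir/${prefix}_R2_${rest}"
        if [ ! -e "$r2" ]; then
            echo "Skipping $r1: mate $r2 not found" >&2
            continue
        fi
        # -o gives each pair its own output prefix; add other options as needed
        flash -o "$prefix" "$r1" "$r2"
    done
}
```

Call it as join_r1_r2_pairs /path/to/fastq_folder (or with no argument for the current directory). Using a glob and parameter expansion avoids parsing ls output, so file names with unusual characters are less likely to cause trouble.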

My files are named like this: NG-7284_49811102_lib40117_2432_1_1.fastq ; NG-7284_49811102_lib40117_2432_1_2.fastq - does this change anything?

modified 4.5 years ago • written 4.5 years ago by espop23 (60)

Give the following a try. Note that, as written, the loop runs the jobs one after another. If you change it to launch them in the background on a single machine, that may not be advisable, since all 50 jobs would then start at the same time.

for i in $(ls *_1.fastq | cut -f 1-5 -d "_"); do flash ${i}_1.fastq ${i}_2.fastq; done

This will work only if all file names follow the nomenclature you posted above.

modified 4.5 years ago • written 4.5 years ago by genomax (91k)
ls *_1.fastq | grep -Poh ".*_1" | sed 's/1$//'

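This will work even if the number of underscores is not always the same. Note that the resulting prefix keeps its trailing underscore, so append 1.fastq and 2.fastq rather than _1.fastq and _2.fastq.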

modified 4.5 years ago • written 4.5 years ago by c.v.oflynn (100)
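
Another option that does not depend on the number of underscores is to let the shell strip the suffix with parameter expansion instead of parsing ls output. A minimal sketch, assuming every forward file ends in _1.fastq and its mate ends in _2.fastq (the function name list_pairs is hypothetical). It only prints the pairs, so the matching can be checked before FLASH is run:

```bash
#!/usr/bin/env bash
# Sketch only: print "forward<TAB>reverse" for each pair in a directory.

list_pairs() {
    local dir="${1:-.}"
    local fwd stem
    for fwd in "$dir"/*_1.fastq; do
        [ -e "$fwd" ] || continue          # glob matched nothing
        stem=${fwd%_1.fastq}               # drop only the trailing _1.fastq
        if [ -e "${stem}_2.fastq" ]; then
            printf '%s\t%s\n' "$fwd" "${stem}_2.fastq"
        else
            echo "No mate for $fwd" >&2
        fi
    done
}
```

For NG-7284_49811102_lib40117_2432_1_1.fastq the stem becomes NG-7284_49811102_lib40117_2432_1, however many underscores come before the final _1.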

Is there a way to create a folder for each pair in the script, or some other way to separate the output? It seems the output files are being overwritten each time...

written 4.5 years ago by espop23 (60)
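
One way to keep the outputs apart is to give every pair its own directory and output prefix. The sketch below assumes FLASH's -o (output prefix) and -d (output directory) options, which should be listed in flash --help (verify on your version), and the _1.fastq/_2.fastq naming from this thread. The function name join_pairs_into_dirs, the default directory flash_out and the log file name flash.log are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: run FLASH on each pair, writing results to one folder per pair.

join_pairs_into_dirs() {
    local indir="${1:-.}"
    local outroot="${2:-flash_out}"        # hypothetical default output root
    local fwd stem sample
    for fwd in "$indir"/*_1.fastq; do
        [ -e "$fwd" ] || continue          # glob matched nothing
        stem=${fwd%_1.fastq}               # path without the _1.fastq suffix
        sample=${stem##*/}                 # same, without the directory
        if [ ! -e "${stem}_2.fastq" ]; then
            echo "Skipping $fwd: no mate file" >&2
            continue
        fi
        mkdir -p "$outroot/$sample"
        # Whatever FLASH prints is kept next to the results of that pair
        flash -o "$sample" -d "$outroot/$sample" \
            "${stem}_1.fastq" "${stem}_2.fastq" \
            > "$outroot/$sample/flash.log" 2>&1
    done
}
```

For example, join_pairs_into_dirs . results should leave one folder per pair under results, each holding that pair's output files and log, so nothing is overwritten between iterations.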

I can only speculate and say that you are not running the command right. Can you post the exact command you are running?
We assumed you already knew how to run FLASH. You may need to send the output of each pair to a new file, something like the following (check the correct syntax; I am only illustrating the idea with an output redirect).

for i in $(ls *_1.fastq | cut -f 1-5 -d "_"); do flash ${i}_1.fastq ${i}_2.fastq > $i\_merged.fastq ; done
modified 4.5 years ago • written 4.5 years ago by genomax (91k)

I ran the exact line you last wrote and got a merged fastq for each pair of input files. There is only one out.extendedFrags.fastq - is this as expected? (I am new to FLASH, apologies for the confusion.)

written 4.5 years ago by espop23 (60)

I only got one output when I ran it on 12 files named as I described previously. I am not sure what is wrong.

written 4.5 years ago by espop23 (60)
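
A likely explanation for the behaviour described above: by default FLASH writes the merged reads to out.extendedFrags.fastq (plus a few other out.* files) in the current directory and prints only its log messages to standard output. The > redirect therefore collects the log for each pair, while out.extendedFrags.fastq is overwritten on every iteration. To keep the redirect approach, FLASH should accept a -c (--to-stdout) option that sends the merged reads to standard output instead. A sketch under that assumption (the function name join_pairs_to_stdout is hypothetical; check flash --help for how uncombined reads are handled in this mode):

```bash
#!/usr/bin/env bash
# Sketch only: write the merged reads of each pair to <stem>_merged.fastq.

join_pairs_to_stdout() {
    local dir="${1:-.}"
    local fwd stem
    for fwd in "$dir"/*_1.fastq; do
        [ -e "$fwd" ] || continue              # glob matched nothing
        stem=${fwd%_1.fastq}                   # drop the trailing _1.fastq
        [ -e "${stem}_2.fastq" ] || continue   # skip files without a mate
        # With -c the merged reads go to stdout, so the redirect
        # captures reads rather than the log
        flash -c "${stem}_1.fastq" "${stem}_2.fastq" > "${stem}_merged.fastq"
    done
}
```

Alternatively, setting a different output prefix for each pair with -o keeps all of FLASH's usual output files instead of only the merged reads.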