Question: How To Split Reads For Different Flowcell Lanes In Fastq Files?
newDNASeqer (United States) wrote, 5.6 years ago:

My FASTQ file was delivered by the sequencing core as a single combined file containing reads from two flow cell lanes. Is there a way to split the reads by lane? The downstream pipeline is TopHat-Cufflinks-Cuffmerge-Cuffdiff.

I have read the TopHat documentation and did not find an option for splitting reads by lane, so I am asking here. Thanks.


Is the lane in the ID of each read? If so, you could write a simple Python/Perl script to do that.

written 5.6 years ago by Gabriel R.

Do the IDs have any distinguishing marks? (They should.) If you post a brief snippet containing a read from each lane, one of us could probably whip up a quick script or at least help you get started.

modified 5.6 years ago • written 5.6 years ago by Alex Reynolds
Rm (Danville, PA) wrote, 5.6 years ago:

A quick Awk solution to split a merged FASTQ file by lane:

paste - - - - < my.R1.fastq | awk -F"\t" '{ split($1, arr, ":"); print $1 "\n" $2 "\n+\n" $4 > ("lane." arr[4] ".R1.fastq") }'

I have a pure Awk solution that is much faster. Like the above solution, let's assume that the records are blocks of 4 lines:

awk 'BEGIN {FS = ":"} {lane = $4; out = "lane." lane ".fastq"; print > out; for (i = 1; i <= 3; i++) {getline; print > out}}' < my.R1.fastq

Using the getline command 3 times, you can read blocks of 4 lines (from the standard input, hence the <).
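
For anyone who prefers a script to a one-liner, the same read-four-lines-at-a-time idea can be sketched in Python. This is only a rough sketch under the same assumptions as the Awk versions: records are exactly 4 lines long and the lane is the 4th colon-separated field of the header. The function name split_fastq_by_lane and the out_dir parameter are made up for illustration.

```python
import os
from itertools import islice


def split_fastq_by_lane(fastq_path, out_dir="."):
    """Write each 4-line record to lane.<lane>.fastq; return record counts per lane."""
    handles = {}  # lane -> open output file
    counts = {}   # lane -> number of records written
    try:
        with open(fastq_path) as fastq:
            while True:
                # One record: header, sequence, '+' separator, qualities.
                record = list(islice(fastq, 4))
                if not record:
                    break
                if len(record) < 4:
                    raise ValueError("truncated FASTQ record at end of file")
                # Assumption: the lane is the 4th colon-separated header field.
                lane = record[0].split(":")[3]
                if lane not in handles:
                    out_path = os.path.join(out_dir, f"lane.{lane}.fastq")
                    handles[lane] = open(out_path, "w")
                handles[lane].writelines(record)
                counts[lane] = counts.get(lane, 0) + 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

Calling split_fastq_by_lane("my.R1.fastq") would write one lane.<lane>.fastq file per lane into the current directory. Unlike the Awk one-liners, it stops with an error if the file ends in the middle of a record.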

modified 5.6 years ago • written 5.6 years ago by Frédéric Mahé

Thanks for this solution. I tried it, and it runs quickly and works nicely. I'm not familiar with awk, so could you please explain why your solution is faster?

written 2.8 years ago by DVA

Totally rad. I love one-liners.

written 5.6 years ago by Dan D

+1 for the paste

written 5.6 years ago by Pierre Lindenbaum
Dan D (Tennessee) wrote, 5.6 years ago:

[Image: a FASTQ record with the lane field highlighted in the header line]

See that highlighted "3" in the first line? That's the lane number in Illumina's FASTQ header format (the fourth colon-separated field). If you read in your FASTQ file and direct your reads to different output files based on that value, you'll end up with separate FASTQ files for each lane.

Do you need help writing the script to do that?
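
As a starting point, the lane lookup described above can be sketched in Python. It assumes the header layout @instrument:run:flowcell:lane:tile:x:y, optionally followed by a space and a comment. The function name lane_from_header and the example header are made up for illustration, so check a few headers from your own file before relying on it.

```python
def lane_from_header(header):
    """Return the lane field of an Illumina-style FASTQ header line."""
    # Assumed layout: @instrument:run:flowcell:lane:tile:x:y [comment]
    read_id = header.split()[0]              # drop the optional comment after the space
    fields = read_id.lstrip("@").split(":")
    if len(fields) < 7:
        raise ValueError(f"unexpected header layout: {header!r}")
    return fields[3]


# Illustrative header, not taken from the thread.
print(lane_from_header("@SIM01:42:FCX123:3:1101:1500:2000 1:N:0:ATCACG"))  # prints 3
```

Headers that do not have seven colon-separated fields raise an error instead of silently returning the wrong field, which helps if your file uses an older naming scheme.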
