Question: How To Split Reads For Different Flowcell Lanes In Fastq Files?
Asked 6.4 years ago by newDNASeqer (United States):

My FASTQ file was delivered by the sequencing core as a single combined file containing reads from two flow cell lanes. Is there a way to split the reads by lane? The downstream pipeline is TopHat/Cufflinks/Cuffmerge/Cuffdiff.

I've also read the TopHat documentation and did not see an option for splitting reads by lane, so I am asking here. Thanks.


Is the lane in the ID of each read? If so, you could write a simple Python or Perl script to do that.

(comment by Gabriel R., 6.4 years ago)
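For reference, here is a minimal Python sketch of pulling the lane out of a read ID. It assumes the Illumina CASAVA 1.8+ header layout `@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>`. Older pipelines used a different layout (for example `@HWUSI-EAS100R:6:73:941:1973#0/1`, with the lane in the second field), so check a few of your own headers first. The function name `lane_from_header` is hypothetical.

```python
def lane_from_header(header: str) -> str:
    """Return the lane field of an Illumina read header.

    Assumes the CASAVA 1.8+ layout: @instrument:run:flowcell:lane:tile:x:y [comment]
    """
    if not header.startswith("@"):
        raise ValueError(f"not a FASTQ header: {header!r}")
    # Drop the leading '@' and keep only the read ID (text before the first space).
    parts = header[1:].split()
    fields = parts[0].split(":") if parts else []
    if len(fields) < 7:
        raise ValueError(f"unexpected read ID layout: {header!r}")
    # Python indexes from 0, so the 4th ':'-separated field (the lane) is index 3.
    return fields[3]
```

For example, `lane_from_header("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG")` returns `"2"`.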

Do the IDs have any distinguishing marks? (They should.) If you post a brief snippet containing a read from each lane, one of us could probably whip up a quick script or at least help you get started.

(comment by Alex Reynolds, 6.4 years ago)
Answer by Rm (Danville, PA), 6.4 years ago:

A quick awk solution to separate a merged FASTQ file by lane:

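# paste joins each 4-line record into one tab-separated line; the 4th ":"-separated header field is the lane
paste - - - - < my.R1.fastq | awk -F"\t" '{ split($1, arr, ":"); print $1 "\n" $2 "\n+\n" $4 > ("lane." arr[4] ".R1.fastq") }'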

I have a pure awk solution that is much faster. Like the solution above, it assumes that each record is a block of exactly 4 lines:

awk 'BEGIN {FS = ":"} {lane = $4; out = "lane." lane ".fastq"; print > out; for (i = 1; i <= 3; i++) {getline; print > out}}' < my.R1.fastq

Each header line sets the lane. Calling getline three times then reads the rest of the 4-line block, so all four lines go to that header's lane file. The input is read from standard input, hence the <.

(comment by Frédéric Mahé, 6.4 years ago)
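The same "read 4 lines at a time" idea can be written in Python. This hypothetical `fastq_records` generator takes the next four lines per record, much like the header line plus three getline calls. It also adds sanity checks that the awk one-liners skip: the header must start with `@`, the third line with `+`, and sequence and quality must have equal length. Like the awk versions, it assumes unwrapped 4-line records.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str, str]


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, plus, quality) tuples from 4-line FASTQ records."""
    it = iter(lines)
    while True:
        # Take the next 4 lines: the header plus the three lines getline would read.
        block = [line.rstrip("\n") for line in islice(it, 4)]
        if not block:
            return  # clean end of input
        if len(block) < 4:
            raise ValueError(f"truncated record at end of input: {block!r}")
        header, seq, plus, qual = block
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header, seq, plus, qual
```

An open file handle can be passed directly, e.g. `for header, seq, plus, qual in fastq_records(open("my.R1.fastq")): ...`.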

Thanks for this solution. I tried it and it works quickly and nicely. I'm not familiar with awk, so could you explain why your solution is faster?

(comment by DVA, 3.5 years ago)

Totally rad. I love one-liners.

(comment by Dan D, 6.4 years ago)

+1 for the paste

(comment by Pierre Lindenbaum, 6.4 years ago)

I'm wondering whether this is a correct way to handle paired-end reads (not just a single FASTQ file). Will the read order be the same in the resulting forward and reverse files? Or is there a safer solution for paired-end reads?

(comment by Denis, 8 months ago)

Are you referring to reads from multiple lanes in one file or just interleaved R1/R2 reads from a single lane?

It should be fine to use this solution as long as nothing else has been done to the original files. After splitting, you can run a quick check with repair.sh from the BBMap suite to make sure the read order was retained.

(comment by genomax, 8 months ago)
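If BBMap is not at hand, a rough Python check is to compare the read IDs of a split R1 file and its R2 file, pair by pair. This hypothetical `check_pair_order` is a sketch, not a replacement for repair.sh. It assumes unwrapped 4-line records and treats the text before the first space as the shared ID, as in CASAVA 1.8+ headers. It also strips the older `/1` and `/2` mate suffixes.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Optional


def read_id(header: str) -> str:
    """ID shared by both mates: text before the first space, minus any /1 or /2 suffix."""
    rid = header.rstrip("\n").split()[0]
    if rid.endswith(("/1", "/2")):
        rid = rid[:-2]
    return rid


def headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield every 4th line (the header) of a 4-line-per-record FASTQ stream."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield line


def check_pair_order(r1_lines: Iterable[str], r2_lines: Iterable[str]) -> Optional[int]:
    """Return the 0-based index of the first mismatched pair, or None if every pair matches."""
    for n, (h1, h2) in enumerate(zip_longest(headers(r1_lines), headers(r2_lines))):
        # A missing mate (one file is shorter) also counts as a mismatch.
        if h1 is None or h2 is None or read_id(h1) != read_id(h2):
            return n
    return None
```

Usage, with hypothetical file names: `check_pair_order(open("lane.1.R1.fastq"), open("lane.1.R2.fastq"))` returns `None` when the order is intact.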

Yes, I have two FASTQ files, one with forward reads and one with reverse reads. Each file contains reads from all eight Illumina lanes, and I need to split them by lane so that the read order in each resulting R1 file matches its corresponding R2 file.

(comment by Denis, 8 months ago)
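One way to guarantee matching order is to split R1 and R2 together rather than in two separate runs. The approach reads one record from each file at a time and writes both mates to the same lane's pair of output files. This is a minimal Python sketch under several assumptions:

* records are unwrapped and 4 lines long;
* headers follow CASAVA 1.8+, so mates share the ID before the first space;
* both inputs are already in matching order.

The function names and the `lane.<N>.R1.fastq` / `lane.<N>.R2.fastq` output names are hypothetical.

```python
from contextlib import ExitStack
from itertools import islice, zip_longest
from pathlib import Path
from typing import Dict, Iterator, List, TextIO, Tuple


def records(handle: TextIO) -> Iterator[List[str]]:
    """Yield 4-line FASTQ records (newlines kept) from an open file."""
    while True:
        block = list(islice(handle, 4))
        if not block:
            return
        if len(block) < 4:
            raise ValueError("truncated FASTQ record")
        yield block


def lane_of(header: str) -> str:
    """Lane is the 4th ':'-separated field of a CASAVA 1.8+ read ID."""
    return header[1:].split()[0].split(":")[3]


def split_pairs_by_lane(r1_path: str, r2_path: str, out_dir: str = ".") -> Dict[str, int]:
    """Write each mate pair to lane.<N>.R1.fastq / lane.<N>.R2.fastq; return pairs per lane."""
    counts: Dict[str, int] = {}
    outputs: Dict[str, Tuple[TextIO, TextIO]] = {}
    # ExitStack closes every per-lane output file when the block ends.
    with ExitStack() as stack, open(r1_path) as f1, open(r2_path) as f2:
        for rec1, rec2 in zip_longest(records(f1), records(f2)):
            if rec1 is None or rec2 is None:
                raise ValueError("R1 and R2 contain different numbers of reads")
            id1, id2 = rec1[0].split()[0], rec2[0].split()[0]
            if id1 != id2:
                raise ValueError(f"mates out of order: {id1} vs {id2}")
            lane = lane_of(rec1[0])
            if lane not in outputs:
                outputs[lane] = (
                    stack.enter_context(open(Path(out_dir) / f"lane.{lane}.R1.fastq", "w")),
                    stack.enter_context(open(Path(out_dir) / f"lane.{lane}.R2.fastq", "w")),
                )
            out1, out2 = outputs[lane]
            out1.writelines(rec1)
            out2.writelines(rec2)
            counts[lane] = counts.get(lane, 0) + 1
    return counts
```

Because both mates of every pair are written in the same loop iteration, each lane's R1 and R2 outputs preserve the input order by construction. The ID check stops early if the inputs were not in matching order to begin with.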
Answer by Dan D (Tennessee), 6.4 years ago:

[Image: an example FASTQ record with the lane number, 3, highlighted in the header line]

See the highlighted "3" in the first line? That's the lane number in the Illumina read header. If you read your FASTQ file and direct each read to an output file based on that value, you'll get a separate FASTQ file for each lane.

Do you need help writing the script to do that?
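A minimal Python sketch of that approach follows. It assumes unwrapped 4-line records and CASAVA 1.8+ headers, where the lane is the 4th colon-separated field of the read ID, as in the awk answers above. The output names `lane.<N>.fastq` mirror those awk one-liners. The function name `split_by_lane` is hypothetical.

```python
import sys
from itertools import islice
from pathlib import Path
from typing import Dict, TextIO


def split_by_lane(in_path: str, out_dir: str = ".") -> Dict[str, int]:
    """Stream a FASTQ file, writing each 4-line record to lane.<N>.fastq; return reads per lane."""
    counts: Dict[str, int] = {}
    outputs: Dict[str, TextIO] = {}
    try:
        with open(in_path) as fq:
            while True:
                record = list(islice(fq, 4))
                if not record:
                    break  # end of input
                if len(record) < 4 or not record[0].startswith("@"):
                    raise ValueError(f"malformed FASTQ record: {record!r}")
                # CASAVA 1.8+ read ID: @instrument:run:flowcell:lane:tile:x:y
                lane = record[0][1:].split()[0].split(":")[3]
                if lane not in outputs:
                    # One output file per lane, opened on first use (like awk's '>').
                    outputs[lane] = open(Path(out_dir) / f"lane.{lane}.fastq", "w")
                outputs[lane].writelines(record)
                counts[lane] = counts.get(lane, 0) + 1
    finally:
        for handle in outputs.values():
            handle.close()
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python split_by_lane.py my.R1.fastq
    for lane, n in sorted(split_by_lane(sys.argv[1]).items()):
        print(f"lane {lane}: {n} reads")
```

Records are streamed, so memory use stays flat regardless of file size. Only one file handle per lane is kept open, which is a handful for a standard 8-lane flow cell.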
