Question: Using flowcell name as file name for FASTQ file
3 months ago by
Ric190
Australia
Ric190 wrote:

I have two folders, and each of them contains the same file names.

ls -1
10_S0_L001_R1_001.fastq.gz
10_S0_L001_R2_001.fastq.gz
11_S0_L001_R1_001.fastq.gz
11_S0_L001_R2_001.fastq.gz

Is there a way to extract the flowcell name from each dataset and use it to make the file names unique?

Thank you in advance.

3 months ago by
genomax63k
United States
genomax63k wrote:

If you look inside the files, you should see headers that look something like this:

@HWI-EAS209_0006_FC706VJ:5:58:5894:21141

The flowcell serial is embedded in the header (e.g. FC706VJ here). The problem with using it for file names is that all samples from the same flowcell would get the same name unless you concatenate it with the sample ID, giving something like Sample_01_FC706VJ.
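
As an illustration, the flowcell ID can be pulled out of a header like the one above with awk. This is only a sketch: it assumes the header layout shown here (@instrument_run_flowcell:lane:...), and the function name flowcell_from_header is made up for the example. Newer Illumina headers usually look like @instrument:run:flowcell:lane:..., in which case the flowcell is the third colon-separated field instead, so check your own headers first.

```shell
# Print the flowcell ID found in a single FASTQ header line.
# Assumption: the header looks like @<instrument>_<run>_<flowcell>:<lane>:...
flowcell_from_header() {
    # Keep the text before the first colon, then take its last
    # underscore-separated token.
    printf '%s\n' "$1" | awk -F: '{ n = split($1, parts, "_"); print parts[n] }'
}

flowcell_from_header '@HWI-EAS209_0006_FC706VJ:5:58:5894:21141'

# To use the first header of a real file instead:
# flowcell_from_header "$(gzip -dc 10_S0_L001_R1_001.fastq.gz | head -n 1)"
```

With the example header this prints FC706VJ.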


Thank you, is there a script for it?

written 3 months ago by Ric190

You can use standard Unix tools (such as cut, awk, tr and grep) or non-standard ones (e.g., bioawk) to extract metadata from your FASTQ files. As genomax said, the flowcell alone will not make the file names unique; you need to combine it with some other identifier, such as the sample ID.

written 10 weeks ago by brendes20
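
For the script that was asked about, something along these lines might work. It is a rough sketch under several assumptions: every file ends in .fastq.gz, the first header has the layout shown in the answer above (@instrument_run_flowcell:lane:...), and each folder comes from a different flowcell. The function name rename_with_flowcell is hypothetical. The existing file name (which already starts with the sample ID) is kept and the flowcell is appended to it, so files from the same flowcell stay distinct. Try it on a copy of the data first.

```shell
#!/bin/sh
# Rename every *.fastq.gz in a folder to <original name>_<flowcell>.fastq.gz.
# Assumption: headers look like @<instrument>_<run>_<flowcell>:<lane>:...
# The body runs in a subshell so its variables do not leak.
rename_with_flowcell() (
    dir=$1
    for f in "$dir"/*.fastq.gz; do
        [ -e "$f" ] || continue    # the folder has no matching files
        # Read only the first header line of the compressed file.
        header=$(gzip -dc "$f" | head -n 1)
        flowcell=$(printf '%s\n' "$header" |
            awk -F: '{ n = split($1, parts, "_"); print parts[n] }')
        if [ -z "$flowcell" ]; then
            echo "no flowcell found in $f, skipping" >&2
            continue
        fi
        base=$(basename "$f" .fastq.gz)
        target="$dir/${base}_${flowcell}.fastq.gz"
        # Never overwrite an existing file.
        if [ -e "$target" ]; then
            echo "$target already exists, skipping" >&2
            continue
        fi
        mv -- "$f" "$target"
    done
)

# Process every folder given on the command line.
for d in "$@"; do
    rename_with_flowcell "$d"
done
```

Saved as, say, rename_fastq.sh, it would be run as: sh rename_fastq.sh folder1 folder2. A file such as 10_S0_L001_R1_001.fastq.gz whose first header contains FC706VJ would become 10_S0_L001_R1_001_FC706VJ.fastq.gz.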