Question: Using flowcell name as file name for FASTQ file
Asked 9 months ago by Ric (Australia):

I have two folders, and each of them contains the same file names.

ls -1
10_S0_L001_R1_001.fastq.gz
10_S0_L001_R2_001.fastq.gz
11_S0_L001_R1_001.fastq.gz
11_S0_L001_R2_001.fastq.gz

Is there a way to extract the flowcell name from each dataset and use it as a unique filename?

Thank you in advance.

Answer written 9 months ago by genomax (United States):

If you look inside the files, you should see headers that look something like this:

@HWI-EAS209_0006_FC706VJ:5:58:5894:21141

The flowcell serial is embedded in the header (e.g. FC706VJ here). The problem with using it for file names is that all samples from the same flowcell would get the same name unless you concatenate it with the sample ID, giving something like Sample_01_FC706VJ.
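
As a rough sketch of pulling the flowcell out of a header line, something like the following could work. The function name flowcell_from_header is made up for illustration. It assumes one of two header layouts: the older one shown above (instrument_run_flowcell:lane:tile:x:y) or the newer Casava 1.8+ one (instrument:run:flowcell:lane:tile:x:y followed by a space and read information). Headers from other sources, such as SRA downloads, would need different handling.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): print the flowcell ID
# found in a single FASTQ header line passed as the first argument.
flowcell_from_header() {
    printf '%s\n' "$1" | awk '{
        sub(/^@/, "", $1)          # drop the leading "@" from the read ID
        n = split($1, f, ":")      # colon-separated fields of the read ID
        if (n >= 7) {
            print f[3]             # newer layout: instrument:run:flowcell:lane:...
        } else {
            m = split(f[1], g, "_")
            print g[m]             # older layout: instrument_run_flowcell:lane:...
        }
    }'
}

flowcell_from_header '@HWI-EAS209_0006_FC706VJ:5:58:5894:21141'   # prints FC706VJ
```

To get the header out of a compressed file, the first line can be read with `gzip -cd file.fastq.gz | head -n 1` and passed to the function.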


Thank you, is there a script for it?

Reply written 9 months ago by Ric

You can use standard Unix tools (such as cut, awk, tr, grep, etc.) or non-standard ones (e.g., bioawk) to extract metadata from your FASTQ files. As genomax said, you would need to combine the flowcell with some other identifier(s) to make your filenames unique.
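
A minimal sketch of such a script, using only gzip, head, awk and mv, might look like the one below. It keeps the existing file name (which already carries the sample ID and read number) and appends the flowcell ID taken from the first header. The function name add_flowcell_to_names and the DRY_RUN variable are hypothetical. It assumes the same two header layouts discussed above, and that every read in a file comes from one flowcell, so only the first header is inspected. Whether this makes names unique across the two folders depends on each folder coming from a different flowcell.

```shell
#!/usr/bin/env bash
# Sketch only: append the flowcell ID to each *.fastq.gz name in a folder.
# add_flowcell_to_names and DRY_RUN are illustrative names, not an existing tool.
# Usage: add_flowcell_to_names /path/to/folder   (set DRY_RUN=0 to really rename)
add_flowcell_to_names() {
    local dir=$1 f header flowcell base
    for f in "$dir"/*.fastq.gz; do
        [ -e "$f" ] || continue                     # folder has no matching files
        header=$(gzip -cd "$f" | head -n 1)         # first read header only
        flowcell=$(printf '%s\n' "$header" | awk '{
            sub(/^@/, "", $1)
            n = split($1, a, ":")
            if (n >= 7) {
                print a[3]                          # instrument:run:flowcell:...
            } else {
                m = split(a[1], b, "_")
                print b[m]                          # instrument_run_flowcell:...
            }
        }')
        if [ -z "$flowcell" ]; then
            echo "no flowcell found in $f" >&2
            continue
        fi
        base=$(basename "$f" .fastq.gz)
        case $base in *"_$flowcell") continue ;; esac   # already renamed
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "would rename $f -> $dir/${base}_${flowcell}.fastq.gz"
        else
            mv -n -- "$f" "$dir/${base}_${flowcell}.fastq.gz"   # -n: never overwrite
        fi
    done
}
```

By default the function only prints what it would do, so the proposed names can be checked before anything is moved. With the header shown earlier, 10_S0_L001_R1_001.fastq.gz would become 10_S0_L001_R1_001_FC706VJ.fastq.gz.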

Reply written 8 months ago by brendes20