Question

Using flowcell name as file name for FASTQ file

5.4 years ago

Ric ▴ 430

I have two folders, and each of them contains files with the same names.

ls -1
10_S0_L001_R1_001.fastq.gz
10_S0_L001_R2_001.fastq.gz
11_S0_L001_R1_001.fastq.gz
11_S0_L001_R2_001.fastq.gz

Is there a way to extract the flowcell name from each dataset and use it as a unique file name?

Thank you in advance.

score 0 · Answer 1 · 2018-12-02

5.4 years ago

GenoMax 141k

If you look inside the files, you should see headers that look something like this:

@HWI-EAS209_0006_FC706VJ:5:58:5894:21141

The flowcell serial is embedded in the header (e.g. FC706VJ here). The problem with using it as a file name is that all samples sequenced on the same flowcell would get the same name, unless you concatenate it with the sample ID, e.g. Sample_01_FC706VJ.
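
A minimal Bash sketch of pulling the flowcell ID out of the first read header. It assumes that all reads in a file come from one flowcell and that headers follow either the older layout shown above or the Casava 1.8+ layout (`@instrument:run:flowcell:lane:tile:x:y read:filter:control:index`), where the flowcell ID is the third colon-separated field. Treating the last `_`-separated part of the first field as the flowcell in the older layout is a heuristic based on the example above. The function name `flowcell_id` is hypothetical.

```shell
#!/usr/bin/env bash

# Print the flowcell ID taken from the first read header of a FASTQ file.
# Works on gzipped or plain files: gzip -cdf decompresses .gz input and
# passes uncompressed input through unchanged.
# Assumption: all reads in the file come from a single flowcell.
flowcell_id() {
    local header
    header=$(gzip -cdf -- "$1" | head -n 1)
    [[ -n $header ]] || return 1      # empty or unreadable file

    header=${header#@}                # drop the leading '@'
    header=${header%% *}              # keep the read name, drop the comment after the space

    local IFS=:
    local -a f
    read -r -a f <<< "$header"        # split the read name on ':'

    if (( ${#f[@]} >= 7 )); then
        # Casava 1.8+: instrument:run:flowcell:lane:tile:x:y
        printf '%s\n' "${f[2]}"
    else
        # Older layout, e.g. HWI-EAS209_0006_FC706VJ:5:58:5894:21141
        # (heuristic) flowcell = last '_'-separated part of the first field
        printf '%s\n' "${f[0]##*_}"
    fi
}
```

For example, `flowcell_id 10_S0_L001_R1_001.fastq.gz` prints that file's flowcell ID, which can then be joined with a sample identifier, e.g. `"Sample_01_$(flowcell_id 10_S0_L001_R1_001.fastq.gz)"`. File names like `10_S0_L001_R1_001.fastq.gz` are typical of bcl2fastq output, so the Casava 1.8+ branch is the one most likely to apply here.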

Thank you. Is there a script for it?

Reply by Ric ▴ 430, 5.4 years ago

You can use standard Unix tools (such as cut, awk, tr or grep) or non-standard ones (e.g. bioawk) to extract metadata from your FASTQ files. As GenoMax said, you'd need to combine the flowcell ID with some other identifier(s) to make your file names unique.

Reply by hylicase ▴ 20, 5.3 years ago
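
Following the awk suggestion above, here is a sketch of a loop that prefixes each file name with its flowcell ID. The original name still supplies the sample number, lane and read (R1/R2), so the flowcell is the only extra identifier added, e.g. `FC706VJ_10_S0_L001_R1_001.fastq.gz`. Files are renamed in place, and existing files are never overwritten. The function name `rename_by_flowcell` and the folder names `run1`/`run2` are hypothetical, and the header layouts are the same assumptions as before (Casava 1.8+, or the older layout with the flowcell at the end of the first field). The new names are only unique across the two folders if they were sequenced on different flowcells.

```shell
#!/usr/bin/env bash

# Rename every *.fastq.gz in the given folders to <flowcell>_<original name>.
# Hypothetical usage: rename_by_flowcell run1 run2
rename_by_flowcell() {
    local dir f base fc target
    for dir in "$@"; do
        for f in "$dir"/*.fastq.gz; do
            [[ -e $f ]] || continue                    # glob matched nothing
            base=${f##*/}

            # First header line -> flowcell ID (Casava 1.8+: 3rd ':' field;
            # otherwise the last '_' part of the first field, as in older headers).
            fc=$(gzip -cd -- "$f" | head -n 1 | awk '{
                sub(/^@/, "", $1)
                n = split($1, a, ":")
                if (n >= 7) print a[3]
                else { m = split(a[1], b, "_"); print b[m] }
            }')

            if [[ -z $fc ]]; then
                echo "skipping (no header found): $f" >&2
                continue
            fi
            case $base in "${fc}_"*) continue ;; esac  # already renamed

            target="$dir/${fc}_$base"
            if [[ -e $target ]]; then
                echo "not overwriting existing file: $target" >&2
                continue
            fi
            mv -- "$f" "$target"
            echo "$f -> $target"
        done
    done
}
```

Renaming in place keeps each folder intact. Once the names carry the flowcell prefix, the files can be moved into one directory without clashing, provided the flowcells differ.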