Question: Parsing Fastq Files
Asked by deepthitheresa, 7.9 years ago:

Hi all,

I have FASTQ reads with headers like this:

@HWI-ST1162:73:C0KEFACXX:6:1101:1816:1918 1:N:0:CGATGT

I aligned this FASTQ file to a reference genome using Bowtie. How can I identify the sample name from this record?

I have demultiplexed FASTQ files for each sample, and I also have a barcode information file in this format:

Sample name    Index sequence
BC1            CGATGT
BC2            CGATGA
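A minimal Python sketch of loading a barcode file in this layout into a dictionary keyed by index sequence, so that an index read from a header can be turned into a sample name. It assumes whitespace-separated columns, a one-line header, and sample names without spaces; the function name `load_barcodes` is hypothetical.

```python
import io


def load_barcodes(handle):
    """Map index sequence -> sample name from a whitespace-separated table.

    Assumes the first line is a header and every following non-blank line
    holds a sample name (no spaces) followed by its index sequence.
    """
    next(handle, None)  # skip the header line (tolerates an empty file)
    barcodes = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        sample, index = fields[0], fields[-1]
        barcodes[index] = sample
    return barcodes


example = io.StringIO(
    "Sample name    Index sequence\n"
    "BC1            CGATGT\n"
    "BC2            CGATGA\n"
)
print(load_barcodes(example))  # {'CGATGT': 'BC1', 'CGATGA': 'BC2'}
```

With a real file you would pass an open file handle instead of the `io.StringIO` stand-in.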

When I try to retrieve the alignment information using $sam->features(), the seqID is returned as:


How can I get the 1:N:0:CGATGT part from the alignment information?

Thanks, Deeps

Answer by Sean Davis (National Institutes of Health, Bethesda, MD), 7.9 years ago:

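I'd suggest using SAM read groups to track samples. This is done at the alignment stage: you assign each sample's reads to a read group (for example, with Bowtie 2's --rg-id and --rg options), and every alignment then carries an RG tag identifying its sample.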
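One way to do this, sketched in Python: build a Bowtie 2 command for each demultiplexed FASTQ file, using `--rg-id` (sets the @RG ID and attaches an RG:Z: tag to each SAM record) and `--rg` (adds extra fields such as SM: to the @RG header line). The index prefix `genome_index` and the per-sample file names are hypothetical placeholders, and Bowtie 1 exposes read groups through different options, so check the manual for the aligner you actually run.

```python
import shlex


def bowtie2_command(index_prefix, fastq_path, sam_path, sample):
    """Return a bowtie2 argument list that tags every alignment with a read group."""
    return [
        "bowtie2",
        "-x", index_prefix,        # Bowtie 2 index prefix
        "-U", fastq_path,          # unpaired reads for one sample
        "-S", sam_path,            # SAM output
        "--rg-id", sample,         # @RG ID and RG:Z: tag on each record
        "--rg", f"SM:{sample}",    # sample name on the @RG header line
    ]


# Hypothetical layout: one demultiplexed FASTQ file per sample.
samples = {"BC1": "BC1.fastq", "BC2": "BC2.fastq"}
for sample, fastq in samples.items():
    cmd = bowtie2_command("genome_index", fastq, f"{sample}.sam", sample)
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Building an argument list rather than a single shell string avoids quoting problems if file names contain spaces; `shlex.join` (Python 3.8+) is only used here to print a readable command.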


Reply by deepthitheresa, 7.6 years ago:

Good suggestion. It helped me a lot.
Answer by jingtao09, 7.9 years ago:

If you want to keep the barcode in the SAM file, replace the space between the main read name and the barcode section with a non-space character. For example, change

@HWI-ST1162:73:C0KEFACXX:6:1101:1816:1918 1:N:0:CGATGT

into

@HWI-ST1162:73:C0KEFACXX:6:1101:1816:1918:1:N:0:CGATGT

Here I used a colon (":"), so when you parse this header you can get the barcode with the split function in Python.
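A minimal Python sketch of both steps: rewriting each FASTQ header so its first space becomes a colon, and later splitting a read name on ":" to take the last field as the barcode. It assumes plain 4-line FASTQ records (no wrapped sequence lines); the function names are hypothetical, and the small barcode dictionary stands in for the barcode file from the question.

```python
def join_header(header_line):
    """Replace the first space in a FASTQ header with ':' so the aligner keeps the barcode."""
    # Strip the trailing newline, then swap only the first space for a colon.
    return header_line.rstrip("\n").replace(" ", ":", 1)


def rewrite_fastq(lines):
    """Yield FASTQ lines with the header (line 1 of each 4-line record) rewritten."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield join_header(line) + "\n"
        else:
            yield line


def barcode_from_read_name(read_name):
    """Return the index sequence, i.e. the last colon-separated field."""
    return read_name.split(":")[-1]


header = "@HWI-ST1162:73:C0KEFACXX:6:1101:1816:1918 1:N:0:CGATGT"
new_header = join_header(header)
print(new_header)  # @HWI-ST1162:73:C0KEFACXX:6:1101:1816:1918:1:N:0:CGATGT

# In the SAM file the read name (QNAME) has no leading '@'.
barcode = barcode_from_read_name(new_header.lstrip("@"))
print(barcode)  # CGATGT

barcodes = {"CGATGT": "BC1", "CGATGA": "BC2"}
print(barcodes.get(barcode, "unknown"))  # BC1
```

To preprocess a whole file, something like `sys.stdout.writelines(rewrite_fastq(open("reads.fastq")))` would work, where `reads.fastq` is a placeholder name.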


Most mappers, e.g. BWA or Bowtie, truncate the read name at the first space, so preprocessing your FASTQ file into this new format will save you a lot of time. If you cannot modify the FASTQ reads, you can instead open the original FASTQ file and the SAM file at the same time, match up the records by their position in each file, and parse out the barcode.
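The second approach can be sketched in Python as follows. As a swapped-in variation, this version matches records by read name instead of by line number: it reads the barcodes from the original FASTQ into a dictionary keyed by the truncated read name (everything before the first space) and then looks up each SAM record's QNAME (column 1). It assumes 4-line FASTQ records, enough memory for one dictionary entry per read, and an aligner that truncated names at the first space; the function names are hypothetical.

```python
import io


def barcodes_by_read_name(fastq_lines):
    """Map truncated read name -> barcode from 4-line FASTQ records."""
    mapping = {}
    for i, line in enumerate(fastq_lines):
        if i % 4 != 0:
            continue  # only header lines carry the name and barcode
        # Drop '@', then split the name from the comment at the first space.
        name, _, comment = line.rstrip("\n")[1:].partition(" ")
        mapping[name] = comment.split(":")[-1] if comment else None
    return mapping


def annotate_sam(sam_lines, mapping):
    """Yield (read name, barcode) for each alignment line, skipping '@' header lines."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header lines; QNAME itself can never start with '@'
        qname = line.split("\t", 1)[0]
        yield qname, mapping.get(qname)


fastq = io.StringIO(
    "@HWI-ST1162:73:C0KEFACXX:6:1101:1816:1918 1:N:0:CGATGT\n"
    "ACGT\n+\nIIII\n"
)
sam = io.StringIO(
    "@HD\tVN:1.0\n"
    "HWI-ST1162:73:C0KEFACXX:6:1101:1816:1918\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\n"
)
mapping = barcodes_by_read_name(fastq)
for qname, barcode in annotate_sam(sam, mapping):
    print(qname, barcode)
```

Matching by name keeps working even if the aligner drops or reorders reads, which position-based matching would not survive; the trade-off is holding all read names in memory.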


