Question: bwa mem -R (RGline) format for DNBseq fastq
I have a WGS dataset I'd like to align with bwa, but the fastq files are in DNBseq format (shown below) and I'm stuck on the syntax. Does anyone know how to change the read group line "@RG\tID:**\tLB:**\tPL:ILLUMINA\tSM:**\tPU:BARCODE" in bwa mem -aMp -t #ofCPUs ref.fa -R "@RG\tID:**\tLB:**\tPL:ILLUMINA\tSM:**\tPU:BARCODE" > output.sam to make this work? Thanks.

Here's the top line of a fastq file in DNBseq format


*Edited for clarity.

You are referring to the -R option, which sets the read group.

-R STR Complete read group header line. '\t' can be used in STR and will be converted to a TAB in the output SAM. The read group ID will be attached to every read in the output. An example is '@RG\tID:foo\tSM:bar'.

If you don't have multiple samples (or don't plan to use a program that requires read groups), you should be able to omit that option. Note: if your downstream analysis requires read groups, you will need to supply them at this step.
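
For what it's worth, the -R string is something you write yourself; bwa does not read it from the fastq headers, so the DNBseq header format should not affect the syntax. Here is a minimal Python sketch of assembling such a line. The function name and the sample, library, and unit values are hypothetical placeholders. PL:DNBSEQ is one of the platform values listed in the current SAM specification, but it is worth checking that your downstream tools accept it before dropping PL:ILLUMINA.

```python
def build_rg(rg_id, sample, library, platform="DNBSEQ", unit=None):
    """Return a read group line suitable for bwa mem -R.

    The separators are the two characters backslash + t, not real tabs;
    bwa converts them to TABs when it writes the SAM header.
    """
    fields = [f"ID:{rg_id}", f"SM:{sample}", f"LB:{library}", f"PL:{platform}"]
    if unit:
        fields.append(f"PU:{unit}")
    return "@RG" + "".join("\\t" + field for field in fields)


# Hypothetical values; replace them with your own sample and library names.
rg = build_rg("sample1.L1", "sample1", "lib1", unit="FLOWCELL.L1")
print(rg)
```

The printed string can be pasted after -R inside single quotes so the shell leaves the backslashes alone. Use one distinct ID per sample and lane combination.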

Can you show us original fastq headers in your data?

I have multiple paired-end samples, which is why I was hoping to use -R. This is the header I was provided from the NGS company (BGI), which used DNBseq machines. The only documentation I could find from them shows the below.

FASTQ file sequenced by DNBseq.



I'll try to parse in some colon delimiters and see if it helps (like this).



Take a look at this page to see if you can use some of the examples to construct an appropriate -R line for your samples.

Something like


This bioRxiv paper says that an accompanying repo has code for converting BGI formatted fastq headers to Illumina format. But I am not able to find a link for the GitHub repo. Take a look around to see if you can find it.
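
If the repo cannot be found, the pieces usually wanted for ID and PU (flowcell and lane) can be pulled out of a read name directly. The sketch below assumes a layout that is commonly seen in DNBseq read names: flowcell ID, then L plus a lane digit, C plus a three-digit column, R plus a three-digit row, a read number, and /1 or /2. That layout is an assumption, and the example header is made up, so check it against your own files first. The names DNB_HEADER, parse_dnbseq_header, and platform_unit are hypothetical.

```python
import re

# Assumed layout: @<flowcell>L<lane>C<col>R<row><read number>/<mate>
DNB_HEADER = re.compile(
    r"^@(?P<flowcell>[A-Za-z0-9]+?)"
    r"L(?P<lane>\d)C(?P<col>\d{3})R(?P<row>\d{3})"
    r"(?P<read>\d+)/(?P<mate>[12])$"
)


def parse_dnbseq_header(header):
    """Split a DNBseq-style read name into named fields (all strings)."""
    match = DNB_HEADER.match(header.strip())
    if match is None:
        raise ValueError(f"header does not match the assumed layout: {header!r}")
    return match.groupdict()


def platform_unit(header):
    """Build a PU value of the form <flowcell>.L<lane> from a read name."""
    fields = parse_dnbseq_header(header)
    return f"{fields['flowcell']}.L{fields['lane']}"


# Made-up header for illustration only.
example = "@V300012345L1C001R0010000001/1"
print(parse_dnbseq_header(example))
print(platform_unit(example))
```

Only the first header of each fastq file needs to be parsed, provided each file holds reads from a single flowcell lane.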

