Question: bwa mem -R (RGline) format for DNBseq fastq
joe wrote (11 months ago):

I have a WGS dataset I'd like to align with bwa, but the fastq files are in DNBseq format (shown below) and I'm stuck on the syntax. Does anyone know how to change the RG line "@RG\tID:**\tLB:**\tPL:ILLUMINA\tSM:**\tPU:BARCODE" in the command below to make this work? Thanks.

bwa mem -aMp -t #ofCPUs ref.fa -R "@RG\tID:**\tLB:**\tPL:ILLUMINA\tSM:**\tPU:BARCODE" > output.sam

Here's the first header line of one of the fastq files in DNBseq format:

@V300020594L1C001R0010000024/1

*Edited for clarity.

Tags: bwa, fastq, bwa mem, dnbseq, wgs

You are referring to the -R option, which sets the read group.

-R STR Complete read group header line. '\t' can be used in STR and will be converted to a TAB in the output SAM. The read group ID will be attached to every read in the output. An example is '@RG\tID:foo\tSM:bar'.
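In other words, the string you pass to -R should contain the two characters backslash and t (not real TAB characters), and bwa expands them itself. The short Python sketch below builds such a string, mimics that expansion for the \t case, and shows the RG:Z:<ID> tag bwa attaches to each read. The helper names (`rg_arg`, `header_line`, `rg_id`) are hypothetical and only for illustration.

```python
def rg_arg(fields):
    """Build the value passed to bwa mem -R, using literal backslash-t separators."""
    # bwa itself turns each two-character sequence '\t' into a TAB,
    # so the argument must contain a backslash followed by 't'.
    parts = ["@RG"] + [f"{tag}:{value}" for tag, value in fields.items()]
    return "\\t".join(parts)


def header_line(rg):
    """Mimic (for the \\t case only) the @RG line bwa writes to the SAM header."""
    return rg.replace("\\t", "\t")


def rg_id(rg):
    """Return the read group ID, which bwa attaches to every read as RG:Z:<ID>."""
    for field in header_line(rg).split("\t")[1:]:
        if field.startswith("ID:"):
            return field[3:]
    raise ValueError("read group line has no ID field")


rg = rg_arg({"ID": "foo", "SM": "bar"})
print(rg)                            # @RG\tID:foo\tSM:bar  (backslashes are literal)
print(header_line(rg).split("\t"))   # ['@RG', 'ID:foo', 'SM:bar']
print("RG:Z:" + rg_id(rg))           # RG:Z:foo
```

From a shell, writing -R '@RG\tID:foo\tSM:bar' with plain ASCII quotes passes the backslash sequences through unchanged; typographic quotes such as “ ” are not treated as quotes by the shell and will end up inside the argument.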

If you don't have multiple samples (and don't plan to use a program that requires read groups), you should be able to omit that option. Note that if your downstream analysis requires read groups (GATK, for example), you will need to add them.

Can you show us the original fastq headers from your data?

Reply by genomax (11 months ago)

I have multiple paired-end samples, which is why I was hoping to use -R. This is the header format provided by the sequencing company (BGI), which used DNBseq machines. The only documentation I could find from them shows the following:

FASTQ file sequenced by DNBseq.

@CL100072652L2C001R001_12/1
GCGACCCCAGGTCAGTCGGGACTACCCGCTGAAGTCGGAGGCCAAGCGGT
+
FFFCFFFFFFFFFDFEFFFFEFEF0FFFFEFFFFFFFEFFFFFECGFFFF

I'll try inserting some colon delimiters, like this, and see if it helps:

@V300020594:L1:C001:R001:0000024:1
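As a sketch of how the two example headers in this thread could be split into fields, the Python below assumes the layout <flowcell>L<lane>C<column>R<row>[_]<read number>[/<mate>], with three-digit column and row fields. That layout is inferred only from the two examples shown here, not from BGI documentation, and the function names are hypothetical. It also reproduces the colon-delimited form proposed just above.

```python
import re

# Assumed DNBseq read-name layout, inferred from the two examples in this thread:
#   <flowcell>L<lane>C<column>R<row>[_]<read number>[/<mate>]
DNB_NAME = re.compile(
    r"^@?(?P<flowcell>[A-Z0-9]+?)L(?P<lane>\d+)"
    r"C(?P<column>\d{3})R(?P<row>\d{3})_?(?P<read>\d+)"
    r"(?:/(?P<mate>[12]))?$"
)


def parse_dnb_name(name):
    """Split a DNBseq read name into its fields; raise if it doesn't match."""
    m = DNB_NAME.match(name.strip())
    if m is None:
        raise ValueError(f"unrecognised DNBseq read name: {name!r}")
    return m.groupdict()  # 'mate' is None when there is no /1 or /2 suffix


def to_colon_name(name):
    """Rewrite a read name with colon delimiters, as in the proposal above."""
    f = parse_dnb_name(name)
    out = f"@{f['flowcell']}:L{f['lane']}:C{f['column']}:R{f['row']}:{f['read']}"
    return out + (f":{f['mate']}" if f["mate"] else "")


print(parse_dnb_name("@CL100072652L2C001R001_12/1"))
print(to_colon_name("@V300020594L1C001R0010000024/1"))
```

Renaming the reads is not something -R depends on; the parsed flowcell and lane are mainly useful as values for the RG fields.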

Thanks.

Reply by joe (11 months ago)

Take a look at this page to see whether you can adapt some of its examples into an appropriate -R line for your samples.

Something like:

@RG\tID:CL1000\tLB:Library_ID\tPL:DNBSEQ\tSM:SAMPLE_ID\tPU:Index_seq
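One way to fill in such a line per sample is sketched below in Python, under several assumptions: flowcell and lane are taken from the first read name (layout inferred from the examples in this thread), ID follows the common flowcell.lane convention, the sample name stands in for the barcode in PU, and PL defaults to DNBSEQ (the value recent SAM specifications list for MGI/BGI instruments; older tools may only accept a fixed list, so check what yours expect). The function names, sample and library names, and file names are hypothetical. The command keeps the question's -M but uses two separate mate files, so it drops -p (which is for interleaved input) and -a.

```python
import re
import shlex

# Assumption: flowcell and lane can be read from the first DNBseq read name,
# e.g. "@V300020594L1C001R0010000024/1" -> flowcell V300020594, lane 1.
FLOWCELL_LANE = re.compile(r"^@?(?P<flowcell>[A-Z0-9]+?)L(?P<lane>\d+)C\d{3}R\d{3}")


def read_group(first_read_name, sample, library, platform="DNBSEQ"):
    """Build a bwa mem -R value (literal backslash-t separators) for one sample."""
    m = FLOWCELL_LANE.match(first_read_name)
    if m is None:
        raise ValueError(f"unrecognised read name: {first_read_name!r}")
    unit = f"{m['flowcell']}.{m['lane']}"  # flowcell.lane
    fields = [
        "@RG",
        f"ID:{unit}",
        f"SM:{sample}",
        f"LB:{library}",
        f"PL:{platform}",
        f"PU:{unit}.{sample}",  # sample name used in place of a barcode (assumption)
    ]
    return "\\t".join(fields)  # bwa converts each '\t' to a real TAB


def bwa_mem_argv(ref, r1, r2, rg, threads=8):
    """Argument list for paired-end bwa mem with separate mate-1/mate-2 files."""
    return ["bwa", "mem", "-M", "-t", str(threads), "-R", rg, ref, r1, r2]


rg = read_group("@V300020594L1C001R0010000024/1", sample="sample1", library="lib1")
print(shlex.join(bwa_mem_argv("ref.fa", "sample1_1.fq.gz", "sample1_2.fq.gz", rg)))
```

The printed line can be pasted into a shell (redirecting stdout to a .sam file), or the list can be passed directly to subprocess.run with stdout pointed at an open file; repeat it once per sample.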

This bioRxiv paper says that an accompanying repository has code for converting BGI-formatted fastq headers to Illumina format, but I was not able to find a link to the GitHub repo. Take a look around to see if you can find it.

Reply by genomax (11 months ago)