Question: bwa mem -R (RGline) format for DNBseq fastq
Asked 28 days ago by joe:

I have a WGS dataset I'd like to align with bwa, but the FASTQ files are in DNBseq format (shown below) and I'm stuck on the syntax. Does anyone know how to change the RG line "@RG\tID:**\tLB:**\tPL:ILLUMINA\tSM:**\tPU:BARCODE" in the command bwa mem -aMp -t #ofCPUs ref.fa -R "@RG\tID:**\tLB:**\tPL:ILLUMINA\tSM:**\tPU:BARCODE" > output.sam to make this work? Thanks.

Here's the top line of a FASTQ file in DNBseq format:

@V300020594L1C001R0010000024/1

*Edited for clarity.


You are referring to the -R option, which sets the read group.

-R STR Complete read group header line. '\t' can be used in STR and will be converted to a TAB in the output SAM. The read group ID will be attached to every read in the output. An example is '@RG\tID:foo\tSM:bar'.
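To make the manual text above concrete, here is a minimal Python sketch of what it describes: the literal two-character sequence `\t` becomes a TAB in the header, and the ID is what ends up on each read as an `RG:Z:` tag. The function name `expand_read_group` is made up for illustration, and the two sanity checks (line starts with `@RG`, has an `ID:` field) are my understanding of what bwa expects rather than a copy of its code.

```python
def expand_read_group(rg_arg):
    """Return (header_line, per_read_tag) for a -R style argument.

    Hypothetical helper: it only mimics the behaviour described in the
    bwa manual text, it is not bwa's own implementation.
    """
    # The shell passes backslash + 't' literally; turn that into a real TAB.
    header = rg_arg.replace("\\t", "\t")
    fields = header.split("\t")
    if fields[0] != "@RG":
        raise ValueError("read group line must start with @RG")
    ids = [f[3:] for f in fields[1:] if f.startswith("ID:")]
    if not ids:
        raise ValueError("read group line needs an ID: field")
    # Every aligned read in the output carries this tag.
    return header, "RG:Z:" + ids[0]


header, tag = expand_read_group(r"@RG\tID:foo\tSM:bar")
print(repr(header))  # '@RG\tID:foo\tSM:bar' (the \t shown here are real TABs)
print(tag)           # RG:Z:foo
```

Note that nothing in this step looks at the read names in the FASTQ file; the read group is whatever string you pass in.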

If you don't have multiple samples (or don't plan to use a program that requires read groups), you should be able to omit that option. Note: if your downstream analysis requires read groups, you will need to supply them.

Can you show us the original FASTQ headers in your data?

Written 28 days ago by genomax

I have multiple paired-end samples, which is why I was hoping to use -R. This is the header I was provided from the NGS company (BGI), which used DNBseq machines. The only documentation I could find from them shows the below.

FASTQ file sequenced by DNBseq.

@CL100072652L2C001R001_12/1
GCGACCCCAGGTCAGTCGGGACTACCCGCTGAAGTCGGAGGCCAAGCGGT
+
FFFCFFFFFFFFFDFEFFFFEFEF0FFFFEFFFFFFFEFFFFFECGFFFF

I'll try to parse in some colon delimiters and see if it helps (like this):

@V300020594:L1:C001:R001:0000024:1

Thanks.

Written 28 days ago by joe (modified 28 days ago by genomax)
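The colon-delimited split proposed in the post above could be sketched in Python roughly as follows. The field layout (flowcell ID, `L` + lane, `C` + column, `R` + row, a read serial number, then `/1` or `/2`) is an assumption inferred only from the two read names shown in this thread, not from BGI documentation, and the names `DNB_NAME`, `parse_dnbseq_name` and `to_colon_name` are hypothetical.

```python
import re

# Assumed layout, inferred from the two example read names in this thread:
#   <flowcell>L<lane>C<column>R<row>[_]<serial>/<mate>
# The flowcell part is matched lazily because it may itself contain an "L"
# (as in CL100072652).
DNB_NAME = re.compile(
    r"^@?(?P<flowcell>[A-Za-z0-9]+?)"
    r"L(?P<lane>\d)C(?P<column>\d{3})R(?P<row>\d{3})"
    r"_?(?P<serial>\d+)/(?P<mate>[12])$"
)


def parse_dnbseq_name(name):
    """Split a DNBseq-style read name into a dict of string fields."""
    match = DNB_NAME.match(name.strip())
    if match is None:
        raise ValueError("not a recognised DNBseq read name: %r" % name)
    return match.groupdict()


def to_colon_name(name):
    """Rewrite the read name with colon delimiters, as tried in the post."""
    f = parse_dnbseq_name(name)
    return "@{flowcell}:L{lane}:C{column}:R{row}:{serial}:{mate}".format(**f)


print(parse_dnbseq_name("@CL100072652L2C001R001_12/1"))
print(to_colon_name("@V300020594L1C001R0010000024/1"))
```

One caution if the renamed files are fed back to bwa: as far as I know, bwa matches mates by name after trimming a trailing `/1` or `/2`, so a rewrite that ends in `:1` and `:2` may leave the two mates with different names. The parsed flowcell and lane are mainly useful for filling in the read group fields, which does not require renaming the reads at all.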

Take a look at this page to see if you can use some of the examples to construct an appropriate -R line for your samples.

Something like

@RG\tID:CL1000\tLB:Library_ID\tPL:HELICOS\tSM:SAMPLE_ID\tPU:Index_seq
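As a rough illustration of filling in a line like that for several samples, here is a Python sketch that builds one RG string per sample and lane and assembles (but does not run) a bwa mem command. Everything named here is an assumption for the example: the helper names, the file names, the sample and library labels, and the choice to build ID from sample + flowcell + lane and PU from flowcell + lane. The PL value is passed in explicitly; I believe recent versions of the SAM specification list DNBSEQ as a valid platform, but check what your downstream tools accept.

```python
import shlex


def build_rg(flowcell, lane, sample, library, platform):
    """Build a -R argument with literal \\t separators (hypothetical helper)."""
    unit = "%s.%s" % (flowcell, lane)       # assumed PU convention
    rg_id = "%s.%s" % (sample, unit)        # keeps IDs unique across samples
    fields = [
        "@RG",
        "ID:" + rg_id,
        "LB:" + library,
        "PL:" + platform,
        "SM:" + sample,
        "PU:" + unit,
    ]
    # bwa converts the backslash-t pairs to TABs itself.
    return "\\t".join(fields)


def bwa_mem_command(ref, fq1, fq2, rg, threads=4):
    """Return the argument list for a paired-end bwa mem run."""
    return ["bwa", "mem", "-M", "-t", str(threads), "-R", rg, ref, fq1, fq2]


# Hypothetical sample, library and file names.
rg = build_rg("V300020594", "1", "sampleA", "libA", "DNBSEQ")
cmd = bwa_mem_command("ref.fa", "sampleA_1.fq.gz", "sampleA_2.fq.gz", rg, threads=8)
print(shlex.join(cmd) + " > sampleA.sam")
```

Passing the arguments as a list (for example to `subprocess.run` with `stdout` redirected to a file) avoids shell quoting problems with the backslashes. The `-p` flag from the original command is left out here because two separate FASTQ files are given rather than one interleaved file.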

This bioRxiv paper says that an accompanying repo has code for converting BGI-formatted FASTQ headers to Illumina format, but I am not able to find a link to the GitHub repo. Take a look around to see if you can find it.

Written 28 days ago by genomax