Question: Barcode Fastq Header - Adding Characters
zach wrote (7 months ago):

I'm new to Linux and would appreciate any help with this.

My forward fastq file has this type of header line for each sample: @DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870 1:N:0:CGTCGTATGAAT

But my barcode fastq file only has:

@DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870

This prevents me from demultiplexing with QIIME 2. I'm assuming the solution is to add '1:N:0' to each header line of the barcode FASTQ. How do I do this on the Linux command line?

Thank you!


If you just need 1:N:0, you could use reformat.sh from the BBMap suite:

addcolon=f              Append ' 1:' and ' 2:' to read names, if not already present.
Reply by genomax, 7 months ago.

@zach, try this on a small FASTQ file first (the 1~4 address, i.e. every 4th line starting at line 1, requires GNU sed):

$ sed '1~4 s/$/ 1:N:0:CGTCGTATGAAT/' test.fq

To add only 1:N:0, try:

$ sed '1~4 s/$/ 1:N:0/' test.fq
Reply by cpad0112, 7 months ago.
Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 7 months ago:
 awk '{print $0 (NR%4==1 ? " 1:N:0:CGTCGTATGAAT" : "")}' in.fq

Hi all, thank you so much for the commands and the quick help. I used awk, and it produced barcode header lines like: @DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870 1:N:0

However, when demultiplexing, I received another error message about mismatched sequence descriptions: N:0, N:0:CGTCGTATGAAT, and N:0:CGTCGTATGAAT

I think it's because of the index sequence in the forward and reverse header lines, e.g.:

Forward: @DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870 1:N:0:CGTCGTATGAAT

Reverse: @DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870 2:N:0:CGTCGTATGAAT

To make the descriptions match across all 3 files (and all samples within them), what command could I use to replace "1:N:0:CGTCGTATGAAT" (forward) and "2:N:0:CGTCGTATGAAT" (reverse) with just "1:N:0" (as in the barcode file)?

I appreciate your time and effort in helping me and making this forum amazing.

Reply by zach, 7 months ago.
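For option a, which is what the question about replacing "1:N:0:CGTCGTATGAAT" and "2:N:0:CGTCGTATGAAT" asks for, here is a minimal sketch (not from the thread), assuming uncompressed 4-line FASTQ records and GNU sed (the `1~4` address is a GNU extension). The function name `strip_index` is hypothetical. It rewrites only header lines and turns both ` 1:N:0:<index>` and ` 2:N:0:<index>` into ` 1:N:0`. Note that this also relabels the reverse reads as read 1; whether QIIME 2 then accepts the files is an assumption, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (sketch): strip the read number and index from FASTQ
# header descriptions, e.g. " 2:N:0:CGTCGTATGAAT" -> " 1:N:0".
# Assumes 4-line FASTQ records and GNU sed (for the "1~4" address).
# Reads the file given as $1, or stdin if no argument is given.
strip_index() {
  # 1~4 = lines 1, 5, 9, ... (the headers only), so sequence and
  # quality lines are never touched.
  # Captures the filter flag (Y/N) and control number and keeps them;
  # the read number is forced to 1 and the index is dropped.
  # Headers already of the form " 1:N:0" do not match, so reruns are harmless.
  sed -E '1~4 s/ [12]:([YN]):([0-9]+):[ACGTN+]*$/ 1:\1:\2/' "${1:--}"
}
```

Usage: `strip_index R1.fastq > R1.fixed.fastq` and `strip_index R2.fastq > R2.fixed.fastq`; for gzipped files, `zcat R2.fastq.gz | strip_index | gzip > R2.fixed.fastq.gz`.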

Since you have the index data in a separate file, set the FASTQ headers in that file to 1:N:0:CGTCGTATGAAT. That works as long as you have just one index.

Reply by genomax, 7 months ago.

The index sequence in the header is different for each of the other samples, e.g.:

@DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870 1:N:0:CGTCGTATGAAT

vs

@DGZN8DQ1:549:H7C23BCXX:2:1101:1126:1870 1:N:0:TTTGCATCAGGG

vs

@DGZN8DQ1:549:H7C23BCXX:2:1101:1189:1870 1:N:0:CCGTCTATGTTT

and so on, corresponding to the barcodes. I'm guessing I have 2 options, and it would be great to try both in case one doesn't work:

a) make all header descriptions (barcode, forward, reverse) "1:N:0"

b) add the index-sequence description (already present in the forward and reverse headers) to the barcode headers

If I follow option b, which you suggested, how do I write the command for this in Linux? Thanks!

Reply by zach, 7 months ago.
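For option b, a minimal sketch (not from the thread) that copies each R1 description onto the matching I1 (barcode) header. Assumptions: the files are uncompressed 4-line FASTQ, I1 and R1 list the reads in the same order (as they should when all samples sit in one R1/R2/I1 set), and no line contains a tab. `add_index_from_r1` is a hypothetical name. It pairs the two files line by line with `paste` and stops with an error if the read IDs ever differ, rather than silently attaching the wrong index to a read.

```shell
#!/usr/bin/env bash
# Hypothetical helper (sketch): copy the description ("1:N:0:CGTCGTATGAAT",
# etc.) from each R1 header onto the matching I1 header.
# Assumes uncompressed 4-line FASTQ, identical read order in both files,
# and no tab characters in any line.
add_index_from_r1() {
  local i1="$1" r1="$2"
  # paste joins line N of I1 and line N of R1 with a tab (its default).
  paste "$i1" "$r1" | awk -F '\t' '
    NR % 4 == 1 {
      # Header line: $1 is the I1 header, $2 is the R1 header.
      split($1, i1hdr, " ")   # i1hdr[1] = read ID in I1
      split($2, r1hdr, " ")   # r1hdr[1] = read ID in R1, r1hdr[2] = description
      if (i1hdr[1] != r1hdr[1]) {
        printf "read IDs differ at line %d: %s vs %s\n", NR, i1hdr[1], r1hdr[1] > "/dev/stderr"
        exit 1
      }
      print i1hdr[1] " " r1hdr[2]
      next
    }
    # Sequence, "+" and quality lines: keep the I1 line unchanged.
    { print $1 }
  '
}
```

Usage: `add_index_from_r1 I1.fastq R1.fastq > I1.with_index.fastq`. In bash, gzipped inputs can be fed with process substitution: `add_index_from_r1 <(zcat I1.fastq.gz) <(zcat R1.fastq.gz) | gzip > I1.with_index.fastq.gz`.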

Do you have separate files for each sample, or are all the indexes in one set of files (R1, R2 and I1)?

Reply by genomax, 7 months ago.

All the samples are in R1, and the same goes for R2 and I1.

Reply by zach, 7 months ago.

Did deML not work, then? The answer in the Biostars thread "Demultiplexing Illumina data" should be the best solution.

Reply by genomax, 7 months ago.

Thanks. I have looked at the post you linked and have tried using deML. I ran into several errors and am currently discussing them with the developer.

Reply by zach, 7 months ago.

deML does not seem to work with my data for some reason, so it seems I should still try adding the index sequence to each read's header in the barcode file.

Reply by zach, 7 months ago.