Question

Barcode Fastq Header - Adding Characters

0

4.4 years ago

zach ▴ 10

I'm new to Linux and would appreciate any help with this.

My forward fastq file has this type of header line for each sample: @DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870 1:N:0:CGTCGTATGAAT

But my barcode fastq file only has:

@DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870

This is not allowing me to demultiplex with qiime2. I'm assuming the solution is to add '1:N:0' to the header lines of the barcode fastq for each sample. How do I do this on the command line with Linux?

Thank you!

sequencing • 3.1k views

ADD COMMENT • link 4.4 years ago by zach ▴ 10

1

If you just need 1:N:0 then you could use reformat.sh from BBMap suite.

addcolon=f              Append ' 1:' and ' 2:' to read names, if not already present.

ADD REPLY • link 4.4 years ago by GenoMax 141k

0

@zach, try this on a small fastq file first:

$ sed '1~4 s/$/ 1:N:0:CGTCGTATGAAT/g' test.fq

To add just 1:N:0, try:

$ sed '1~4 s/$/ 1:N:0/g' test.fq

ADD REPLY • link 4.4 years ago by cpad0112 21k
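Note that the sed commands above print to the terminal and leave test.fq unchanged. If the real files are gzipped, a small wrapper along these lines might help. This is only a sketch: the function name and the file names in the usage comment are made up for illustration, and the `1~4` address needs GNU sed.

```shell
# Append a description to every header line (lines 1, 5, 9, ...) of a gzipped FASTQ
# and write the gzipped result to stdout.
# Usage (hypothetical file names):
#   append_description " 1:N:0" barcodes.fastq.gz > barcodes.fixed.fastq.gz
# Assumption: the description contains no "/" or "&", which sed would treat specially.
append_description() {
    desc=$1
    gzip -dc "$2" | sed "1~4 s/\$/$desc/" | gzip -c
}
```

Check the first few headers with `gzip -dc barcodes.fixed.fastq.gz | head` before running it on everything.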

score 1 · Answer 1 · 2019-12-09

1

4.4 years ago

Pierre Lindenbaum 161k

 awk '{print $0 (NR%4==1?" 1:N:0:CGTCGTATGAAT/1":"")}' in.fq

ADD COMMENT • link 4.4 years ago by Pierre Lindenbaum 161k

0

Hi all. Thank you so much for the commands and quick help - I used 'awk' and it worked to make the barcode header lines: @DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870 1:N:0

However, when demux-ing, I received another error message for mismatched seq description: N:0, N:0:CGTCGTATGAAT, and N:0:CGTCGTATGAAT

I think it's because of the index sequence in the header lines of the forward and reverse reads, e.g.:

Forward: @DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870 1:N:0:CGTCGTATGAAT

Reverse: @DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870 2:N:0:CGTCGTATGAAT

To make matching descriptions for all 3 files and samples within them, what command line could I use to replace "1:N:0:CGTCGTATGAAT" (forward) and "2:N:0:CGTCGTATGAAT" (reverse) with just "1:N:0" (barcode) ?

I appreciate your time and effort in helping me and making this forum amazing.

ADD REPLY • link 4.4 years ago by zach ▴ 10
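For the replacement asked about above, one possible sketch is to drop the trailing index sequence from the forward and reverse headers with GNU sed. The function name and file names are hypothetical. It keeps the read number (1 or 2) as it is, on the assumption, based on the error message quoted above, that only the part after the read number is being compared.

```shell
# Remove the trailing ":INDEX" from header lines such as
#   @DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870 1:N:0:CGTCGTATGAAT
# leaving
#   @DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870 1:N:0
# Only lines 1, 5, 9, ... are touched, so quality lines starting with "@" are safe.
# Usage (hypothetical file names):
#   strip_index forward.fastq > forward.noindex.fastq
#   strip_index reverse.fastq > reverse.noindex.fastq
strip_index() {
    sed -E '1~4 s/^(@[^ ]+ [12]:[YN]:[0-9]+):[ACGTN+]+$/\1/' "$1"
}
```

Headers that have no index sequence at the end are printed unchanged.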

1

Since you have the index data in a separate file, set the fastq header to 1:N:0:CGTCGTATGAAT in that file, as long as you have just one index.

ADD REPLY • link 4.4 years ago by GenoMax 141k

0

The index sequences in the headers of the other samples are all different, e.g.:

@DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870 1:N:0:CGTCGTATGAAT

vs

@DGZN8DQ1:549:H7C23BCXX:2:1101:1126:1870 1:N:0:TTTGCATCAGGG

vs

@DGZN8DQ1:549:H7C23BCXX:2:1101:1189:1870 1:N:0:CCGTCTATGTTT

and so on corresponding to the barcodes. I'm guessing I have 2 options and it would be great to try them out if one doesn't work:

a) make all (barcodes, forward, reverse) header descriptions "1:N:0"

b) add description of index sequence (already in forward, reverse) to the barcodes

If I follow option b which you suggested, how do I write a command line for this in Linux? Thanks!

ADD REPLY • link 4.4 years ago by zach ▴ 10
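For option b, one possible sketch is to copy each header line from the forward file onto the matching record of the barcode file. It assumes both files contain the same reads in the same order, which is usual for R1/I1 from one run but worth checking. The function name and file names are hypothetical.

```shell
# Replace each header in the barcode FASTQ with the header of the same record in R1.
# Stops with an error if the read IDs (the text before the first space) differ
# or if R1 has fewer records than the barcode file.
# Usage (hypothetical file names):
#   copy_r1_headers forward.fastq barcodes.fastq > barcodes.fixed.fastq
copy_r1_headers() {
    awk -v r1="$1" '
        FNR % 4 == 1 {
            # Read the next 4-line record from R1 and keep only its header.
            if ((getline h < r1) <= 0) {
                print "R1 has fewer records than the barcode file" > "/dev/stderr"
                exit 1
            }
            for (i = 0; i < 3; i++) getline skip < r1
            split(h, a, " ")
            split($0, b, " ")
            if (a[1] != b[1]) {
                printf("read ID mismatch at record %d\n", (FNR + 3) / 4) > "/dev/stderr"
                exit 1
            }
            print h
            next
        }
        { print }   # sequence, "+" and quality lines pass through unchanged
    ' "$2"
}
```

This streams both files record by record, so it should not need to hold either file in memory.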

0

Do you have separate files for each sample, or are these indexes all in one set of files (R1, R2 and I1)?

ADD REPLY • link 4.4 years ago by GenoMax 141k

0

All the samples are in R1. The same is true for R2 and I1.

ADD REPLY • link 4.4 years ago by zach ▴ 10

0

Did deML not work then? The approach in "A: Demultiplexing Illumina data" should be the best solution.

ADD REPLY • link 4.4 years ago by GenoMax 141k

0

Thanks. I have looked at the post you linked and have tried using deML. I ran into several errors and am currently discussing them with the developer.

ADD REPLY • link 4.4 years ago by zach ▴ 10

0

deML does not seem to work with my data for some reason. It seems that I should still attempt to add the index sequence to the header of each sample, within the barcode file.

ADD REPLY • link 4.4 years ago by zach ▴ 10