How to demultiplex a pooled FASTQ file and extract the sequences for each sample
12 months ago
rishav513 ▴ 30

Hello all,

I have a pooled sequence file named "ERR1806550_1.fastq.gz" containing single-end reads. I want to demultiplex this file and extract the sequences of the 37 samples I am interested in. These are the barcode sequences of the samples I want to extract from "ERR1806550_1.fastq.gz":

AACACC
AACCAG
AACGGA
CCGTTA
CCTAAG
CCTTCT
CGAGTT
GAACGT
GACTTC
GAGTCA
GATGAC
GCACTA
GCCATT
GCTTGA
GGAGAA
GGATCT
GTAACC
TATGCG
AAGAGG
AAGCCT
AAGGTC
TCAGAG
TCCTTG
TCGACT
AATCGC
ACAACG
ACCGAT
TCTAGC
TGACCA
TGGAAG
ACCTCA
CAACTC
CAAGCA
TGGTGA
TGTGTC
TTCCGT

So far, I have used this command to extract the sample sequences:

grep -B1 -A2 "^AGCACTGTAG" file.fastq | grep -v "^--$" > out.fq

but after extracting the sample sequences and processing them in dada2, I get this error:

Error in add(bin) : record does not start with '@'

However, I checked every file, and each one starts with @.

I suspect I am not demultiplexing the file correctly. Kindly help with this.
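
A note on narrowing this down: the first line of a file starting with @ does not guarantee that every record does. With grep -B1 -A2, a pattern that happens to match the start of a quality line (A, C and G are valid quality characters) would pull in the "+" line before it and the next read's header and sequence after it, shifting the records out of frame. Whether that is what happened here is only a guess. The sketch below, assuming plain four-line (unwrapped) FASTQ records, reports the first record that is out of frame. The function names `open_fastq` and `first_bad_record` are made up for this illustration.

```python
import gzip


def open_fastq(path):
    """Open a FASTQ file as text, handling .gz transparently."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def first_bad_record(path):
    """Return (record_number, reason) for the first malformed record, or None.

    Assumes each record is exactly four lines: header, sequence, '+', quality.
    """
    with open_fastq(path) as handle:
        record_number = 0
        while True:
            header = handle.readline()
            if not header:
                return None  # reached end of file without finding a problem
            record_number += 1
            seq = handle.readline().rstrip("\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\n")
            if not header.startswith("@"):
                return record_number, "header does not start with '@'"
            if not plus.startswith("+"):
                return record_number, "third line does not start with '+'"
            if len(seq) != len(qual):
                return record_number, "sequence and quality lengths differ"
```

Running `first_bad_record("out.fq")` on each extracted file should point to the record dada2 is complaining about, if the files really are out of frame.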

Where are these barcodes located? Are they within the actual sequence or are they in the header?


Sorry, they are within the actual sequences.

12 months ago
GenoMax 142k

Use demuxbyname.sh from the BBMap suite.

See the in-line help:

demuxbyname.sh in=<file> out=<outfile> delimiter=: prefixmode=f
This will split on colons, and use the last substring as the name; useful for
demuxing by barcode for Illumina headers in this format:
@A00178:73:HH7H3DSXX:4:1101:13666:1047 1:N:0:ACGTTGGT+TGACGCAT

out=<file>      Output files for reads with matched headers (must contain % symbol).
                For example, out=out_%.fq with names XX and YY would create out_XX.fq and out_YY.fq.
                If twin files for paired reads are desired, use the # symbol.  For example,
                out=out_%_#.fq in this case would create out_XX_1.fq, out_XX_2.fq, out_YY_1.fq, etc.
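
The help text quoted above covers barcodes stored in the read header. Since the barcodes in this thread were said to sit within the sequences, here is a minimal sketch of the sequence-based case. It assumes each barcode occupies the first bases of the read (as the ^ in the grep command suggests), that all barcodes have the same length, that only exact matches count, and that records are four lines each. The function name `demultiplex`, the `out_prefix` and `trim` parameters, and the output naming are all made up for this illustration, not part of any tool.

```python
import gzip


def demultiplex(fastq_path, barcodes, out_prefix, trim=True):
    """Write one FASTQ per barcode for reads that start with that barcode.

    Returns a dict mapping each barcode to the number of reads written.
    Reads that match no barcode are skipped.
    """
    length = len(next(iter(barcodes)))  # assumption: all barcodes share one length
    counts = {bc: 0 for bc in barcodes}
    outputs = {bc: open(f"{out_prefix}{bc}.fastq", "w") for bc in barcodes}
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    try:
        with opener(fastq_path, "rt") as handle:
            while True:
                # Read one whole four-line record at a time so records stay in frame.
                record = [handle.readline().rstrip("\n") for _ in range(4)]
                if not record[0]:
                    break  # end of file
                header, seq, plus, qual = record
                bc = seq[:length]
                if bc not in outputs:
                    continue
                if trim:
                    # Remove the barcode from both the sequence and the quality string.
                    seq, qual = seq[length:], qual[length:]
                outputs[bc].write(f"{header}\n{seq}\n{plus}\n{qual}\n")
                counts[bc] += 1
    finally:
        for out in outputs.values():
            out.close()
    return counts
```

For example, `demultiplex("ERR1806550_1.fastq.gz", ["AACACC", "AACCAG"], "sample_")` would write sample_AACACC.fastq and sample_AACCAG.fastq. Because only the sequence line is compared, a quality string that happens to begin with barcode-like characters cannot shift the records. Whether to trim the barcode depends on what the downstream pipeline expects.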

ok, thank you
