Question: How to avoid outputting untrimmed second reads when demultiplexing paired-end reads with cutadapt? ("--untrimmed-paired-output" argument seems not to work)
Asked 23 months ago by ajfaure

Hi,

I am attempting to demultiplex barcoded 100 bp paired-end Illumina short-read sequencing data with cutadapt, following these instructions: https://cutadapt.readthedocs.io/en/stable/guide.html#demultiplexing

I only want to retain read pairs where the barcodes were found and trimmed/removed in both reads of a pair.

However, cutadapt is outputting untrimmed second reads even though I specified the "--untrimmed-paired-output" argument.

Full details of my analysis are as follows (cutadapt 1.17 with Python 3.6.5):

Command-line parameters:

cutadapt -g file:demultiplex_barcode-file_1.fasta -G file:demultiplex_barcode-file_1.fasta -e 0.25 --no-indels --untrimmed-output Input_1.fastq.gz.demultiplex.unknown.fastq --untrimmed-paired-output Input_2.fastq.gz.demultiplex.unknown.fastq -o {name}1.fastq -p {name}2.fastq Input_1.fastq.gz Input_2.fastq.gz

demultiplex_barcode-file_1.fasta:

>Input_Rep1_read
^GCCGAATT
>Input_Rep2_read
^CGGCAATT
>Input_Rep3_read
^GAACGTTC

Before cutadapt demultiplexing (example read1 FASTQ entry in Input_1.fastq.gz):

@D3FCO8P1:231:C49LBACXX:7:1101:2911:2101 1:N:0:
GCCGAATTTGCAGTTTGAACAAAGCAAGAACTTACCCCAAACAATTAGTGGAATTGGCAAAAGAAGAAGACAAAGCCACCCCAAGTTAGATTTCGATCCT
+
CCCFFFFFHHHHHJJJJJJJJJJJJJIJJIIJJJIJJJJJIIIIIJIJGGIIIIJGIGGGIJIIJCGHEHEC@D@CA@C>BDDDDCDCCCCDCDCBD?@>

Before cutadapt demultiplexing (example read2 FASTQ entry in Input_2.fastq.gz):

@D3FCO8P1:231:C49LBACXX:7:1101:2911:2101 2:N:0:
CCGAATTAAAATGTCCAATGTTCCAACCTACAGGATCGAAATCTAACTTGGGGTGGCTTTGTCTTCTTCTTTTGCCAATTCCACTAATTGTTTGGGGTAA
+
CCCFFFFFHHHHHJJJJJJIJJJJJJIJJJIJJJJJJJJIJJJJGHGIJIIJJIIDBE@GGHFFFHDHFFCFFEA6>;@ACCACC;;>>:>CCA@BBDB3

After cutadapt demultiplexing (example read1 FASTQ entry in Input_Rep1_read1.fastq):

@D3FCO8P1:231:C49LBACXX:7:1101:2911:2101 1:N:0:
TGCAGTTTGAACAAAGCAAGAACTTACCCCAAACAATTAGTGGAATTGGCAAAAGAAGAAGACAAAGCCACCCCAAGTTAGATTTCGATCCT
+
HHHHHJJJJJJJJJJJJJIJJIIJJJIJJJJJIIIIIJIJGGIIIIJGIGGGIJIIJCGHEHEC@D@CA@C>BDDDDCDCCCCDCDCBD?@>

After cutadapt demultiplexing (example read2 FASTQ entry in Input_Rep1_read2.fastq):

@D3FCO8P1:231:C49LBACXX:7:1101:2911:2101 2:N:0:
CCGAATTAAAATGTCCAATGTTCCAACCTACAGGATCGAAATCTAACTTGGGGTGGCTTTGTCTTCTTCTTTTGCCAATTCCACTAATTGTTTGGGGTAA
+
CCCFFFFFHHHHHJJJJJJIJJJJJJIJJJIJJJJJJJJIJJJJGHGIJIIJJIIDBE@GGHFFFHDHFFCFFEA6>;@ACCACC;;>>:>CCA@BBDB3

As you can see, the barcode was not matched in the second read, yet both reads still appear in the supposedly trimmed output files (Input_Rep1_read1.fastq and Input_Rep1_read2.fastq).
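
To make the mismatch in the second read concrete, here is a rough Python sketch that compares the start of each example read against the three barcodes. It is not cutadapt's alignment code. It only assumes that an anchored barcode with "--no-indels" can be approximated by counting mismatches over the first 8 bases, and that "-e 0.25" allows floor(0.25 × 8) = 2 mismatches. The function names are made up for illustration, and only the first 16 bases of each read are included.

```python
# Barcodes from demultiplex_barcode-file_1.fasta (without the "^" anchor).
BARCODES = {
    "Input_Rep1_read": "GCCGAATT",
    "Input_Rep2_read": "CGGCAATT",
    "Input_Rep3_read": "GAACGTTC",
}

# First 16 bases of the example read pair shown above.
READ1_START = "GCCGAATTTGCAGTTT"
READ2_START = "CCGAATTAAAATGTCC"


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def matching_barcode(read, barcodes, error_rate=0.25):
    """Return the name of the first barcode matching the read's 5' end, or None.

    Assumption: a match means at most int(error_rate * barcode length)
    mismatches against the read prefix, with no insertions or deletions.
    """
    for name, barcode in barcodes.items():
        prefix = read[:len(barcode)]
        if len(prefix) < len(barcode):
            continue  # read too short to contain the whole barcode
        if hamming(prefix, barcode) <= int(error_rate * len(barcode)):
            return name
    return None


if __name__ == "__main__":
    print("read 1:", matching_barcode(READ1_START, BARCODES))
    print("read 2:", matching_barcode(READ2_START, BARCODES))
```

Under these assumptions the first read matches Input_Rep1_read exactly, while the second read starts with CCGAATTA (the barcode shifted by one base) and differs from every barcode at more than two positions, so no barcode is found there.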

I have tried specifying "--pair-filter=any", although this is the default setting. Neither "any" nor "both" makes any difference: this read pair is retained even though the second read is untrimmed.
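
One possible stopgap, independent of any cutadapt option, is to post-filter the demultiplexed files by read length. The Python sketch below is only an illustration and rests on assumptions: every raw read is exactly 100 nt, nothing other than barcode removal shortens reads, both files list the pairs in the same order, and each FASTQ record spans exactly four lines. The function names and the raw_length parameter are hypothetical.

```python
def fastq_records(handle):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ handle.

    Assumes each record occupies exactly four lines.
    """
    while True:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
        if not lines[0]:
            return  # end of file (or a trailing blank line)
        yield tuple(lines)


def keep_trimmed_pairs(r1_in, r2_in, r1_out, r2_out, raw_length=100):
    """Write only those pairs in which both reads are shorter than raw_length.

    Assumption: a read shorter than raw_length had its barcode removed;
    a read of full length was left untrimmed. Returns the number of pairs kept.
    """
    kept = 0
    for rec1, rec2 in zip(fastq_records(r1_in), fastq_records(r2_in)):
        if len(rec1[1]) < raw_length and len(rec2[1]) < raw_length:
            r1_out.write("\n".join(rec1) + "\n")
            r2_out.write("\n".join(rec2) + "\n")
            kept += 1
    return kept
```

To use it, open Input_Rep1_read1.fastq and Input_Rep1_read2.fastq for reading and two new files for writing, then pass the four handles to keep_trimmed_pairs. The example pair above would be dropped, because its second read is still 100 nt long.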

Any help would be appreciated!

Thanks,

Andre
