Trouble removing adapters from sequences using cutadapt
17 days ago
k.lagan • 0

Hi all! I am new to the world of bioinformatics. I just received my first set of sequences and am having some issues removing primers and adapters using cutadapt. For reference, I used the 515F (Parada) and 806R (Apprill) primers to amplify the V4 region. The sequencing facility told me that the adapter used was CTGTCTCTTATACACATCT. Below is the code I used to run cutadapt for my samples. When I run FastQC on the trimmed sequences, I do not see any adapter removal apart from polyA removal. The adapter content is <5%, but I was under the impression that it should be as close to zero as possible. Does anyone know what I am doing wrong?

for r1 in *_R1.fastq; do
    r2=${r1/_R1/_R2}
    cutadapt --cores=4 \
        -g ^GTGYCAGCMGCCGCGGTAA \
        -G ^GGACTACNVGGGTWTCTAAT \
        -a CTGTCTCTTATACACATCT \
        -A CTGTCTCTTATACACATCT \
        -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
        -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
        -a A{10} \
        -A A{10} \
        --match-read-wildcards \
        --overlap 5 \
        --times 2 \
        --minimum-length 50 \
        --trim-n \
        --report=full \
        -o cutadapt_results/trimmed-${r1} \
        -p cutadapt_results/trimmed-${r2} \
        ${r1} ${r2} > cutadapt_results/cutadapt_report-${r1%.fastq}.txt 2>&1
done
removal cutadapt adapter primer • 676 views

Below is the code I used to run cutadapt for my samples.

Is that part of a pipeline or something you came up with yourself? If the former, can you post a link to it?


A little bit of both. I used the cutadapt website (https://cutadapt.readthedocs.io/en/stable/recipes.html) to learn the syntax and parameters. The -g and -G sequences are the primers I used (515F Parada and 806R Apprill). The first -a and -A are for the adapter provided to me by the sequencing facility. I added the second, longer adapter because I saw "Illumina Universal Adapter" in the adapter content section of my FastQC report. I found the sequence online, but I do not think it is correct, or that I used it correctly, because it failed to remove the "Illumina Universal Adapter". I also had polyA artifacts, which were removed successfully with -a A{10} -A A{10}. I used --match-read-wildcards because I have degenerate bases in my primers. I'm not sure if I need to use the reverse complement for the 3' end, though.
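
On the reverse-complement point: if the reverse-complemented primers turn out to be needed, they can be computed in the shell. This is only a minimal sketch. The function name revcomp is made up for illustration, and it assumes the standard rev and tr utilities are installed. The tr mapping covers the degenerate IUPAC codes (R/Y, K/M, B/V, D/H); S, W and N are their own complements and pass through unchanged.

```shell
# Hypothetical helper (not part of cutadapt): reverse-complement a sequence,
# including degenerate IUPAC codes.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTRYKMBDHVacgtrykmbdhv' 'TGCAYRMKVHDBtgcayrmkvhdb'
}

revcomp GTGYCAGCMGCCGCGGTAA    # 515F (Parada) -> TTACCGCGGCKGCTGRCAC
revcomp GGACTACNVGGGTWTCTAAT   # 806R (Apprill) -> ATTAGAWACCCBNGTAGTCC
```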


Update: I found the adapter list from FastQC (https://github.com/golharam/FastQC/blob/master/Configuration/adapter_list.txt) and used the sequences listed there to remove adapters using cutadapt with the updated script shown below:

for r1 in *_R1.fastq; do
    r2=${r1/_R1/_R2}
    cutadapt --cores=4 \
        -g ^GTGYCAGCMGCCGCGGTAA \
        -G ^GGACTACNVGGGTWTCTAAT \
        -a ATGGAATTCTCG \
        -A ATGGAATTCTCG \
        -a AGATCGGAAGAG \
        -A AGATCGGAAGAG \
        -a A{10} \
        -A A{10} \
        --match-read-wildcards \
        --overlap 5 \
        --times 2 \
        --minimum-length 50 \
        --trim-n \
        --report=full \
        -o cutadapt_results/trimmed-${r1} \
        -p cutadapt_results/trimmed-${r2} \
        ${r1} ${r2} > cutadapt_results/cutadapt_report-${r1%.fastq}.txt 2>&1
done

I am still seeing overrepresented sequences in my post-cutadapt FastQC report. Could this mean the primers are not being removed? I set --times 2 in an attempt to remove the primers after adapter removal, but I am not sure this is the right approach.
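
One way to check primer or adapter removal directly, rather than inferring it from FastQC, is to count the reads in a FASTQ file that still match a given sequence. The sketch below is only an illustration: the function name count_reads_matching and the sample file name in the usage comment are hypothetical, it assumes uncompressed four-line FASTQ records, and the degenerate primer bases are written out by hand as a regular expression (Y as [CT], M as [AC]).

```shell
# Hypothetical helper: count reads whose sequence line matches an extended regex.
# Assumes uncompressed FASTQ with exactly four lines per record.
count_reads_matching() {
    local fastq=$1 pattern=$2
    # Keep only the sequence line of each record, then count matching lines.
    # grep exits non-zero when the count is 0, so "|| true" keeps the status clean.
    awk 'NR % 4 == 2' "$fastq" | grep -c -E -- "$pattern" || true
}

# Example usage (file name is a placeholder):
#   count_reads_matching cutadapt_results/trimmed-sample_R1.fastq '^GTG[CT]CAGC[AC]GCCGCGGTAA'
#   count_reads_matching cutadapt_results/trimmed-sample_R1.fastq 'CTGTCTCTTATACACATCT'
```

Running it on a file before and after trimming and comparing the two counts shows whether the sequence is actually being removed.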


I am still seeing overrepresented sequences

Isn't that expected, since you are working with 16S? Because the sequences are identical, they would be "overrepresented".

Don't be concerned with the FastQC report. The thresholds for its various tests are set for genomic sequence (which is not your case). Move on with the analysis, and if there is some notable issue, come back and try to diagnose it.

12 days ago
chen ★ 2.5k

I suggest you use fastp, which is much faster, simpler, and more accurate, with more functions. fastp is still being actively developed.

https://github.com/OpenGene/fastp

If you want to run fastp for multiple FASTQ data in parallel, especially for paired-end data, you can use

https://github.com/OpenGene/parallel
