Clarification on UMI Length in FASTQ File after Trimming with fastp
14 days ago
daffodil ▴ 10

I have a question about UMI handling in my sequencing data after running fastp.

I ran fastp to trim adapters and move a 6-base UMI from my reads into the read name (the command is in my reply below). However, when I inspect the output FASTQ file, the UMI field contains a 12-base sequence. Here is an example read from the FASTQ file:

@VH00349:206:AACMHWCHV:1:1101:18610:1000:CGTGGT_AGAGCG 2:N:0:CAGATC
CTACCACGGCTCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
9I9IIIIII-II99IIIIIIIIIIIII-I9IIIII9I9I9II9IIIIIII9IIII-9IIII9I-III--9I9III-III-I9II9I9I9II999I9IIIIIIIIII9II

As you can see, the UMI here is CGTGGT_AGAGCG: 12 nucleotides in total, as two 6-base parts joined by an underscore. I expected only 6 bases, since I set --umi_len=6.

Could you please help clarify why the UMI appears as 12 bases in this case and whether this is due to the fastp settings or the sequencing process?
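
To make the observation concrete, the header can be taken apart with a few lines of shell. This is only a minimal sketch: it assumes the UMI is the last colon-separated field of the read name, as in the example read above, and the variable names are illustrative.

```shell
# Header line copied from the example read above.
header='@VH00349:206:AACMHWCHV:1:1101:18610:1000:CGTGGT_AGAGCG 2:N:0:CAGATC'

# Read name = everything before the first space.
read_name="${header%% *}"
# UMI field = everything after the last colon of the read name (assumption).
umi_field="${read_name##*:}"

# Split the UMI field on the underscore into its two parts.
umi_part1="${umi_field%%_*}"
umi_part2="${umi_field##*_}"

echo "UMI field: $umi_field"
echo "part 1: $umi_part1 (${#umi_part1} bases)"
echo "part 2: $umi_part2 (${#umi_part2} bases)"
```

Two 6-base parts is consistent with how the fastp documentation describes --umi_loc=per_read: umi1 is taken from the head of read 1, umi2 from the head of read 2, and umi1_umi2 is written to both read names. With --umi_len=6 that gives 6 + 6 bases, so the 12 bases would come from the fastp setting rather than from the sequencing run.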

umi adapter • 189 views

Include the command you used for this operation.

# samples is a bash array of sample name prefixes, defined earlier in the script
for sample in "${samples[@]}"; do
  echo "Running fastp for $sample"

  fastp \
    -i "${sample}_R1.fastq.gz" \
    -I "${sample}_R2.fastq.gz" \
    --umi \
    --umi_loc=per_read \
    --umi_len=6 \
    -o "${sample}_R1_trimmed.fastq.gz" \
    -O "${sample}_R2_trimmed.fastq.gz" \
    --html "${sample}_fastp.html" \
    --json "${sample}_fastp.json" \
    -w 40
done
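
If the library carries the UMI only at the start of read 1 (an assumption; this depends on the library prep and is not stated in the thread), a variant of the command above with --umi_loc=read1 should give a single 6-base UMI in the read name. The sketch below only prints the command as a dry run; print_fastp_cmd and sampleA are hypothetical names, and the report and thread options are left out for brevity.

```shell
# Print (not run) a fastp command that takes the UMI from read 1 only.
print_fastp_cmd() {
  sample="$1"
  # --umi_loc=read1: the head of read 1 is used as the UMI for both reads
  # (assumes the UMI really sits only on read 1).
  echo fastp \
    -i "${sample}_R1.fastq.gz" \
    -I "${sample}_R2.fastq.gz" \
    --umi \
    --umi_loc=read1 \
    --umi_len=6 \
    -o "${sample}_R1_trimmed.fastq.gz" \
    -O "${sample}_R2_trimmed.fastq.gz"
}

# Hypothetical sample name, for illustration only.
print_fastp_cmd sampleA
```

Whether read1 or per_read is the right choice depends on where the UMI sits in the reads, so it is worth checking the kit documentation before changing the setting.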