Unable to find adapter seq using Fastp
3.8 years ago

Hi!

We are doing Ribo-seq analysis on some FASTQ files and are using fastp for adapter identification and removal, but we have run into a problem. fastp removes the adapter only when the adapter sequence is passed on the command line (first command below). Without it (second command below), fastp does not detect the adapter by itself and reports that no adapter was detected. The SRA run, adapter sequence, and commands are as follows:

SRA file : SRR810103 (fastq size 31.6GB)

./fastp -i ./SRR810103.fastq -a TCGTATGCCGTCTTCTGCTTG -q 33 -w 10 -o ./output.fastq

./fastp -i ./SRR810103.fastq -w 10 -o ./output.fastq



Have you considered the possibility that the data is already trimmed? This is especially likely if the reads are not all of equal length.

As an alternative, you could try bbduk.sh from the BBMap suite with the literal=your_adapter_seq option. A guide for BBDuk is available here.


Thanks for your suggestion. All the reads in the file are 51 nucleotides long (https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR810103), so I do not think the data has already been adapter-trimmed. I also want to use fastp's automatic adapter detection so that I can generalize my workflow to any Ribo-seq dataset, with fastp detecting and removing whatever adapters are present.
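The read-length argument above can be checked directly on the FASTQ file. The following is a minimal sketch, assuming plain 4-line FASTQ records; the function name `read_length_histogram` and the three-read demo file are made up for illustration. A single output line suggests fixed-length reads, while a spread of lengths would suggest the data was trimmed earlier.

```shell
# Hypothetical helper: print "count length" pairs for a FASTQ file, shortest reads first.
# Assumes plain 4-line FASTQ records (the sequence is line 2 of every 4).
read_length_histogram() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print n[len], len }' "$1" | sort -k2,2n
}

# Made-up three-read FASTQ, only to show the call
demo_fastq=$(mktemp)
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTAC\n+\nIIIIII\n@r3\nTTTTACGT\n+\nIIIIIIII\n' > "$demo_fastq"
read_length_histogram "$demo_fastq"
# prints:
# 1 6
# 2 8
rm -f "$demo_fastq"
```

On the real data the call would be `read_length_histogram ./SRR810103.fastq`. Note that equal read lengths alone do not prove that adapters are present, only that the reads were not trimmed to variable lengths.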


You can try trim_galore; it should meet your needs.


Hi,

I'm running into the same problem. The paper didn't provide the adapter sequence. Is there an effective tool to detect adapters automatically? Could you help me solve this problem?

Thank you very much!


If you can find out which kit was used to prepare the libraries, it may be simplest to use the adapter for that kit. Your data need not necessarily contain adapter sequences, so unless you are planning to do de novo assemblies, you could let the aligner take care of any adapter sequences by soft-clipping at alignment time.


Run FastQC to see whether there is an adapter and which one it is, then search for the sequence and provide it to the tool.


Thanks! The FastQC result shows no adapter, so FastQC didn't help.


If it shows no adapter, then there are none and you do not need any trimming. If (unlikely) there is an adapter that FastQC does not recognize, it should still show up as an overrepresented sequence. If there are neither overrepresented sequences nor detected adapters, then I'd say the data are good to be aligned right away without further processing.
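As a rough, independent check on the overrepresented-sequence idea, one can simply count identical reads. This is only a sketch, not how FastQC computes its module; it assumes 4-line FASTQ records, and the function name `top_sequences` and the demo reads are hypothetical.

```shell
# Hypothetical helper: list the most frequent read sequences as "count sequence".
# $1 = FASTQ file (4-line records assumed), $2 = how many entries to show (default 10)
top_sequences() {
    awk 'NR % 4 == 2' "$1" | sort | uniq -c |
        awk '{ print $1, $2 }' | sort -k1,1nr -k2,2 | head -n "${2:-10}"
}

# Made-up six-read FASTQ, only to show the call
demo_fastq=$(mktemp)
printf '@1\nAAAA\n+\nIIII\n@2\nGGGG\n+\nIIII\n@3\nAAAA\n+\nIIII\n' > "$demo_fastq"
printf '@4\nCCCC\n+\nIIII\n@5\nGGGG\n+\nIIII\n@6\nAAAA\n+\nIIII\n' >> "$demo_fastq"
top_sequences "$demo_fastq" 3
# prints:
# 3 AAAA
# 2 GGGG
# 1 CCCC
rm -f "$demo_fastq"
```

For a large file, sorting every read is slow, so it is usually enough to run this on a subsample, for example the first 100,000 reads taken with `head -n 400000`.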


fastp does have an automatic adapter detection option (I believe this is its default behavior).

15 months ago

One way to probe the adapter content is to slice and group the ends of the reads. I do this often as a quick sanity check. It is a simple way to detect possible systematic contamination that starts at a given coordinate. First, get a dataset with 251 bp reads:

fastq-dump -X 100000 SRR519926 --split-files


For example, to see if there are 30 bp long common sequences starting at base 210, you could run:

cat SRR519926_1.fastq | bioawk -c fastx '{ print substr($seq, 210, 30) }' | sort | uniq -c | sort -rn | head


The command above uses bioawk, but you can use regular awk as well and then remove the non-sequence entries. The output is:

47 TCGGAAGAGCACACGTCTGAACTCCAGTCA
47 GGAAGAGCACACGTCTGAACTCCAGTCACG
45 ACACGTCTGAACTCCAGTCACGTAACATCA
41 GATCGGAAGAGCACACGTCTGAACTCCAGT
37 ACTCCAGTCACGTAACATCATCTCGTATGC
36 GAGCACACGTCTGAACTCCAGTCACGTAAC
36 CAGTCACGTAACATCATCTCGTATGCCGTC
36 CACGTCTGAACTCCAGTCACGTAACATCAT
36 AGATCGGAAGAGCACACGTCTGAACTCCAG
36 AGAGCACACGTCTGAACTCCAGTCACGTAA


A little eyeballing tells us that most of the sequences overlap and appear to be different substrings of a much longer adapter sequence. For a more "proper" solution, extract and align the ends to see if these sequences overlap.
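The slicing step and the overlap check can both be sketched with plain awk. The helpers below are hypothetical (names, defaults, and the demo fragments are made up for illustration), assume 4-line FASTQ records, and use a simple greedy merge rather than a real aligner: starting from the first sequence, any other sequence whose prefix or suffix overlaps the growing contig by at least a minimum length is merged in.

```shell
# Hypothetical helpers; names and defaults are illustrative only.

# Print the substring of each read starting at base $2 with length $3.
# Assumes 4-line FASTQ records; reads shorter than the start position are skipped.
slice_reads() {
    awk -v start="$2" -v len="$3" \
        'NR % 4 == 2 && length($0) >= start { print substr($0, start, len) }' "$1"
}

# Greedily merge overlapping sequences read from stdin (one per line) into one contig.
# $1 = minimum overlap length (default 10). The last field of each line is used,
# so "count sequence" lines from uniq -c are accepted as well.
merge_overlaps() {
    awk -v minov="${1:-10}" '
    # Longest k >= minov such that the last k bases of a equal the first k bases of b
    function overlap(a, b,    k, la, lb) {
        la = length(a); lb = length(b)
        for (k = (la < lb ? la : lb); k >= minov; k--)
            if (substr(a, la - k + 1) == substr(b, 1, k)) return k
        return 0
    }
    NF { seq[++n] = $NF }
    END {
        if (n == 0) exit
        contig = seq[1]
        do {
            changed = 0
            for (i = 2; i <= n; i++) {
                if (used[i]) continue
                # Already contained in the contig: nothing to add
                if (index(contig, seq[i])) { used[i] = 1; changed = 1; continue }
                k = overlap(contig, seq[i])      # extend the contig to the right
                if (k) { contig = contig substr(seq[i], k + 1); used[i] = 1; changed = 1; continue }
                k = overlap(seq[i], contig)      # extend the contig to the left
                if (k) { contig = substr(seq[i], 1, length(seq[i]) - k) contig; used[i] = 1; changed = 1 }
            }
        } while (changed)
        print contig
    }'
}

# Made-up fragments with a minimum overlap of 3, only to show the call
printf 'ACGTTGCA\nTGCAGGTC\nTTACGTTG\n' | merge_overlaps 3
# prints TTACGTTGCAGGTC
```

On the dataset above, the intended use would be `slice_reads SRR519926_1.fastq 210 30 | sort | uniq -c | sort -rn | head | merge_overlaps 10`. Sequences that do not overlap the contig are silently left out, so the result is a hint about the adapter, not a verified adapter sequence.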