Did PacBio ever have any platforms that did short read sequencing (< 300 bp)? I looked at both their Wikipedia page as well as the Wikipedia page on "Massive parallel sequencing" but I couldn't find any information in this regard.
The reason I am interested in this is that I have some (old) paired-end RNA-Seq data from a non-model eukaryote (no genome available) that I revisited out of curiosity. The data were cleaned with fastp to ensure that adapters were removed, but NCBI's VecScreen indicated strong and pervasive contamination with PacBio adapters in the assembly. Here's an example:
File: <filename>, Code(VECTOR_MATCH), Sequence-id: TRINITY_DN2_c0_g1_i1, Interval: 310..336, This sequence has a Strong match on the following UniVec vector: gnl|uv|NGB01109.1:1-26 PacBio ULI gDNA amplification adapter
But these data were produced years ago, and I do not believe PacBio even had a short read platform until recently (see https://www.pacb.com/press_releases/pacbio-begins-commercialization-of-the-onso-short-read-sequencing-system/).
Does anyone have any ideas on what could be happening here?
Not in the past, as you say. But if someone used short PacBio reads that were not properly cleaned to remove the adapter (which is possible), then you would have this situation. Have you compared the sequence in question with the adapter?
I don't quite understand what you're trying to say. Can you please elaborate?
There are short PacBio reads which seem to have the adapter in them (I can't say if the match is real or not). This sort of thing has happened in the past: https://dgg32.medium.com/carp-in-the-soil-1168818d2191
The matches are definitely to PacBio adapters. NCBI has confirmed this to me.
Even if this is the case, I don't see why this would lead to VecScreen flagging PacBio adapters in my data, which were produced on Illumina instruments at a time when PacBio did not even have a short read platform, and which had been processed by fastp to remove adapters prior to assembly.
VecScreen is simply looking for sequence matches against its internal database, which now includes known PacBio adapters. So it is possible that the "hit" is spurious, more so since we know that PacBio did not actually have a sequencer that produced short read data a few years ago. Trinity could have also misassembled the data into something that looks like a PacBio adapter by chance.
If you feel like doing an experiment, use bbduk.sh with its adapters.fa file and see if this "hit" disappears. Perhaps fastp is leaving something behind that should be removed during trimming.
Thank you, GenoMax. I'll give
bbduk.sh
a try. Also, I've received updates regarding the sequencing that provided me with additional adapter sequences to trim. Apparently, some weird single-cell protocol had been used here with bulk RNA-Seq data, and there are additional adapters in the reads that we were never informed of. So you were very right in your assessment that fastp was probably overlooking sub-sequences. I think with all of this information, I will be able to remove all adapters. I will update you here.
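For anyone landing here with the same problem, one quick sanity check before and after re-trimming is to count how many contigs (or reads) still contain each adapter. The following is only a minimal sketch and not what anyone in this thread actually ran: the adapter label and sequence are made-up placeholders (substitute the real sequences from your sequencing provider or from UniVec), the function names are hypothetical, and it only finds exact matches on either strand, so partial or mismatched adapter remnants will be missed.

```python
from typing import Dict, Iterable, Iterator, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs from a plain-text FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Keep only the first word of the header as the id.
                name, chunks = (line[1:].split() or [""])[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_adapter_hits(
    records: Iterable[Tuple[str, str]], adapters: Dict[str, str]
) -> Dict[str, int]:
    """Count records containing each adapter (exact match, either strand)."""
    hits = {label: 0 for label in adapters}
    for _, seq in records:
        seq = seq.upper()
        for label, adapter in adapters.items():
            adapter = adapter.upper()
            if adapter in seq or reverse_complement(adapter) in seq:
                hits[label] += 1
    return hits


if __name__ == "__main__":
    # Placeholder adapter, NOT a real PacBio or Illumina sequence.
    adapters = {"hypothetical_adapter": "ACGTTGCAAC"}
    # Toy records standing in for read_fasta("Trinity.fasta").
    records = [
        ("contig_1", "TTTTACGTTGCAACGGGG"),  # forward-strand match
        ("contig_2", "CCCCGTTGCAACGTAAAA"),  # reverse-complement match
        ("contig_3", "ATATATATATAT"),        # no match
    ]
    print(count_adapter_hits(records, adapters))
```

If the counts drop to zero after re-trimming and re-assembling, the VecScreen hits should disappear as well. If they do not, the remaining matches are probably partial, and a k-mer based trimmer such as bbduk.sh is the better tool for the job.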