Question

PacBio adapters in transcriptome assembly from short read data?

7 months ago

Dunois ★ 2.8k

Did PacBio ever have any platforms that did short read sequencing (< 300 bp)? I checked both their Wikipedia page and the Wikipedia page on "Massive parallel sequencing", but I couldn't find any information on this.

I ask because I have some (old) paired-end RNA-Seq data from a non-model eukaryote (no genome available) that I revisited out of curiosity. The data were cleaned with fastp to remove adapters, but NCBI's VecScreen indicated strong and pervasive contamination with PacBio adapters in the assembly. Here's an example:

File: <filename>, Code(VECTOR_MATCH), Sequence-id: TRINITY_DN2_c0_g1_i1, Interval: 310..336, This sequence has a Strong match on the following UniVec vector: gnl|uv|NGB01109.1:1-26 PacBio ULI gDNA amplification adapter

But these data were produced years ago, and I do not believe PacBio even had a short read platform until recently (see https://www.pacb.com/press_releases/pacbio-begins-commercialization-of-the-onso-short-read-sequencing-system/).

Does anyone have any ideas on what could be happening here?

adapter transcriptome sequencing pacbio vecscreen • 978 views

ADD COMMENT • link 6 months ago by Dunois ★ 2.8k

Did PacBio ever have any platforms that did short read sequencing

Not in the past, as you say. But if someone used short PacBio reads that were not properly cleaned of adapters (which is possible), you could end up in this situation. Have you compared the flagged sequence with the adapter?
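
One way to make that comparison, as a rough sketch only: pull the flagged interval out of the contig and count mismatches against the adapter in both orientations. The function names here are made up for illustration, the comparison is a plain ungapped one (not what VecScreen itself does), and the contig and adapter strings are toy placeholders, not the real UniVec sequence.

```python
import re

def parse_vecscreen_line(line):
    """Return (contig ID, start, end) from a VECTOR_MATCH report line.

    The interval is assumed to be 1-based and inclusive.
    """
    m = re.search(r"Sequence-id:\s*([^,\s]+),\s*Interval:\s*(\d+)\.\.(\d+)", line)
    if m is None:
        raise ValueError("line does not look like a VECTOR_MATCH record")
    return m.group(1), int(m.group(2)), int(m.group(3))

def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def mismatches(a, b):
    """Ungapped mismatch count over the shared length of a and b."""
    return sum(x != y for x, y in zip(a, b))

def compare_hit(line, contigs, adapter):
    """Extract the flagged region and compare it to the adapter, both strands."""
    seq_id, start, end = parse_vecscreen_line(line)
    region = contigs[seq_id][start - 1:end].upper()
    adapter = adapter.upper()
    forward = mismatches(region, adapter)
    reverse = mismatches(region, reverse_complement(adapter))
    return region, forward, reverse

# Toy data: placeholder contig and adapter, NOT real sequences.
toy_contigs = {"contig_1": "AAAAA" + "GATTACAGATTACA" + "CCCCC"}
toy_line = "Code(VECTOR_MATCH), Sequence-id: contig_1, Interval: 6..19, ..."
region, forward, reverse = compare_hit(toy_line, toy_contigs, "GATTACAGATTACA")
print(region, forward, reverse)
```

In practice you would load the contigs from the Trinity FASTA and take the adapter sequence from the UniVec record named in the report.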

ADD REPLY • link 7 months ago by GenoMax 147k

I don't quite understand what you're trying to say. Can you please elaborate?

ADD REPLY • link 6 months ago by Dunois ★ 2.8k

There are short PacBio reads which seem to have the adapter in them (I can't say if the match is real or not). This sort of thing has happened in the past: https://dgg32.medium.com/carp-in-the-soil-1168818d2191

ADD REPLY • link 6 months ago by GenoMax 147k

The matches are definitely to PacBio adapters. NCBI has confirmed this to me.

There are short PacBio reads which seem to have the adapter in them

Even if this is the case, I don't see why it would lead VecScreen to flag PacBio adapters in my data. They were produced on Illumina instruments at a time when PacBio did not even have a short read platform, and they had been processed with fastp to remove adapters prior to assembly.

ADD REPLY • link 6 months ago by Dunois ★ 2.8k

VecScreen is simply matching your sequences against its internal database, which now includes known PacBio adapters. So it is possible that the "hit" is spurious, all the more so since we know that PacBio did not have a sequencer that produced short read data a few years ago. Trinity could also have misassembled the data into something that looks like a PacBio adapter by chance.

If you feel like doing an experiment, use bbduk.sh with its adapters.fa file and see if this "hit" disappears. Perhaps fastp is leaving something behind that should be removed during trimming.
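
A minimal sketch of what that experiment could look like, assuming paired-end reads. The parameters follow the adapter-trimming example in the BBDuk guide; the file names and the path to adapters.fa (it ships in the BBTools resources directory) are placeholders to adjust. The helper `build_bbduk_cmd` is hypothetical and only assembles the command line.

```python
def build_bbduk_cmd(in1, in2, out1, out2, ref):
    """Assemble a bbduk.sh adapter-trimming command for paired-end reads."""
    return [
        "bbduk.sh",
        f"in1={in1}", f"in2={in2}",
        f"out1={out1}", f"out2={out2}",
        f"ref={ref}",   # adapter FASTA to trim against
        "ktrim=r",      # trim the matching k-mer and everything to its right
        "k=23",         # k-mer length used for matching
        "mink=11",      # allow shorter k-mers at read ends
        "hdist=1",      # tolerate one mismatch per k-mer
        "tpe",          # trim both reads of a pair to the same length
        "tbo",          # also trim adapters detected by pair overlap
    ]

# Placeholder file names; replace with your own paths.
cmd = build_bbduk_cmd(
    "reads_R1.fastq.gz", "reads_R2.fastq.gz",
    "clean_R1.fastq.gz", "clean_R2.fastq.gz",
    "adapters.fa",
)
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

After trimming, reassemble and rerun VecScreen to see whether the match is still reported.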

ADD REPLY • link 6 months ago by GenoMax 147k

Thank you, GenoMax. I'll give bbduk.sh a try. I've also received an update on the sequencing that gave me additional adapter sequences to trim. Apparently, some weird single cell protocol had been used here with bulk RNA-Seq data, and there are additional adapters in the reads that we were never informed of. So you were very right in your assessment that fastp was probably overlooking sub-sequences.
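
For checking which of the newly supplied adapters actually show up in the raw reads, something along these lines might do as a first pass. It is only a sketch: it looks for exact matches in either orientation, assumes standard 4-line FASTQ records, and the function names and toy sequences are invented for illustration.

```python
import gzip

def read_sequences(path):
    """Yield the sequence line of each 4-line FASTQ record (plain or gzipped)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip().upper()

def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def adapter_hit_fraction(reads, adapters):
    """Fraction of reads containing each adapter (exact match, either strand)."""
    counts = {name: 0 for name in adapters}
    total = 0
    for read in reads:
        total += 1
        for name, seq in adapters.items():
            seq = seq.upper()
            if seq in read or reverse_complement(seq) in read:
                counts[name] += 1
    return {name: (n / total if total else 0.0) for name, n in counts.items()}

# Toy reads and a placeholder adapter, NOT real sequences.
toy_reads = ["AAAGATTACATTT", "CCCCCCCC", "AAATGTAATCGGG"]
fractions = adapter_hit_fraction(toy_reads, {"toy_adapter": "GATTACA"})
print(fractions)
```

With real data, pass `read_sequences("reads_R1.fastq.gz")` (placeholder name) as the reads argument. Exact matching will undercount adapters with sequencing errors or partial adapters at read ends, so treat the fractions as a lower bound.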

I think with all of this information, I will be able to remove all adapters. I will update you here.

ADD REPLY • link 6 months ago by Dunois ★ 2.8k