Inferring undisclosed 5mer in proprietary SMARTer oligo sequence
1
0
Entering edit mode
9 months ago

I am attempting to infer the identity of an unknown 5mer present in amplified fragments after first-strand synthesis using the SMART-Seq v4 kit.

[Figure: Takara oligo diagram]

I want to amplify fragments using this oligo from the original reverse-transcribed products before Illumina library preparation.

I am using the ShortRead Bioconductor package to sample ~1e6 reads from a few hundred untrimmed single-cell FASTQ pairs, filtering out poly-A and poly-T sequences, and then listing the most frequent 5mers immediately following the known oligo sequence, AAGCAGTGGTATCAACGCAGAGTAC. I am finding an overrepresentation of GGGNN sequences. Is there an explanation for this pattern? Could it be related to GC content or repetitive elements, which this naive approach does not account for?
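For readers who want to reproduce the counting step, here is a minimal sketch in Python rather than the ShortRead code actually used, which is not shown in the post. The function name `kmers_after_oligo`, the `max_homopolymer` threshold, and the crude poly-A/poly-T filter are all assumptions made for illustration; only the oligo sequence comes from the question.

```python
from collections import Counter

# Known oligo sequence quoted in the question.
OLIGO = "AAGCAGTGGTATCAACGCAGAGTAC"


def kmers_after_oligo(reads, oligo=OLIGO, k=5, max_homopolymer=0.8):
    """Count the k-mers that immediately follow `oligo` in each read.

    `max_homopolymer` is a hypothetical cutoff: reads whose sequence after
    the oligo is at least this fraction A (or T) are treated as poly-A or
    poly-T and skipped.
    """
    counts = Counter()
    for read in reads:
        pos = read.find(oligo)
        if pos < 0:
            continue  # oligo not present in this read
        tail = read[pos + len(oligo):]
        kmer = tail[:k]
        if len(kmer) < k or "N" in kmer:
            continue  # read too short or ambiguous base call
        if max(tail.count("A"), tail.count("T")) / len(tail) >= max_homopolymer:
            continue  # crude poly-A / poly-T filter
        counts[kmer] += 1
    return counts


# Toy reads, invented purely to exercise the function.
toy_reads = [
    OLIGO + "GGGACCTGACTGA",
    OLIGO + "GGGACTTCAGC",
    OLIGO + "AAAAAAAAAA",  # dropped by the poly-A filter
    "ACGTACGT",            # no oligo
]
toy_counts = kmers_after_oligo(toy_reads)
```

Calling `toy_counts.most_common(10)` lists the most frequent 5mers. On real data the reads would come from the sampled FASTQ records instead of a hand-written list.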

[Figure: frequency of 5mers]

smart-seq scrnaseq
9 months ago
Pei ▴ 170

If the reads (your untrimmed single-cell FASTQ files) come from an Illumina instrument that uses 2-channel chemistry, 'G' may simply represent no signal.

"...2-channel SBS simplifies nucleotide detection by using two fluorescent dyes and two images to determine all four base calls. Images are taken of each DNA cluster using blue and green wavelength filter bands. Clusters seen in blue or green images are interpreted as C and T bases, respectively. Clusters observed in both blue and green images are flagged as A bases, while unlabeled clusters are identified as G bases."

https://www.illumina.com/science/technology/next-generation-sequencing/sequencing-technology/2-channel-sbs.html
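The decoding rule in the quoted passage can be sketched as a small lookup. This is only an illustration of the quote, not Illumina's base caller, which works on intensities and quality scores; the function name `call_base` and the boolean inputs are assumptions.

```python
def call_base(blue: bool, green: bool) -> str:
    """Map signal presence in the two images to a base, per the quoted text."""
    if blue and green:
        return "A"  # seen in both images
    if blue:
        return "C"  # seen in the blue image only
    if green:
        return "T"  # seen in the green image only
    return "G"      # unlabeled: no signal in either image


# A cluster that emits no signal for five consecutive cycles
# is read as a run of G bases.
dark_cycles = "".join(call_base(False, False) for _ in range(5))
```

Here `dark_cycles` is `"GGGGG"`, which is why a loss of signal can appear as G-rich sequence in the reads.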
