Question: Searching for a conserved pattern containing a barcode of variable length
mazepago wrote, 4.2 years ago:

Hi all,

I want to search sequences for a pattern that has conserved flanks and a wildcard region of variable length between them.

Specifically, I am working with paired-end RADseq data and looking for short loci, so that I can trim the ligation_adapter from R1 and the cut_site_1-barcode-ligation_adapter from R2.

Such reads look like this:

R1: cut_site_1-NNNNNNNNNN-cut_site_2-ligation_adapter

R2: cut_site_2-NNNNNNNNNN-cut_site_1-barcode-ligation_adapter

The problem is trimming R2: the cut_site_1 and ligation_adapter sequences are conserved, but there are 96 different barcodes, each 4-8 bp long. I think I should use a wildcard, but how do I specify a wildcard of variable length?

Glib

 

 

Daniel (Cardiff University) wrote, 4.2 years ago:

If I understand you correctly, a regex like this should work. The [ACTG]{4,8} part matches A, C, T or G repeated between 4 and 8 times.

cut_site_1-[ACTG]{4,8}-cut_site_2-ligation_adapter

 You can tweak the visual representation of this here (great tool!)
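For the R2 layout described in the question (cut_site_1-barcode-ligation_adapter at the end of the read), the same idea could be sketched in Python roughly as follows. This is only an illustration under assumptions: the two sequences are made-up placeholders to be replaced with the real cut site and adapter, and trim_r2 is a hypothetical helper name, not part of any RADseq tool. The hyphens in the schematic are just separators, so they do not appear in the actual pattern.

```python
import re

# Placeholder sequences (assumptions for illustration only);
# substitute the real cut site remnant and adapter for your library.
CUT_SITE_1 = "CTGCAG"
LIGATION_ADAPTER = "AGATCGGAAGAGC"

# cut_site_1, then a captured 4-8 bp barcode, then the ligation adapter.
R2_TAIL = re.compile(
    re.escape(CUT_SITE_1) + r"([ACGT]{4,8})" + re.escape(LIGATION_ADAPTER)
)


def trim_r2(read):
    """Return (trimmed_read, barcode), or None if the tail is not found.

    Everything from the first cut_site_1-barcode-adapter match onward
    is removed; the part of the read before the match is kept.
    """
    match = R2_TAIL.search(read)
    if match is None:
        return None
    return read[: match.start()], match.group(1)
```

Calling trim_r2 on a read returns the sequence in front of cut_site_1 together with the captured barcode, or None when the read does not contain the tail (for example, a locus longer than the read). Since the 96 barcodes are known, a stricter variant would replace [ACGT]{4,8} with an alternation of the actual barcode sequences, e.g. "(" + "|".join(barcodes) + ")", which avoids accepting arbitrary 4-8 bp strings.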

 

 
