split concatenated amplicons at the primer sequence
4.1 years ago
Niwatori • 0

Hi,

I received a FASTQ file from long-range PCR followed by long-read sequencing (read lengths up to 64 kb). The longest reads appear to be two or more amplicons that were ligated together during library prep. I'm looking for an efficient way to scan every read in the FASTQ, split reads at the ligation points, and write a new file containing single amplicons. Single amplicons look like this:

index1-Fprimer-sequence-Rprimer-index2,

so ligated amplicons look something like:

index1-primer-sequence-primer-index2-index3-primer-sequence-primer-index4-

(and so on depending on number of ligated amplicons). Primers are the same throughout the fastq (1 gene, long read sequencing over the whole gene), but multiple samples were pooled, hence the different indexes. The output file doesn't necessarily have to bin them according to sample, although that would be nice. I can always run minibar demultiplexing on the output file, once the amplicons are separated.

Does anyone know of a tool that can do this efficiently, or should I write something myself? If the latter, any ideas on strategy? I was thinking of making a list of possible index-primer combinations (allowing for mismatches, of course) and using str.split or something along those lines. Any help appreciated. :)
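
To make the idea concrete, here is a minimal Python sketch of the split-at-primer strategy, not a tested tool. It assumes every amplicon in a read is in the same orientation, that the index has a fixed length (`index_len`, a made-up default of 8), and that primer errors are substitutions only (Hamming distance). Long reads usually contain indels too, so a real version would likely need an edit-distance aligner in place of `hamming_hits`, plus a second pass with the reverse-complemented primer. All function names, the primer, and the file names are placeholders.

```python
def hamming_hits(read, primer, max_mismatches):
    """Return start positions where primer matches read with at most
    max_mismatches substitutions (no indels). Hits never overlap."""
    hits = []
    m = len(primer)
    i = 0
    while i <= len(read) - m:
        mismatches = 0
        for a, b in zip(read[i:i + m], primer):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        if mismatches <= max_mismatches:
            hits.append(i)
            i += m  # jump past this hit so hits cannot overlap
        else:
            i += 1
    return hits


def split_read(seq, qual, fprimer, index_len=8, max_mismatches=2):
    """Split one read into (seq, qual) pieces, cutting index_len bases
    upstream of every forward-primer hit except the first one.
    index_len is an assumption: set it to the real index length."""
    hits = hamming_hits(seq, fprimer, max_mismatches)
    cuts = [max(h - index_len, 0) for h in hits[1:]]
    bounds = [0] + cuts + [len(seq)]
    return [(seq[a:b], qual[a:b]) for a, b in zip(bounds, bounds[1:]) if b > a]


def read_fastq(lines):
    """Yield (name, seq, qual) from an iterable of lines.
    Assumes plain 4-line FASTQ records (no wrapped sequences)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, "").rstrip("\n")
        next(it, "")  # the '+' separator line
        qual = next(it, "").rstrip("\n")
        yield header[1:], seq, qual


def split_records(lines, fprimer, index_len=8, max_mismatches=2):
    """Yield FASTQ-formatted strings, one per single amplicon.
    Pieces are named <read_id>_part1, <read_id>_part2, ..."""
    for name, seq, qual in read_fastq(lines):
        read_id, _, desc = name.partition(" ")
        pieces = split_read(seq, qual, fprimer, index_len, max_mismatches)
        for n, (s, q) in enumerate(pieces, start=1):
            new_name = f"{read_id}_part{n}" + (f" {desc}" if desc else "")
            yield f"@{new_name}\n{s}\n+\n{q}\n"


# Example usage (file names and primer sequence are placeholders):
# with open("reads.fastq") as fin, open("split.fastq", "w") as fout:
#     fout.writelines(split_records(fin, "ACGGTCATGC"))
```

Cutting `index_len` bases before each primer hit keeps the index attached to its amplicon, so the output could still be passed to minibar for demultiplexing afterwards. Reads without a second primer hit come out unchanged as a single `_part1` record.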

long read NGS ligation • 708 views