Hello, I'm fairly new to bioinformatics and would like to ask a question about identifying barcodes. I have a file containing the results of a multiplexed sequencing run, and I have to demultiplex this dataset without knowing the barcodes.
How can I identify the barcodes (3' adapters) that were used, so that I can then count how many sequences came from each sample? What algorithms can I use to solve this problem?
What kind of sequencing is this? Illumina or PacBio? With standard Illumina multiplexing, the index (barcode) sequences are read separately and are not part of the main reads. You may find deML useful if you have custom indexes.
Thank you for your response. This is a very simplified example, because I only have multiplexed sequences in the file. Their format looks something like this:
{sequence} {barcode} {3' adapter}
where the 3' adapter is the same for all sequences in the file; the sequences differ only in their barcode. My task is to find all the barcodes that appear in the sequences.
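
For the layout described above, one simple approach is to locate the shared 3' adapter in each read and tally the bases immediately upstream of it. The following is only a minimal Python sketch under several assumptions that are not stated in the thread: the adapter sequence is known, the barcodes have a fixed known length, and the adapter is matched exactly (no sequencing errors). The function name `count_barcodes`, the adapter string, and the toy reads are all hypothetical.

```python
from collections import Counter


def count_barcodes(reads, adapter, barcode_len):
    """Tally the barcode_len bases directly upstream of the adapter in each read.

    Assumes an exact adapter match and a fixed barcode length (both assumptions).
    """
    counts = Counter()
    for read in reads:
        # Use the last occurrence, since the adapter sits at the 3' end.
        pos = read.rfind(adapter)
        # Skip reads with no adapter (pos == -1) or too few bases before it.
        if pos < barcode_len:
            continue
        counts[read[pos - barcode_len:pos]] += 1
    return counts


# Hypothetical adapter and toy reads in the form {sequence}{barcode}{3' adapter}.
ADAPTER = "AGATCGGAAG"
reads = [
    "ACGTTGCA" + "AACC" + ADAPTER,
    "GGGTTTAA" + "AACC" + ADAPTER,
    "TTGACCAT" + "GGTT" + ADAPTER,
    "CCCCCCCC",  # no adapter, so this read is skipped
]

counts = count_barcodes(reads, ADAPTER, barcode_len=4)
for barcode, n in counts.most_common():
    print(barcode, n)
```

On real data, genuine barcodes should appear as a small number of high-count sequences, while barcodes containing sequencing errors show up as many low-count variants that differ from a true barcode by one or two bases. Exact matching of the adapter will also miss reads with errors in the adapter itself, so a tolerant search would be needed to recover those.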