I have paired-end Illumina reads that contain a barcode and a primer sequence. The barcode and primer sequences are only available in a .txt file. The experiment went as follows: the primer was used for PCR, and then the experiment tag (barcode) and the adapter were attached. So the reads look as follows:
I want to demultiplex the reads according to the barcode_sequence and then trim off the primer sequence. So far I have tried the following:
I do not have the barcode reads as FASTQ files; I only have the barcode and primer sequences. I constructed the mapping file:
#SampleID	BarcodeSequence	LinkerPrimerSequence	Description
1	TCGCAGG	AACCTGGTTGATCCTGCCAGT	C4363F2_18.7.
2	CTCTGCA	AACCTGGTTGATCCTGCCAGT	C4363F2_19.7.
So I need to set --barcode_type not-barcoded. It then gave an error saying that I need to specify --sample_ids; since I have only one input file, I have only one sample ID:
split_libraries_fastq.py -m mapping.txt -i Pool1_18S.fastq -o demultiplexed_output/ --barcode_type not-barcoded --sample_ids 1
I get a single seqs.fna file in which every read has the following attached:
orig_bc=AAAAAAAAAAAA new_bc=AAAAAAAAAAAA bc_diffs=0
I also tried process_radtags from Stacks:
process_radtags -p /fastq -I -b /mapping_radtags.txt --inline_inline -o /demultiplexed_output
However, it asks me to specify the restriction enzyme used. But I do not have this information.
So, what I need: I have several experiments, each identified by a barcode, and I need to demultiplex them. I cannot simply search for the exact barcode in each read and assign the read to that experiment, because the barcode itself may contain a sequencing error. I therefore need to allow a Hamming (or any other) distance between the true barcode sequence and the sequence found in the read. Which program can do this?
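To make the requirement concrete, here is a minimal Python sketch of the matching I have in mind. It is only an illustration, not a replacement for a proper tool. It assumes that the barcode sits at the very start of the read and is immediately followed by the primer, that errors are substitutions only (no indels), and that quality strings and read pairing are handled elsewhere. The function names, the mismatch thresholds and the example read are hypothetical; the barcodes and the primer are the ones from my mapping file.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read, barcodes, max_mismatches=1):
    """Return (sample_id, barcode_length) for the unique best barcode match
    at the start of the read, or None if no barcode is close enough or the
    best match is tied between two samples.

    Assumption: the barcode is located at position 0 of the read.
    """
    hits = []
    for sample_id, barcode in barcodes.items():
        prefix = read[:len(barcode)]
        if len(prefix) < len(barcode):
            continue  # read is shorter than the barcode
        distance = hamming(prefix, barcode)
        if distance <= max_mismatches:
            hits.append((distance, sample_id, len(barcode)))
    if not hits:
        return None
    hits.sort()
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None  # ambiguous: two barcodes are equally close
    return hits[0][1], hits[0][2]


def demultiplex_and_trim(read, barcodes, primer,
                         max_barcode_mismatches=1, max_primer_mismatches=2):
    """Assign a read to a sample and strip barcode and primer.

    Returns (sample_id, trimmed_sequence), or None if the barcode or the
    primer cannot be recognised within the allowed number of mismatches.
    Assumption: the primer follows the barcode directly, with no spacer.
    """
    hit = assign_sample(read, barcodes, max_barcode_mismatches)
    if hit is None:
        return None
    sample_id, barcode_length = hit
    primer_region = read[barcode_length:barcode_length + len(primer)]
    if len(primer_region) < len(primer):
        return None
    if hamming(primer_region, primer) > max_primer_mismatches:
        return None
    return sample_id, read[barcode_length + len(primer):]


# Barcodes and primer taken from the mapping file above.
BARCODES = {"1": "TCGCAGG", "2": "CTCTGCA"}
PRIMER = "AACCTGGTTGATCCTGCCAGT"

# Hypothetical read: barcode of sample 1 with one substitution (G -> T at
# the last position), then the primer, then the insert.
example_read = "TCGCAGT" + PRIMER + "ACGTACGT"
result = demultiplex_and_trim(example_read, BARCODES, PRIMER)
```

With these inputs the example read should be assigned to sample 1 and trimmed down to the insert, while a read whose start is more than one substitution away from every barcode would be rejected. Anything that also has to cope with indels or with barcodes at variable positions would need an edit distance instead of the Hamming distance used here.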