Hi. I'm trying to demultiplex a dataset, but I'm having a couple of problems with my files. I have realized that the pair of barcodes associated with each ID isn't in the header but in the sequence. I have tried software such as demuxbyname and deML, but these programs work with barcodes in the header. I have 115 IDs, so I should get 230 FASTQ files (forward and reverse), but when I manually extract the barcodes from the headers I get more than 200 barcode pairs. When I run demuxbyname with these barcodes I get nearly 500 files, of which about 230 have a considerable size, so I think those are the files I need. Since I think the barcodes associated with the IDs are in the sequence and not in the header, I'm looking for software that can demultiplex based on the sequence. I have paired-end reads. I have seen the FASTX barcode splitter, but I'm not sure whether it can demultiplex paired-end reads, because the example barcode text file has only one barcode per ID. Any help is appreciated. Thanks.
Hi genomax, thanks for your answer. Here is an example of what you asked for. For R1:
For R2:
So your index sequences are already in the FASTQ headers (TAATTCGT+ATAGAGGC). There is nothing else to identify from your main read sequences. You just need to use the known index pairs with demuxbyname.sh. You can use the hdist=1 option to allow one error in the index sequences and recover additional data, e.g. AATTCGT+ATAGAGGC == A*G*TTCGT+ATAGAGGC would be considered equivalent.

Edit: Looks like I had already answered a similar question from you a few days back: C: demuxbyname.sh output help. Please don't post similar content in multiple questions. It duplicates effort for you as well as others.
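To illustrate what matching with one allowed error means, here is a minimal Python sketch. It is not how demuxbyname.sh is implemented; it only shows the idea of assigning an observed index pair to a known one by Hamming distance. It assumes the index pair is the last colon-separated field of the header, counts mismatches across the whole pair string, and uses hypothetical function names (`index_pair`, `hamming`, `assign`, `count_pairs`) and a made-up header. Counting the pairs seen in the headers can also help separate the roughly 115 real index pairs from the rare ones created by sequencing errors.

```python
from collections import Counter
from typing import Iterable, List, Optional


def index_pair(header: str) -> str:
    """Return the last colon-separated field of a FASTQ header line.

    Assumption: the header ends in the index pair, e.g. '...:TAATTCGT+ATAGAGGC'.
    """
    return header.strip().rsplit(":", 1)[-1]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign(observed: str, known: List[str], hdist: int = 1) -> Optional[str]:
    """Return the known pair within `hdist` mismatches of `observed`.

    Returns None when no known pair, or more than one, is that close.
    """
    hits = [k for k in known
            if len(k) == len(observed) and hamming(observed, k) <= hdist]
    return hits[0] if len(hits) == 1 else None


def count_pairs(headers: Iterable[str]) -> Counter:
    """Tally how often each index pair occurs in a set of header lines."""
    return Counter(index_pair(h) for h in headers)


if __name__ == "__main__":
    known = ["TAATTCGT+ATAGAGGC"]  # index pair quoted in the answer
    # Hypothetical header with one mismatch in the first index
    header = "@INSTR:1:FLOWCELL:1:1101:1000:2000 1:N:0:TGATTCGT+ATAGAGGC"
    observed = index_pair(header)
    print(observed, "->", assign(observed, known, hdist=1))
```

Returning None for ambiguous matches is a deliberate choice in this sketch: a read whose indexes are equally close to two known pairs is safer left unassigned than placed in the wrong sample.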