Hi, I have a question about the demultiplexing output from demuxbyname. I recently demultiplexed a dataset with demuxbyname from the BBMap package, and I noticed that there are a lot of files in the output; some of them are large and others are small. My dataset has 120 IDs with 2 barcodes per ID, so I expect to get 240 files (2 files per ID, forward and reverse, if I'm not mistaken). But when I extract the indexes from my dataset, I get more than 240 barcodes: about 400 barcodes on 200 lines (2 barcodes per line). So I have a couple of questions:
1) When I extract the indexes from my dataset, why do I get more than the 240 barcodes I am supposed to get? (As I said, I have 120 IDs, so there should be 2 barcodes per ID.)
2) Once I have demultiplexed my dataset, how can I know which files are the right ones and which files I should delete? I was thinking of mapping each pair of FASTQ files, but I don't know whether that would help me identify the right ones. I just want to get the 240 FASTQ files for my dataset.
3) If mapping can help, do you recommend any specific program to do it from the command line?
I apologize if these questions are kind of dumb, but I'm still learning bioinformatics and I don't have much knowledge in this area. Any help will be appreciated. Thanks!
Can you provide the command line used for this run? Generally, Illumina data will contain index sequences that differ from the expected set of indexes by one or more nucleotides (or that contain N). Those could generate more files than expected.
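To make that concrete, here is a minimal Python sketch (not part of BBTools) of how an observed index that is one base off, or that contains an N, ends up as its own separate entry. The index sequences are invented for illustration, and the helper names `hamming` and `closest_expected` are hypothetical.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_expected(observed, expected):
    """Return (best_match, mismatches) for one observed index."""
    return min(((e, hamming(observed, e)) for e in expected), key=lambda t: t[1])


# Hypothetical 8-nt indexes, invented for illustration only.
expected = ["ACGTACGT", "TTGCAAGC"]
observed = ["ACGTACGT", "ACGTACGA", "ACGNACGT", "TTGCAAGC"]

for obs in observed:
    match, dist = closest_expected(obs, expected)
    label = "exact" if dist == 0 else f"{dist} mismatch(es) from {match}"
    print(obs, "->", label)
```

In this toy example two expected indexes show up as four distinct observed strings, which is the same effect as seeing more barcodes (and more output files) than samples.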
Hi genomax. The code used for the index.txt was:
and the code used to run demuxbyname was the following:
So that explains the greater number of files you are observing. I would just leave the index combinations you expect (and thus know are real) and remove the rest (or simply ignore the other files that have those indexes).
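As a rough sketch of that filtering step, the Python below splits the observed index combinations into the ones on the expected list and the rest, and also reports expected combinations that never showed up. The `index1+index2` strings and the function name `partition_by_expected` are assumptions made for illustration; they are not taken from the actual index.txt or from demuxbyname output names.

```python
def partition_by_expected(observed, expected):
    """Split observed index combinations into keep / ignore, and list missing ones."""
    expected_set = set(expected)
    keep = [b for b in observed if b in expected_set]
    ignore = [b for b in observed if b not in expected_set]
    missing = sorted(expected_set - set(observed))
    return keep, ignore, missing


# Hypothetical index combinations, invented for illustration only.
expected = ["AACCGGTA+TTGCAAGC", "GGATCCAA+CTAGTCGA"]
observed = ["AACCGGTA+TTGCAAGC", "AACCGGTA+TTGCAAGN", "NNNNNNNN+TTGCAAGC"]

keep, ignore, missing = partition_by_expected(observed, expected)
print("keep:   ", keep)
print("ignore: ", ignore)
print("missing:", missing)
```

A non-empty `missing` list is worth a look before deleting anything, since it means an expected sample did not appear under the exact sequence written in the sample list.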
But then I get this:
Input is being processed as paired
Time: 3.809 seconds.
Reads Processed: 6470116    1698.83k reads/sec
Bases Processed: 1623999116    426.41m bases/sec
Reads Out: 0
Bases Out: 0
This is the problem I mentioned: I have an Excel spreadsheet with 120 IDs and 2 barcodes per ID. When I run demuxbyname with the barcodes for those 120 IDs, I get that error (Reads Out: 0), but when I extract the barcodes with the code I posted (from the output of the sequencing run), I get a lot of files. It is as if, at the moment of sequencing on the MiSeq, the barcodes I have in the Excel file were transformed into other barcodes. I don't really understand what is happening with this dataset. Perhaps it is something intrinsic to the Illumina sequencing process, but I don't get it.
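One thing that may be worth checking here (an assumption, not something established in this thread) is whether the spreadsheet lists one of the indexes in the opposite orientation to how it appears in the read headers. The Python sketch below counts how many expected pairs are found among the observed pairs under a few orientations. The sequences are invented, and `revcomp` and `orientation_hits` are hypothetical helper names.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse-complement a DNA sequence (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def orientation_hits(expected_pairs, observed_pairs):
    """Count expected (index1, index2) pairs found among observed pairs per orientation."""
    observed = set(observed_pairs)
    variants = {
        "as listed": lambda i1, i2: (i1, i2),
        "index1 reverse-complemented": lambda i1, i2: (revcomp(i1), i2),
        "index2 reverse-complemented": lambda i1, i2: (i1, revcomp(i2)),
        "both reverse-complemented": lambda i1, i2: (revcomp(i1), revcomp(i2)),
    }
    return {
        name: sum(transform(i1, i2) in observed for i1, i2 in expected_pairs)
        for name, transform in variants.items()
    }


# Hypothetical pairs, invented for illustration only.
expected_pairs = [("AACCGGTA", "TTGCAAGC")]   # as written in a spreadsheet
observed_pairs = [("AACCGGTA", "GCTTGCAA")]   # as seen in read headers

for name, count in orientation_hits(expected_pairs, observed_pairs).items():
    print(f"{name}: {count} of {len(expected_pairs)} expected pairs found")
```

If one orientation matches nearly all of the expected pairs while "as listed" matches none, that would be consistent with an orientation mismatch; if no orientation matches, the cause lies elsewhere.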