Earlier I had a problem (already solved, thanks to help from Brian Bushnell and Genomax) in which my index reads were not supplied in a separate file but were embedded in the FASTQ headers, as in this example:
@GHAY-HISEQ2:5:2308:2003:1934#TTGCTGGA-ACCAACTG/1;1
NGCATGAACGGCTAAACGAGGGTCCAACTGTCTCTTATCT
+GHAY-HISEQ2:5:2308:2003:1934#TTGCTGGA-ACCAACTG/1;1
B[[aaeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
@GHAY-HISEQ2:5:2308:2551:1934#CCTGGATA-TGCTCGAC/1;1
NAGCTGGAATTACCGCGGCTGCTGGCACCAGACTTGCCCT
+GHAY-HISEQ2:5:2308:2551:1934#CCTGGATA-TGCTCGAC/1;1
B[[[aeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
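To make the header layout explicit, here is a minimal Python sketch that splits such a header into its parts. It assumes every header follows the pattern `@<read id>#<index1>-<index2>/<mate>` seen above; the names `HEADER_RE`, `parse_header`, `index1` and `index2` are hypothetical, and which index is i7 and which is i5 is not stated in the data.

```python
import re

# Hypothetical pattern for headers like:
# @GHAY-HISEQ2:5:2308:2003:1934#TTGCTGGA-ACCAACTG/1;1
# Anything after the mate number (here ";1") is ignored.
HEADER_RE = re.compile(
    r"^@(?P<read_id>[^#]+)#(?P<index1>[ACGTN]+)-(?P<index2>[ACGTN]+)/(?P<mate>[12])"
)


def parse_header(header: str) -> dict:
    """Split a FASTQ header into read ID, the two index sequences and mate number."""
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError(f"unrecognised header: {header!r}")
    return m.groupdict()


if __name__ == "__main__":
    print(parse_header("@GHAY-HISEQ2:5:2308:2003:1934#TTGCTGGA-ACCAACTG/1;1"))
```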
With the demuxbyname.sh script from BBTools, I was able to demultiplex all reads whose indexes matched exactly (no mismatches).
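Conceptually, exact-match demultiplexing just bins each record by the index string in its header. The sketch below illustrates that idea in Python; it is not how demuxbyname.sh is implemented. It assumes uncompressed 4-line FASTQ records and a sample sheet given as a dict from the full `index1-index2` string to a sample name; `read_fastq`, `index_of` and `demux_exact` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# One FASTQ record: (header, sequence, plus line, qualities)
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Group lines into 4-line FASTQ records (an incomplete trailing record is dropped)."""
    it = (line.rstrip("\n") for line in lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        yield header, seq, plus, qual


def index_of(header: str) -> str:
    """Return the index field between '#' and '/', e.g. 'TTGCTGGA-ACCAACTG'."""
    return header.split("#", 1)[1].split("/", 1)[0]


def demux_exact(records: Iterable[Record], samples: Dict[str, str]) -> Dict[str, List[Record]]:
    """Bin records by exact index match; anything else goes to 'unassigned'."""
    bins: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        bins[samples.get(index_of(rec[0]), "unassigned")].append(rec)
    return bins
```

With a real file, `read_fastq` can be fed an open file handle directly, since it only iterates over lines.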
This approach assigned nearly 90% of the reads, and I would now like to recover, from the remaining 10%, the reads whose indexes contain up to one mismatch.
Does anyone know a method for doing this?
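One common way to tolerate a single mismatch is to precompute, for every expected index, all strings one substitution away from it, and then look up each observed index in that table. The Python sketch below does this under stated assumptions: "one mismatch" means at most one substitution in total across the combined `index1-index2` string (the `-` separator is never changed), an `N` in the read counts as a mismatch, and any variant reachable from two different samples is treated as ambiguous and left unassigned. `one_mismatch_table` and `assign` are hypothetical names, not part of BBTools.

```python
from typing import Dict, Optional

BASES = "ACGTN"


def one_mismatch_table(samples: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each expected index, and every string one substitution away from it,
    to its sample name. Variants reachable from two samples map to None."""
    table: Dict[str, Optional[str]] = {}
    for index, sample in samples.items():
        for i, ch in enumerate(index):
            if ch == "-":  # the separator between the two indexes is never mutated
                continue
            for b in BASES:
                if b == ch:
                    continue
                variant = index[:i] + b + index[i + 1:]
                # a variant already claimed by a different sample is ambiguous
                if variant in table and table[variant] != sample:
                    table[variant] = None
                else:
                    table[variant] = sample
    # exact matches always take precedence over one-mismatch variants
    for index, sample in samples.items():
        table[index] = sample
    return table


def assign(index: str, table: Dict[str, Optional[str]]) -> str:
    """Return the sample for an observed index, or 'unassigned' if none/ambiguous."""
    sample = table.get(index)
    return sample if sample is not None else "unassigned"
```

For example, with `samples = {"TTGCTGGA-ACCAACTG": "S1"}`, the observed index `TTGCTGGA-ACCAACTC` (last base differs) is assigned to `S1`, while `TTGCTGGA-ACCAACGC` (two differences) stays unassigned. Building the table once makes each lookup a single dict access instead of a comparison against every sample. If one mismatch should instead be allowed in each index separately, the table would need to be built per index.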