Sorry in advance for my beginner level in bioinformatics. I need to analyze paired-end Illumina reads. The starting files I have are: fwd.fastq, reverse.fastq and mapping file.txt.
The sequence files contain several samples with different barcodes (of different lengths) and different primers (3 different primer sets). This is what I have tried so far:
1) I extracted the barcodes with extract_barcodes.py --bc1_len X --bc2_len X (where X is the length of the barcode). I have barcodes of length 4, 5, 6, 7 and 8, so I ran this command 5 times, once per length.
2) I merged the reads1.fastq and reads2.fastq output files from 1) for each barcode length. Overall, 1.78% of the sequences did not merge. I also asked for a fastjoin.join_barcodes.fastq file.
3) I ran split_libraries_fastq.py with the merged sequences and the fastjoin.join_barcodes.fastq file. But then over 1,900,000 reads out of 2,500,000 total reads are not assigned to any barcode.
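One check I could run on the raw reads is to count how many actually start with one of my barcodes, whatever its length. Below is a minimal sketch of that idea, assuming the barcode sits at the very start of each read and the FASTQ file is uncompressed with 4 lines per record. The function names are my own (not part of QIIME), and the barcodes and reads in the example are made up for illustration.

```python
from collections import Counter


def match_barcode(seq, barcodes):
    """Return the longest barcode that seq starts with, or None if none match."""
    # Try longer barcodes first so a short barcode that happens to be
    # a prefix of a longer one does not win by accident.
    for bc in sorted(barcodes, key=len, reverse=True):
        if seq.startswith(bc):
            return bc
    return None


def count_matches(seqs, barcodes):
    """Count reads per matched barcode; unmatched reads are counted under None."""
    counts = Counter()
    for seq in seqs:
        counts[match_barcode(seq, barcodes)] += 1
    return counts


def read_fastq_seqs(path):
    """Yield the sequence line of each record in a 4-line-per-record FASTQ file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()


# Hypothetical barcodes (lengths 4, 5 and 7) and reads, for illustration only.
example_barcodes = ["ACGT", "TGCAA", "GATTACA"]
example_reads = [
    "ACGTCCCCGGGG",
    "TGCAATTTTAAAA",
    "GATTACAGGGGCCCC",
    "NNNNNNNNNNNN",
]
example_counts = count_matches(example_reads, example_barcodes)
print(example_counts)
```

On the real data, I would pass read_fastq_seqs("fwd.fastq") and the barcodes listed in my mapping file to count_matches, then compare the number of unmatched reads with what split_libraries_fastq.py reports.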
If someone could help me I would be really glad!