I am struggling to pre-process my 16S rRNA gene amplicon Illumina sequencing data using QIIME. I have several issues that I can't find clear answers for on QIIME's website.
I have 4 files from the sequencing run: read 1, read 2, index 1 and index 2 (MiSeq paired-end, 2 x 250 cycles). The target is the V1-V2 region, using the Schloss primer design (27F and 338R).
The workflow I am planning is:
1) Extract barcodes (extract_barcodes.py): with the option to re-orient reads. I am finding the reverse complement of my i7 adapter / linker / pad and barcodes at the beginning of some of my reads in the read 1 file, and vice versa in the read 2 file (there it is the reverse complement of the i5 adapter and forward primer instead).
2) Join paired ends (join_paired_ends.py): with the option to update the index / barcode reads file to match the surviving joined pairs.
3) Split libraries (split_libraries.py): to de-multiplex and QC, with the -z option to remove the reverse primer (and adapter / linker / pad sequence).
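To make clear what I mean in step 2 by updating the barcode file to match the surviving joined pairs, here is a rough Python sketch of the idea. This is only my own illustration, not QIIME's code: the function names are made up, and I am assuming the read ID is the first whitespace-delimited token of each FASTQ header and that every record spans exactly 4 lines.

```python
def fastq_records(fastq_text):
    """Split FASTQ text into 4-line records (assumes no wrapped sequences)."""
    lines = [ln.strip() for ln in fastq_text.splitlines() if ln.strip()]
    return [lines[i:i + 4] for i in range(0, len(lines), 4)]


def record_id(record):
    """Read ID = first token of the header line, without the leading '@'."""
    return record[0][1:].split()[0]


def filter_barcodes(barcode_text, joined_text):
    """Keep only the barcode records whose read ID survived joining."""
    surviving = {record_id(rec) for rec in fastq_records(joined_text)}
    kept = [rec for rec in fastq_records(barcode_text)
            if record_id(rec) in surviving]
    return "".join("\n".join(rec) + "\n" for rec in kept)
```

The point is simply that the barcode file and the joined reads file must contain the same reads in the same order before de-multiplexing.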
My questions:
1) I am struggling to see how the extract_barcodes.py script helps me and how best to use it in my case. QIIME's website says: for two index/barcode reads and two fastq reads... This situation can be treated as a special case of paired-end reads. One could supply the index files (labeled as index1.fastq, index2.fastq) and use the --input_type barcode_paired_end:
i.e.: extract_barcodes.py --input_type barcode_paired_end -f index1.fastq -r index2.fastq --bc1_len 8 --bc2_len 8 -o parsed_barcodes/
The output barcodes.fastq file would be used for downstream processing, and the reads1 and reads2 files could be ignored.... (This last sentence is part of what I don't understand. I still need the read 1 and read 2 files to join the reads and then de-multiplex the samples in the downstream scripts, right?)
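As far as I can tell, what that command does conceptually is take the first --bc1_len bases of each index 1 read and the first --bc2_len bases of the matching index 2 read and write them out as one combined barcode per read. Below is a minimal Python sketch of that understanding. It is my own illustration and not QIIME's implementation: the function names are hypothetical, and I am assuming both index files list the same reads in the same order, with 4 lines per record.

```python
def parse_fastq(fastq_text):
    """Yield (read_id, sequence, quality) for each 4-line FASTQ record."""
    lines = [ln.strip() for ln in fastq_text.splitlines() if ln.strip()]
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        # Read ID = first token of the header, without the leading '@'.
        yield header[1:].split()[0], seq, qual


def combine_barcodes(index1_text, index2_text, bc1_len=8, bc2_len=8):
    """Concatenate the leading bases of paired index reads into one barcode."""
    combined = []
    pairs = zip(parse_fastq(index1_text), parse_fastq(index2_text))
    for (id1, seq1, qual1), (id2, seq2, qual2) in pairs:
        if id1 != id2:
            raise ValueError("index files are out of sync: %s vs %s" % (id1, id2))
        combined.append((id1,
                         seq1[:bc1_len] + seq2[:bc2_len],
                         qual1[:bc1_len] + qual2[:bc2_len]))
    return combined
```

If this is right, each read ends up with a single 16-base barcode (8 from index 1 followed by 8 from index 2), which is why I am unsure how that should be represented in the mapping file.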
2) Setting up the mapping file for the split_libraries.py script with dual barcodes. This is my biggest issue. How do I list both barcodes when the mapping file format and the script allow for only one barcode column? How do others handle dual barcodes with this script and mapping file format? I have been reading other pages, but I can't find an answer or an example showing how to set up a mapping file to properly de-multiplex my dual-indexed samples. I have also seen that other scripts use the mapping file, for example to re-orient reads, so getting this right is critical, but I am stuck on how to set it up in my case.
3) Are there any other examples or resources on handling paired-end Illumina MiSeq data in QIIME for first-time users, specifically for those with 4 original sequencing files (2 read files and 2 index read files)?
Thank you in advance!