I am a new graduate student in biology and am relatively new to sequencing in general. I am planning a genome-wide CRISPR screen that will run over many days. I plan to extract genomic DNA from each condition and amplify the sgRNA region with primers that include the Illumina adapters, with a barcode on the reverse primer. I will pool all of these barcoded samples together and run them on a NovaSeq with paired-end 100 bp sequencing. The barcode should certainly fall within the first 100 bp of the reverse read, and with paired-end data I should be able to tell which forward read it corresponds to.
I recently sent the library off for sequencing, using the exact same sequencing parameters, to determine the representation of each sgRNA in the library. Unfortunately, most of the primer (everything except the part that annealed to the plasmid backbone), including the barcode, was not present in the reverse read. However, I had told the core that sequenced my samples which index I used, and that index appears in the information (header) line of each record in the FASTQ file. The core informed me that they did no preprocessing of the reads.
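In case it helps, this is roughly how the index can be read out of the header line. It is only a minimal sketch: it assumes the usual Illumina header layout, where the part after the space looks like `read:filtered:control:index`, and the function name `count_header_indexes` and the example records are made up for illustration (they are not my real data).

```python
import io
from collections import Counter


def count_header_indexes(fastq_handle):
    """Tally the index sequence found in each FASTQ header line.

    Assumes 4-line FASTQ records and a header of the form
    '@<read id> <read>:<filtered>:<control>:<index>'.
    """
    counts = Counter()
    for line_number, line in enumerate(fastq_handle):
        if line_number % 4 != 0:
            continue  # only the first line of each record is the header
        parts = line.rstrip("\n").split(" ", 1)
        if len(parts) < 2:
            counts["<no comment field>"] += 1
            continue
        fields = parts[1].split(":")
        # The index is the fourth colon-separated field of the comment.
        index = fields[3] if len(fields) >= 4 else "<no index field>"
        counts[index] += 1
    return counts


# Made-up records, purely to show the expected input shape.
example = io.StringIO(
    "@INSTR:1:FLOWCELL:1:1101:1000:2000 2:N:0:ACGTACGT\n"
    "GATTACA\n+\nIIIIIII\n"
    "@INSTR:1:FLOWCELL:1:1101:1000:2001 2:N:0:ACGTACGT\n"
    "CATTAGA\n+\nIIIIIII\n"
)
for index, n in count_header_indexes(example).items():
    print(index, n)
```

For a real file, the same function could be called on an open handle, for example one returned by `gzip.open(path, "rt")` for a compressed FASTQ.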
So my question is: is there any way for the Illumina sequencer to know which index is present if it isn't in the read? Also, my PCR product size is consistent with the whole P7 primer being present in the product, so why isn't the rest of the reverse primer present in the sequence reads? Do I need to increase the read length when I sequence my screen in order to demultiplex?
Thanks in advance for any help!