I downloaded some SRA files from a paper that used single-cell sequencing with a unique barcode for each cell. I took a closer look at some of the reads that didn't map to a reference genome (the sample was a community standard, so the species were known), and I'm hoping someone can help me interpret what I'm looking at.
Example Sequence 1: (Sorry if this isn't intuitive, I wanted to label the known segments; 5' at top, read left to right and top to bottom, length=151)
Assuming this is genomic DNA 5' - GATTATGTCGCACTGTACCCGGAAAAATTAGCGGATATTAAG-
Nextera Adapter (unsure of the T) -T-CTGTCTCTTATACACATCTCCGA-
custom index sequence -GCCCACGAGACGTGTCGGGGCTGGCTTA-
flow-cell adapter - CTATGCGGCATCAGAGCAGATTGTACTCGCTATTACGCCAGC - 3'
My understanding is that if the genomic fragment is small enough, sequencing may continue into the adapter, so no problems here. However, I don't know how to explain the next example sequence.
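To make the read-through idea concrete, here is a minimal sketch that splits Example Sequence 1 at the first exact match of the Nextera adapter. The helper name `split_at_adapter` is just something made up for illustration, and exact matching is a simplifying assumption; real trimmers such as cutadapt tolerate mismatches and partial adapters.

```python
# Nextera transposase adapter as it appears when a read runs past a short insert.
NEXTERA_ADAPTER = "CTGTCTCTTATACACATCT"


def split_at_adapter(read, adapter=NEXTERA_ADAPTER):
    """Split a read at the first exact adapter match (hypothetical helper).

    Returns (insert, adapter_and_downstream). If the adapter is not found,
    the whole read is treated as insert.
    """
    idx = read.find(adapter)
    if idx == -1:
        return read, ""
    return read[:idx], read[idx:]


# Example Sequence 1, concatenated from the labeled segments above.
read1 = (
    "GATTATGTCGCACTGTACCCGGAAAAATTAGCGGATATTAAG"  # presumed genomic DNA
    "T"                                           # the uncertain T
    "CTGTCTCTTATACACATCTCCGA"                     # Nextera adapter
    "GCCCACGAGACGTGTCGGGGCTGGCTTA"                # custom index sequence
    "CTATGCGGCATCAGAGCAGATTGTACTCGCTATTACGCCAGC"  # flow-cell adapter
)

insert, tail = split_at_adapter(read1)
print("insert length:", len(insert))
print("insert:", insert)
print("adapter and downstream:", tail)
```

With exact matching, everything before the adapter (including the uncertain T) is reported as insert, which is the pattern expected when the fragment is shorter than the read length.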
Example Sequence 2: (unknown sequence in parentheses at the end of the flow cell adapter)
partial Nextera Adapter (first nt should be a C) 5' - ATTATACACATCTCCGA-
custom index sequence - GCCCACGAGAGTGTCGGGCTGGCTTA-
barcode - TAGGGTCGCGGCCAG-
flow-cell adapter CTATGCGGCATCAGAGCAGATTGTACTCGCTATTACGCCAGCTGATCTCGTATGCCGTCTTCTGCTTG(ACCAAACATACTCTTTTCCTCTTCC) -3'
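A quick way to check how this read relates to P7 is to reverse-complement the P7 sequence and look for it in the read. This is only a sketch: the `P7` string below is the commonly published Illumina P7 sequence and should be verified against Illumina's own adapter documentation, and `reverse_complement` is a small helper written for this example.

```python
# Commonly published Illumina P7 sequence (assumption: verify against Illumina's docs).
P7 = "CAAGCAGAAGACGGCATACGAGAT"


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T/N only)."""
    complement = str.maketrans("ACGTN", "TGCAN")
    return seq.translate(complement)[::-1]


# Example Sequence 2, concatenated from the labeled segments above.
read2 = (
    "ATTATACACATCTCCGA"           # partial Nextera adapter
    "GCCCACGAGAGTGTCGGGCTGGCTTA"  # custom index sequence
    "TAGGGTCGCGGCCAG"             # barcode
    "CTATGCGGCATCAGAGCAGATTGTACTCGCTATTACGCCAGCTGATCTCGTATGCCGTCTTCTGCTTG"  # flow-cell adapter
    "ACCAAACATACTCTTTTCCTCTTCC"   # unknown sequence (in parentheses above)
)

p7_rc = reverse_complement(P7)
start = read2.find(p7_rc)
end = start + len(p7_rc)

print("reverse complement of P7:", p7_rc)
print("found at position:", start)
print("sequence after P7 end:", read2[end:])
```

If the reverse complement of P7 is found, whatever follows it in the read lies beyond the end of the library molecule, which is exactly the part I can't account for.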
For the flow-cell adapter, the nts leading up to the portion in parentheses are the reverse complement of the Illumina P7 adapter sequence. If sequencing was carried out to the end of the molecule, wouldn't the read end with the last nts of the adapter, or are the nts in parentheses coming from a flow-cell oligo? Also, how would sequencing begin upstream of the genomic DNA? Or am I completely misunderstanding something? Thank you for any help!