I've been mapping some bisulfite-seq data from SRA (SRR577617) and noticed that some of the raw reads have the adaptor at or near the 5' end, implying that no genomic DNA was sequenced. (Unless I'm mistaken, both reported adaptors should be trimmed from the 3' side, which would leave an empty sequence after trimming.) In that case, what makes up the several dozen base calls that follow the 3' end of the reported adaptor? Is this just noise from the sequencer (HiSeq) trying to call bases at a flowcell spot where no more synthesis is occurring? I would assume so, but these sequences tend to be highly enriched for Ts and depleted in Cs, which suggests bisulfite-converted DNA.
After looking more closely, I realized that my assumption about the adaptors was wrong. The second adaptor listed in the SRA entry (which is similar to the reverse complement of the other listed adaptor) was present at the 5' end, meaning the sequences I was seeing 3' of that adaptor were indeed biological.
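For anyone who wants to run the same check on their own reads, here is a rough Python sketch of what I mean by "adaptor at the 5' end" and "enriched for Ts, depleted in Cs". It only looks for an exact match, and the adaptor and read below are made-up placeholders, not the actual sequences from the SRA entry. The function names are mine as well.

```python
from collections import Counter


def split_at_adaptor(read, adaptor):
    """Find the first exact adaptor match in a read.

    Returns (offset, downstream), where offset is the 0-based start of the
    adaptor and downstream is everything 3' of it, or None if not found.
    """
    start = read.find(adaptor)
    if start == -1:
        return None
    return start, read[start + len(adaptor):]


def base_fractions(seq):
    """Fraction of A, C, G and T among the called (non-N) bases in seq."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        return {b: 0.0 for b in "ACGT"}
    return {b: counts[b] / total for b in "ACGT"}


if __name__ == "__main__":
    # Hypothetical values for illustration only; substitute the adaptor
    # listed in the SRA entry and reads from the FASTQ file.
    adaptor = "ACGTTGCA"
    read = adaptor + "TTTTGATTTAGTTTTATTGGTTAT"

    hit = split_at_adaptor(read, adaptor)
    if hit is not None:
        offset, downstream = hit
        print("adaptor starts at offset", offset)
        print("downstream base fractions:", base_fractions(downstream))
```

An offset at or near 0 means the adaptor sits at the 5' end of the read. If the downstream fractions show a high T and a very low C, that is what you would expect from bisulfite-converted insert rather than random base-calling noise. Real reads may carry sequencing errors in the adaptor, so an exact match will miss some of them.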