One of the arguments that STAR --runMode genomeGenerate takes is sjdbOverhang, which the manual says
"specifies the length of the genomic sequence around the annotated junction to be used in constructing the splice junctions database", adding that it should be equal to read length - 1.
Could someone please explain what exactly they mean by "annotated junction"? Also, why does the read length matter at the step of building the SA index? I can imagine using it as an argument during the actual alignment step, but I don't understand how the read length could affect the genome index being built.
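
To make the question concrete, here is a minimal sketch of the kind of index-building call I mean, assuming 100-base reads. The paths star_index, genome.fa and annotation.gtf are placeholders, not real files, and the script only prints the command rather than running STAR.

```shell
# Placeholder values; none of these come from the manual or a real dataset.
READ_LENGTH=100                        # assumed read length, in bases
SJDB_OVERHANG=$((READ_LENGTH - 1))     # the manual's suggested "read length - 1"

# Assemble the command as a string and print it instead of executing it.
STAR_CMD="STAR --runMode genomeGenerate"
STAR_CMD="$STAR_CMD --genomeDir star_index"          # hypothetical output directory
STAR_CMD="$STAR_CMD --genomeFastaFiles genome.fa"    # hypothetical reference FASTA
STAR_CMD="$STAR_CMD --sjdbGTFfile annotation.gtf"    # hypothetical annotation file
STAR_CMD="$STAR_CMD --sjdbOverhang $SJDB_OVERHANG"

echo "$STAR_CMD"
```

With these assumed values the read length enters only through the --sjdbOverhang argument, which is the part I don't understand at the indexing stage.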