Question

sjdbOverhang option in STAR

11

10.4 years ago

Martombo ★ 3.1k

I have some difficulties in understanding the option sjdbOverhang in STAR. This option is set when making use of a splice junctions database. The manual defines it to be: "the length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1)". It seems to be a very important option, because if it is set to 0 (default), the splice junctions database is not used.

I don't think it's the minimal alignment length for a read spanning the junction, because there's already the option alignSJDBoverhangMin for that, which is defined as "the minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments".

Is it then perhaps the expected length?

• 34k views

ADD COMMENT • link updated 12 months ago by Kermit ▴ 90 • written 10.4 years ago by Martombo ★ 3.1k

score 15 · Answer 1 · 2014-02-26

15

10.4 years ago

Martombo ★ 3.1k

Here's the answer from Alexander Dobin, the developer of the algorithm (to whom I'm very grateful):

The "Overhang" in these parameters has different meanings - a bad naming choice, unfortunately.

The --sjdbOverhang is used only at the genome generation step, and tells STAR how many bases to concatenate from donor and acceptor sides of the junctions. If you have 100b reads, the ideal value of --sjdbOverhang is 99, which allows the 100b read to map 99b on one side, 1b on the other side. One can think of --sjdbOverhang as the maximum possible overhang for your reads.

On the other hand, --alignSJDBoverhangMin is used at the mapping step to define the minimum allowed overhang over splice junctions. For example, the default value of 3 would prohibit overhangs of 1b or 2b.
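To make the two meanings concrete, here is a minimal Python sketch of the arithmetic described above. It is a simplified model of a read crossing a single annotated junction, not STAR's actual implementation, and the function names and file paths are hypothetical. The command builder only assembles a genome generation command line from standard STAR options; it does not run anything.

```python
def sjdb_overhang(read_length: int) -> int:
    """Ideal --sjdbOverhang for a given read (mate) length: length - 1."""
    if read_length < 2:
        raise ValueError("read length must be at least 2")
    return read_length - 1


def allowed_junction_splits(read_length: int, align_sjdb_overhang_min: int = 3):
    """Return (left, right) base counts on each side of one annotated junction
    that pass the --alignSJDBoverhangMin filter (simplified model)."""
    max_overhang = sjdb_overhang(read_length)
    return [
        (left, read_length - left)
        for left in range(1, max_overhang + 1)
        if min(left, read_length - left) >= align_sjdb_overhang_min
    ]


def genome_generate_command(genome_dir, fasta, gtf, read_length, threads=4):
    """Build (but do not run) a STAR genome generation command line."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(sjdb_overhang(read_length)),
    ]
```

For 100b reads, `sjdb_overhang(100)` gives 99, and with the default minimum of 3 the most lopsided split that survives is 3b on one side and 97b on the other. The list returned by `genome_generate_command` can be passed to `subprocess.run` once the placeholder paths are replaced with real ones.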

ADD COMMENT • link 10.4 years ago by Martombo ★ 3.1k

2

This also means that for every different read-length to be aligned a new genome SA needs to be generated. Otherwise a drop in aligned reads can be experienced.

ADD REPLY • link 8.9 years ago by Michael 54k

0

If we have data from 2 batches with different read lengths, does that mean we are supposed to map the data separately with different indexes?

ADD REPLY • link 2.1 years ago by Thind amarinder ▴ 340

2

In my understanding, that would be optimal, but since v2.4 STAR has an option to set --sjdbOverhang and other sjdb options on the fly during alignment: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
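A rough sketch of what the on-the-fly variant might look like, written as a Python helper that only builds the mapping command. The helper name and all paths are hypothetical. It assumes an index that was generated without annotations; my understanding is that STAR may reject a mapping-time --sjdbOverhang that differs from the value stored in an index that already contains junctions, so that is worth checking against the manual.

```python
def align_command(genome_dir, reads, gtf, read_length, out_prefix, threads=4):
    """Build (but do not run) a STAR mapping command that supplies the
    annotation and --sjdbOverhang at the mapping step."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,          # one file (single-end) or two (paired-end)
        "--sjdbGTFfile", gtf,             # junctions inserted on the fly
        "--sjdbOverhang", str(read_length - 1),
        "--outFileNamePrefix", out_prefix,
    ]
```

Calling it once per batch, e.g. with `read_length=58` and `read_length=75`, yields two commands against the same base index with overhangs of 57 and 74.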

ADD REPLY • link 2.1 years ago by Michael 54k

0

This is what I found (https://biocorecrg.github.io/RNAseq_course_2019/alnpractical.html): it usually equals the minimum read size minus 1. This also means that a new STAR index needs to be generated for every different read length to be aligned; otherwise, a drop in aligned reads can be experienced.

ADD REPLY • link 2.1 years ago by Thind amarinder ▴ 340

0

Thanks for the follow up.

ADD REPLY • link 10.4 years ago by Ashutosh Pandey 12k

0

Here's how to find the read length: How do I find out the read length of a fastq file?
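For a quick check without extra tools, something like the following Python sketch tallies read lengths from the first records of a FASTQ file. It assumes plain 4-line FASTQ records (no wrapped sequence lines) and treats a `.gz` suffix as gzip compression; the function name and the sampling limit are my own choices.

```python
import gzip
from collections import Counter
from itertools import islice


def read_length_counts(fastq_path, max_reads=100_000):
    """Count read lengths in the first `max_reads` records of a FASTQ file."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        # The sequence is the 2nd line of every 4-line record.
        seq_lines = islice(handle, 1, None, 4)
        for seq in islice(seq_lines, max_reads):
            counts[len(seq.rstrip("\r\n"))] += 1
    return counts
```

`max(read_length_counts("sample.fastq.gz"))` then gives the longest read seen in the sample, from which the overhang is that value minus 1.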

ADD REPLY • link 12 months ago by Kermit ▴ 90

score 5 · Answer 2 · 2014-02-24

5

10.4 years ago

Ashutosh Pandey 12k

This is quite close to what you have asked:

https://groups.google.com/forum/#!topic/rna-star/J6qH9JCysZw

Basically, sjdbOverhang should be set to read length - 1, so if you have 75 bp reads it should be set to 74. The alignSJDBoverhangMin option, by contrast, filters out alignments with small splice overhangs. I use the default setting for that parameter.

ADD COMMENT • link 10.4 years ago by Ashutosh Pandey 12k

0

Thank you for the reply. I read that this option should be set to mate_length - 1, which should be equivalent to read_length - 1, as you wrote. But do you know what it is actually used for?

ADD REPLY • link 10.4 years ago by Martombo ★ 3.1k

1

We have two datasets. One was generated in our lab and has a read length of 58 bp; the other was obtained from a paper and has a read length of 75 bp. In such a case, what should the value of sjdbOverhang be?

ADD REPLY • link 3.8 years ago by rohitsatyam102 ▴ 900

1

This is what I found (https://biocorecrg.github.io/RNAseq_course_2019/alnpractical.html): it usually equals the minimum read size minus 1. This also means that a new STAR index needs to be generated for every different read length to be aligned; otherwise, a drop in aligned reads can be experienced.
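The one-index-per-read-length approach could be planned roughly as below, using the 58 bp and 75 bp datasets from the question as the example. This is only a bookkeeping sketch: the sample names and the index directory naming scheme are made up, and each dataset is simply given an overhang of its own read length minus 1.

```python
from collections import defaultdict


def plan_indexes(sample_read_lengths, index_root="star_index"):
    """Group samples by the STAR index (one per sjdbOverhang) they should use."""
    plan = defaultdict(list)
    for sample, length in sample_read_lengths.items():
        overhang = length - 1  # read length - 1, as recommended in this thread
        plan[f"{index_root}_sjdbOverhang{overhang}"].append(sample)
    return dict(plan)


plan = plan_indexes({"lab_dataset": 58, "published_dataset": 75})
for index_dir, samples in plan.items():
    print(index_dir, samples)
```

Samples that share a read length end up under the same index directory, so an index is only built once per distinct overhang.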

ADD REPLY • link 2.1 years ago by Thind amarinder ▴ 340