Question: sjdbOverhang Option in STAR
7.0 years ago, Martombo (Seville, ES) wrote:

I have some difficulties in understanding the option sjdbOverhang in STAR. This option is set when making use of a splice junctions database. The manual defines it to be: "the length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1)". It seems to be a very important option, because if it is set to 0 (default), the splice junctions database is not used.

I don't think it's the minimal alignment length for a read spanning the junction, because there's already the option alignSJDBoverhangMin for that, which is defined as "the minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments".

Is it then perhaps the expected length?

7.0 years ago, Martombo (Seville, ES) wrote:

Here's the answer from Alexander Dobin, the developer of the algorithm (to whom I'm very grateful):

The "Overhang" in these parameters has different meanings, which is unfortunately a bad naming choice.

The --sjdbOverhang is used only at the genome generation step, and tells STAR how many bases to concatenate from donor and acceptor sides of the junctions. If you have 100b reads, the ideal value of --sjdbOverhang is 99, which allows the 100b read to map 99b on one side, 1b on the other side. One can think of --sjdbOverhang as the maximum possible overhang for your reads.

On the other hand, --alignSJDBoverhangMin is used at the mapping step to define the minimum allowed overhang over splice junctions. For example, the default value of 3 would prohibit overhangs of 1b or 2b.
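The distinction between the two parameters can be illustrated with a small toy model. This is only a sketch of the idea described above, not STAR's actual implementation: the function name `allowed_splits` is hypothetical, and treating --sjdbOverhang as a hard cap on each side of the junction is a simplifying assumption.

```python
def allowed_splits(read_length, sjdb_overhang, align_sjdb_overhang_min=3):
    """Toy model (not STAR code): enumerate how a read could be split
    across one annotated junction as (donor_bases, acceptor_bases).

    sjdb_overhang           -- bases inserted on each side at genome generation,
                               modelled here as the maximum overhang per side
    align_sjdb_overhang_min -- minimum overhang allowed at the mapping step
    """
    splits = []
    for donor in range(1, read_length):
        acceptor = read_length - donor
        if max(donor, acceptor) > sjdb_overhang:
            continue  # longer side would run past the inserted junction sequence
        if min(donor, acceptor) < align_sjdb_overhang_min:
            continue  # shorter side is below the minimum allowed overhang
        splits.append((donor, acceptor))
    return splits


# 100b reads with --sjdbOverhang 99: a 99b + 1b split fits the inserted sequence
# if the minimum overhang is relaxed to 1.
print(allowed_splits(100, 99, align_sjdb_overhang_min=1)[0])   # (1, 99)
# With a minimum overhang of 3, overhangs of 1b or 2b are excluded.
print(allowed_splits(100, 99, align_sjdb_overhang_min=3)[0])   # (3, 97)
```

In this model, lowering `sjdb_overhang` below read length - 1 shrinks the set of possible splits, which is one way to picture why read length - 1 is described as the ideal value.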


This also means that a new genome index (suffix array) needs to be generated for every different read length to be aligned. Otherwise, a drop in the number of aligned reads may occur.

Reply written 5.5 years ago by Michael Dondrup

Thanks for the follow up.

Reply written 7.0 years ago by Ashutosh Pandey
7.0 years ago, Ashutosh Pandey (Philadelphia) wrote:

This is quite close to what you have asked:

https://groups.google.com/forum/#!topic/rna-star/J6qH9JCysZw

Basically, sjdbOverhang should be set to read length - 1. So if you have 75 bp reads, it should be set to 74. The alignSJDBoverhangMin option, on the other hand, discards alignments with small splice overhangs. I use the default setting for this parameter.
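As a rough sketch, the read length - 1 rule can be wired into the genome generation command like this. The helper name `star_genome_generate_args` and all paths are hypothetical placeholders; the STAR options used (--runMode genomeGenerate, --genomeDir, --genomeFastaFiles, --sjdbGTFfile, --sjdbOverhang) are the standard genome generation options, but check them against the manual for your STAR version.

```python
def star_genome_generate_args(read_length, genome_dir, fasta, gtf):
    """Build an argument list for STAR genome generation (illustrative helper).

    The paths passed in are placeholders chosen by the caller; this function
    only assembles the command and does not run STAR.
    """
    sjdb_overhang = read_length - 1  # rule of thumb from this thread
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(sjdb_overhang),
    ]


# Hypothetical paths, 75 bp reads -> --sjdbOverhang 74
args = star_genome_generate_args(75, "star_index_75bp", "genome.fa", "annotation.gtf")
print(" ".join(args))
```

The resulting list could be passed to `subprocess.run(args, check=True)` on a machine where STAR is installed.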


Thank you for the reply. I read that this option should be set to mate_length - 1, which should be equivalent to read_length - 1, as you wrote. But do you know what it is used for, anyway?

Reply written 7.0 years ago by Martombo

We have two datasets. One was generated in our lab and has a read length of 58 bp; the other was obtained from a paper and has a read length of 75 bp. In such a case, what should the value of sjdbOverhang be?

Reply written 5 months ago by rohitsatyam102