STAR parameters in OmicsBox
3.7 years ago
phamh ▴ 30

Hi,

I want to run STAR in OmicsBox and I have some questions about its parameters. There aren't many references for these parameters, and not all of them should be left at their defaults, so I'm really struggling with them. I'd really appreciate it if someone could help me.


1) 'Maximum distance between mates'

This info is from https://wikibits.ugent.be/index.php/Parameters_of_STAR

  • = maximum distance between reads from a pair when mapped to the genome. If reads map to the genome farther apart the fragment is considered to be chimeric. The default value of 500000 is fine-tuned to mammalian genomes, for plant and yeast genomes you will have to decrease it.
  • STAR maps the reads to the genome, this is why the max distance between reads of a pair is equal to the intron size. For organisms with small introns you should take intron size + max fragment length

Is this info correct? My research involves Physcomitrella patens, so I'll have to decrease the input value for this parameter, but I don't know by how much. Where does that default value (500000) come from? Can I use the suggestion mentioned above (intron size + max fragment length)?


2) 'Include Chimeric Alignments' checkbox

This info is from OmicsBox Manual http://manual.omicsbox.biobam.com/user-manual/module-transcriptomics/rna-seq-alignment/#RNA-SeqAlignment-RunRNA-SeqAlignment(STAR)

  • This option allows to include the chimeric alignments together with normal alignments in the main BAM file. The format of chimeric alignments follows the latest SAM/BAM specifications.

Is there a reason why one should or should not separate these two kinds of alignment?


3) 'Maximum Number of Mismatches'

This info is from https://wikibits.ugent.be/index.php/Parameters_of_STAR

  • = maximum number of mismatches for a read (single-end) or a pair of reads (paired-end). Default is 10. The value you should choose is dependent on the read length. For short quality trimmed reads you typically allow 5% mismatches.

The default value in STAR in OmicsBox is 999, which is confusing to me. My reads are 150bp, which is not short, right? I'm not sure what to do with this parameter. Should I leave it as default (10)?
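For a rough sense of scale, the 5% rule of thumb quoted above can be turned into numbers for 150 bp reads. This is only a minimal sketch: the function name `mismatch_budget` is made up for illustration, and the 5% rate is the figure from the wiki page, not a setting confirmed by the STAR or OmicsBox documentation.

```python
import math


def mismatch_budget(read_length: int, paired_end: bool, rate: float = 0.05) -> int:
    """Mismatches allowed at `rate` per sequenced base, rounded down.

    The quoted description says the limit applies to the whole pair for
    paired-end data, so both mates are counted in that case.
    """
    total_bases = read_length * (2 if paired_end else 1)
    return math.floor(rate * total_bases)


# 150 bp reads, as in the question; the 5% rate is an assumption.
print(mismatch_budget(150, paired_end=False))
print(mismatch_budget(150, paired_end=True))
```

Under these assumptions the budget for a 2 x 150 bp pair comes out above the wiki's stated default of 10 and far below 999.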


Thank you.

STAR OmicsBox Parameters • 1.8k views

You should try a run leaving all parameters at their defaults. The only thing I would change is the "max distance between mates". If you know the average intron length in your organism, you can use that number instead of 500K, which is appropriate for human/mammalian genomes.
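If a gene annotation is available, intron lengths can be derived from exon coordinates. The sketch below is illustrative only: the exon coordinates and the 500 bp fragment length are invented, the function names are hypothetical, and it applies the "intron size + max fragment length" rule from the question using the longest intron rather than the average, on the assumption that a cutoff at the average would exclude pairs spanning longer-than-average introns.

```python
from typing import List, Tuple


def intron_lengths(exons: List[Tuple[int, int]]) -> List[int]:
    """Return intron lengths for one transcript.

    `exons` holds (start, end) pairs, 1-based and inclusive, as in GTF/GFF.
    """
    ordered = sorted(exons)
    lengths = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end - 1  # bases strictly between two exons
        if gap > 0:
            lengths.append(gap)
    return lengths


def suggested_max_mate_distance(introns: List[int], max_fragment_length: int) -> int:
    """Apply 'intron size + max fragment length' using the longest intron."""
    if not introns:
        return max_fragment_length
    return max(introns) + max_fragment_length


# Made-up exon coordinates for a single transcript (not real P. patens data).
example_exons = [(100, 200), (301, 400), (1000, 1100)]
introns = intron_lengths(example_exons)
print(introns)
print(suggested_max_mate_distance(introns, max_fragment_length=500))
```

For a whole genome, the same calculation would be run over every transcript in the annotation, and a high percentile of intron length may be a more robust choice than the single longest intron.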

3.7 years ago

I think optimizing those three parameters is going to have an incredibly small effect on your mapping results. The one exception is that showing 999 mismatched positions might bloat your BAM a lot; if something maps that many times in the genome, there's a good chance you aren't going to do much with it anyway.


Can you please tell me why you think it would have minimal effect on my mapping results? In one source I found, they said STAR was optimized for mammalian genomes and suggested changing parameters if using plant genomes.


It can be true both that it's optimized for mammals and that the current settings are only slightly non-optimal for other kinds of eukaryotes. Are you expecting a huge number of chimeric reads? How many wrong alignments do you think you will get by having a too-generous max pair distance allowance?


I honestly don't know how to answer those questions, and neither does my research faculty. Is there a way to figure those out?
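One way to approach the pair-distance question is to do a single run with default settings and then count how many aligned pairs actually sit farther apart than the cutoff you are considering. The sketch below does this on SAM-format text lines. It relies only on the standard SAM columns (FLAG in column 2, RNEXT in column 7, TLEN in column 9); the function name and the example records are made up, and TLEN is used here as a proxy for the distance STAR applies internally, which is an assumption.

```python
from typing import Iterable, Tuple

# SAM flag bits used below (values from the SAM specification).
PAIRED, UNMAPPED, MATE_UNMAPPED = 0x1, 0x4, 0x8
FIRST_IN_PAIR, SECONDARY, SUPPLEMENTARY = 0x40, 0x100, 0x800


def count_distant_pairs(sam_lines: Iterable[str], cutoff: int) -> Tuple[int, int]:
    """Return (pairs_counted, pairs_with_abs_TLEN_above_cutoff).

    Only primary alignments of the first mate are counted, so each pair
    contributes once; pairs split across chromosomes are skipped.
    """
    total = 0
    distant = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        if not flag & PAIRED or flag & (UNMAPPED | MATE_UNMAPPED):
            continue
        if flag & (SECONDARY | SUPPLEMENTARY) or not flag & FIRST_IN_PAIR:
            continue
        if fields[6] != "=":  # mate is on a different reference sequence
            continue
        total += 1
        if abs(int(fields[8])) > cutoff:
            distant += 1
    return total, distant


# Invented records: one pair 250 bp apart, one pair 8050 bp apart.
example_sam = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t255\t50M\t=\t300\t250\t*\t*",
    "r1\t147\tchr1\t300\t255\t50M\t=\t100\t-250\t*\t*",
    "r2\t99\tchr1\t1000\t255\t50M\t=\t9000\t8050\t*\t*",
    "r2\t147\tchr1\t9000\t255\t50M\t=\t1000\t-8050\t*\t*",
]
print(count_distant_pairs(example_sam, cutoff=5000))
```

On real data the lines could come from the text output of `samtools view` on the BAM file. If only a tiny fraction of pairs exceeds the smaller cutoff you have in mind, lowering the parameter is unlikely to change the results much.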
