Question

Set the insert size threshold in STAR

5.6 years ago

alvarocentron91 ▴ 10

I have mapped some reads using STAR:

STAR --runThreadN 18 --genomeDir $HOME/Doct2.0/Genomes/Ustilago/STAR_index/ --readFilesIn $HOME/Doct2.0/Data/ax_3/ax3_1_paired.fastq $HOME/Doct2.0/Data/ax_3/ax3_2_paired.fastq --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMatchNmin 40

However, I have a problem when visualizing the data in IGV: several reads show huge insert sizes. I tried to fix this by setting the --scoreInsOpen and --scoreInsBase parameters to -4, thinking this would penalize long inserts, but I got the same results.

I'm taking my first steps in RNA-Seq analysis and I don't know how to proceed. I know how to remove those reads after mapping, but I have a lot more data to analyze and I think it would be better to solve the problem during mapping. If you know how to set an insert size threshold in STAR, that would be great.

Thank you!
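
Editorial sketch for readers with the same question: `--scoreInsOpen` and `--scoreInsBase` score insertions inside a read's alignment, not the distance between mates, so changing them is not expected to alter the insert sizes seen in IGV. STAR's `--alignIntronMax` and `--alignMatesGapMax` options are the ones that cap intron length and the gap between mates. The function name `map_with_limits` and the 5000-base limit in the usage comment are illustrative assumptions, not values from this thread, and the filtering options from the original command are left out for brevity.

```shell
# Sketch: run STAR with explicit caps on intron size and mate gap.
# Arguments: 1 = genome index dir, 2 = R1 FASTQ, 3 = R2 FASTQ,
#            4 = limit in bases (placeholder, pick one suited to your genome).
map_with_limits() {
    STAR --runThreadN 18 \
         --genomeDir "$1" \
         --readFilesIn "$2" "$3" \
         --alignIntronMax "$4" \
         --alignMatesGapMax "$4"
}

# Hypothetical usage:
#   map_with_limits "$HOME/Doct2.0/Genomes/Ustilago/STAR_index/" \
#       ax3_1_paired.fastq ax3_2_paired.fastq 5000
```

Keep in mind that a hard cap also discards alignments across genuine introns longer than the limit, so the value should be chosen with the organism's known intron sizes in mind.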

STAR RNA-Seq • 2.1k views

ADD COMMENT • link updated 5.5 years ago by Biostar 20 • written 5.6 years ago by alvarocentron91 ▴ 10

To clarify: you don't want STAR to map reads whose insert size exceeds a certain threshold? Those alignments may represent real splice events.

ADD REPLY • link 5.6 years ago by GenoMax 141k

But is it normal to have so many of those? If it is, I'll certainly keep them, even if that is not the aim of my work (just a master's work).
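
Editorial sketch: one way to put a number on "so many" is to count pairs whose SAM template length (TLEN, column 9) exceeds a cutoff. For spliced RNA-Seq alignments TLEN spans any introns between the mates, so a large value can reflect splicing rather than a mapping error. The function name `count_large_tlen` and the cutoff in the usage comment are assumptions for illustration.

```shell
# Count read pairs whose template length (SAM column 9) exceeds a cutoff.
# Reads SAM text on stdin; header lines start with "@" and are skipped.
# Only positive TLEN values are counted, so each pair is counted once.
count_large_tlen() {
    awk -v max="$1" -F '\t' '!/^@/ && $9 > max { n++ } END { print n + 0 }'
}

# Hypothetical usage on STAR's default SAM output:
#   count_large_tlen 5000 < Aligned.out.sam
```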

ADD REPLY • link 5.6 years ago by alvarocentron91 ▴ 10

Have you right-clicked in the middle of the alignment display and chosen "View as pairs"? There are many options there to explore for how IGV displays those alignments.

ADD REPLY • link 5.6 years ago by GenoMax 141k

I did, but I still have the same doubt: it feels like there are too many spliced reads. However, I saw that genes have good coverage across all the chromosomes, which is what I was looking for, so I guess it is enough if combined with the HISAT2 results. For now, I will move forward with the pipeline, and if I have time I will come back to study those splices.

Thank you!

ADD REPLY • link 5.6 years ago by alvarocentron91 ▴ 10

so I guess it is enough if combined with the HISAT2 results.

Why would you want to combine results from two aligners (for the same sample)?

Looks like you are working with a fungal genome, so splicing should be expected. The image posted above covers a much longer region, so it is hard to tell what the reads are doing. IGV can also show you splice junctions graphically.
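
Editorial sketch: besides viewing junctions in IGV, the `SJ.out.tab` file that STAR writes next to the alignments can be scanned for unusually long introns. The function name `long_junctions` and the thresholds are assumptions for illustration; the column meanings follow the STAR manual (column 2 and 3 are the first and last base of the intron, column 7 is the number of uniquely mapping reads crossing the junction).

```shell
# List junctions from STAR's SJ.out.tab whose intron is at least min_len bases.
# Arguments: 1 = path to SJ.out.tab, 2 = minimum intron length (placeholder).
# Output columns: chromosome, intron start, intron end, intron length,
# uniquely mapping reads crossing the junction.
long_junctions() {
    awk -v min="$2" -F '\t' -v OFS='\t' '
        { len = $3 - $2 + 1 }
        len >= min { print $1, $2, $3, len, $7 }
    ' "$1"
}

# Hypothetical usage:
#   long_junctions SJ.out.tab 5000
```

Long junctions supported by only one or two reads are the ones worth inspecting first, since well-supported ones are more likely to be real splice events.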

ADD REPLY • link 5.6 years ago by GenoMax 141k

I meant to compare, not combine. And yes, I have been talking with some co-workers and they said the same.

ADD REPLY • link 5.6 years ago by alvarocentron91 ▴ 10