Pre-made STAR index?
2
1
5.1 years ago
atcggcta ▴ 50

Hello!

I'm sorry if this question comes off as naive or ignorant; I'm very new to bioinformatics. I'm trying to do an alignment with STAR and was wondering whether I could get a pre-made STAR index for the mm10 genome. I was told I could get one from UCSC but have had no luck finding it there.

So my question is: are there pre-made STAR index files for the mm10 genome that I could download? And if so, where are they and how do I get them?

Thanks in advance for any help, and I'm sorry to ask such a trivial question! Let me know if there's any more detail I can give!

STAR • 18k views
10
5.1 years ago
dvanic ▴ 240

I'd suggest generating your own index using the mm10 genome as per the instructions below, and using the latest GENCODE mouse gene annotation. To keep things consistent (a major problem in bioinformatics!!!), I'd download BOTH the genome and the annotation GTF from here: http://www.gencodegenes.org/mouse_releases/current.html

You want "Comprehensive gene annotation" (PRI, GTF) and "Genome sequence, primary assembly (GRCm38)" (PRI, FASTA). The FASTA is your genome.
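One quick way to check that the genome and the annotation really are consistent is to confirm that every sequence name used in the GTF also exists in the FASTA. This is only a minimal sketch, assuming both files are uncompressed; the function name is made up for illustration.

```shell
# Print every sequence name that occurs in the GTF but not in the FASTA.
# No output means the two files agree on sequence names.
gtf_seqs_missing_from_fasta() {
    local fasta="$1" gtf="$2"
    awk '
        # First file (FASTA): remember the name after ">" on each header line.
        NR == FNR {
            if (/^>/) { sub(/^>/, "", $1); seen[$1] = 1 }
            next
        }
        # Second file (GTF): skip comment lines, report each unknown name once.
        !/^#/ && !($1 in seen) && !reported[$1]++ { print $1 }
    ' "$fasta" "$gtf"
}
```

For example, `gtf_seqs_missing_from_fasta GRCm38.primary_assembly.genome.fa gencode.vM11.primary_assembly.annotation.gtf` should print nothing if the two files came from the same release.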

0

Thank you so much for the response, this is very helpful!

1

I'd very strongly suggest you build your own index! To do this, download these two files:

ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M11/gencode.vM11.primary_assembly.annotation.gtf.gz
ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M11/GRCm38.primary_assembly.genome.fa.gz


then run

gunzip gencode.vM11.primary_assembly.annotation.gtf.gz
gunzip GRCm38.primary_assembly.genome.fa.gz


wherever you saved those files then:

STAR --runThreadN 4 --runMode genomeGenerate --genomeDir WhereYouWantIndex --genomeFastaFiles GRCm38.primary_assembly.genome.fa --sjdbGTFfile gencode.vM11.primary_assembly.annotation.gtf --sjdbOverhang 100


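This will use 4 cores to generate a genome index together with the splice junction annotation (which you want!!!). The 100 is the length of genomic sequence STAR uses on each side of an annotated junction, so reads can overhang a splice junction by at most 100 bp. If your reads are longer (150?), raise this value to match; the STAR manual gives read length minus 1 as the ideal value and notes that the default of 100 works about as well in most cases. Then map against this index.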
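The steps above can be wrapped into a few small shell functions. This is a sketch, not a tested pipeline: it assumes `wget`, `gunzip` and `STAR` are on your PATH and that the FTP links in this post are still reachable. The function names are invented for illustration, and the read-length-minus-one rule for `--sjdbOverhang` is the ideal value described in the STAR manual.

```shell
# File names and base URL are the ones given in this post (GENCODE release M11).
GENCODE_BASE="ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M11"
GTF="gencode.vM11.primary_assembly.annotation.gtf"
FASTA="GRCm38.primary_assembly.genome.fa"

# Download one gzipped file and decompress it, unless it is already there.
fetch_and_unpack() {
    local name="$1"
    [ -f "$name" ] && return 0
    wget "${GENCODE_BASE}/${name}.gz" && gunzip "${name}.gz"
}

# Ideal --sjdbOverhang for a given read length: read length minus one.
sjdb_overhang() {
    echo $(( $1 - 1 ))
}

# Build the index: build_index READ_LENGTH THREADS OUTPUT_DIR
build_index() {
    local read_length="$1" threads="$2" out_dir="$3"
    mkdir -p "$out_dir"
    STAR --runThreadN "$threads" \
         --runMode genomeGenerate \
         --genomeDir "$out_dir" \
         --genomeFastaFiles "$FASTA" \
         --sjdbGTFfile "$GTF" \
         --sjdbOverhang "$(sjdb_overhang "$read_length")"
}
```

With these defined, something like `fetch_and_unpack "$GTF" && fetch_and_unpack "$FASTA" && build_index 150 4 WhereYouWantIndex` would cover the whole procedure for 150 bp reads.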

NB: If you plan to do differential expression, use featureCounts or HTSeq to count reads against that same GENCODE GTF.
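For the counting step, a minimal featureCounts call might look like the sketch below. The wrapper name and the BAM file names are placeholders. Paired-end and strandedness options (`-p`, `-s`) depend on your library and featureCounts version, so they are left out here on purpose.

```shell
# Count reads per gene: count_reads THREADS GTF OUTPUT_FILE BAM [BAM ...]
# -T sets threads, -a the annotation, -o the output table.
count_reads() {
    local threads="$1" gtf="$2" out="$3"
    shift 3
    featureCounts -T "$threads" -a "$gtf" -o "$out" "$@"
}
```

For example: `count_reads 4 gencode.vM11.primary_assembly.annotation.gtf counts.txt sample1.bam sample2.bam`.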

0

OK thank you very much! I will definitely try this out.

One more question: the server I'm using has 40 cores. Does this change how many I should use to build the index?

Thanks again!

1

It will just be faster with more cores; the core count does not influence the index files. Essentially, the files will be the same regardless of the number of cores used.
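If the server is shared, you may not want to grab all 40 cores. A small sketch for picking a thread count, assuming GNU coreutils `nproc` is available (it is not on every system); the function name and the cap of 16 in the example are arbitrary.

```shell
# Use all available cores, but never more than the given cap.
pick_threads() {
    local max="$1" avail
    avail="$(nproc)"
    if [ "$avail" -gt "$max" ]; then
        echo "$max"
    else
        echo "$avail"
    fi
}
```

You could then pass `--runThreadN "$(pick_threads 16)"` to STAR instead of a fixed number.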

0

OK Awesome, Thanks again

0

Just for clarity: the primary assembly GRCm38 is not the same genome as mm10 from UCSC, correct? So, per the ENCODE data standards, you would download mm10 as the genome (which is based on GRCm38) and then use the GENCODE comprehensive GTF for annotation?

3
5.1 years ago
GenoMax 110k

You can generate indexes yourself easily enough. Follow the directions here: generating genome indexes with STAR. The mm10 genome from UCSC is here.

@Alex has some pre-made indexes available at the STAR Genomes site. There does not appear to be a UCSC version of mouse, but there is a Gencode Mouse index which you can use.

0

I have another newbie question, though: when I follow your Gencode Mouse link I find a bunch of links available. Would you be able to tell me which one I should use as the index when I'm doing the alignment?

Thanks again and sorry if this is a silly question!

1

A STAR index consists of these files:

chrLength.txt  chrName.txt  chrStart.txt  Genome  genomeParameters.txt  SA  SAindex


I suggest that you get the entire set of files in that folder.
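After downloading, it is worth checking that nothing is missing before you start mapping. A rough sketch that only looks for the files listed above and only tests that each one exists and is non-empty; the function name is hypothetical.

```shell
# File names as listed in this post.
STAR_INDEX_FILES="chrLength.txt chrName.txt chrStart.txt Genome genomeParameters.txt SA SAindex"

# Return 0 if every listed file exists and is non-empty in the given
# directory; otherwise report the missing ones on stderr and return 1.
check_star_index() {
    local dir="$1" missing=0 f
    for f in $STAR_INDEX_FILES; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_star_index /path/to/index && echo "index looks complete"`.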

0

OK awesome thank you very much!