STAR index file for GRCh37
12 months ago

Hello everyone

I am trying to generate the index files for STAR alignment using the hg19 (GRCh37) genome. I used the following command:

STAR  --runThreadN 30    --runMode genomeGenerate  --genomeDir /data/shilpia2/STAR.index/ --genomeFastaFiles /data/shilpia2/STAR.index/GRCh37.primary_assembly.genome.fa --sjdbGTFfile /data/shilpia2/gff/gencode.v24.basic.annotation.gtf  --sjdbOverhang 100 --limitGenomeGenerateRAM 30000000000  --outFileNamePrefix /data/shilpia2/STAR.index/hg19


However, the program stops after a while without reporting any error and without generating the index. Could anyone suggest what the reason might be, or whether there is a problem with my command?

Thanks

software error STAR • 736 views

I would drop the --limitGenomeGenerateRAM and --outFileNamePrefix flags. You could also reduce --runThreadN to, say, 8 (it might be a resource issue with your cluster). Also make sure that the --genomeDir directory exists. Let me know how you get on.
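
The suggestions above could be sketched as a small shell function. This is only a sketch: `build_star_index` and the `DRY_RUN` switch are hypothetical names introduced here for illustration, the STAR flags are the ones from the original command minus the two suggested for removal, and 8 threads is just the example value mentioned.

```shell
# Hypothetical helper: build_star_index GENOME_DIR FASTA GTF [THREADS]
# With DRY_RUN=1 the STAR command is printed instead of executed.
build_star_index() {
    genome_dir=$1
    fasta=$2
    gtf=$3
    threads=${4:-8}

    # Make sure the output directory exists, as suggested above.
    mkdir -p "$genome_dir" || return 1

    # Reuse the positional parameters to hold the command line.
    set -- STAR --runThreadN "$threads" \
        --runMode genomeGenerate \
        --genomeDir "$genome_dir" \
        --genomeFastaFiles "$fasta" \
        --sjdbGTFfile "$gtf" \
        --sjdbOverhang 100

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Setting `DRY_RUN=1` before calling the function prints the command so the paths can be checked before committing cluster time to the real run.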

How much memory do you have? You need at least 30 GB of RAM for the index generation.
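
One quick way to answer that question on a Linux machine is to read `MemTotal` from `/proc/meminfo`. The helper below is a sketch with a hypothetical name (`mem_total_gb`); it reports the node's total RAM, which may be more than what a cluster scheduler actually allocates to the job.

```shell
# Hypothetical helper: print total RAM in whole GB (rounded down).
# Reads /proc/meminfo by default (Linux only); a different file in the
# same format can be passed as the first argument.
mem_total_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}
```

For example, `[ "$(mem_total_gb)" -ge 30 ] || echo "probably not enough RAM for a human index"` gives a rough pre-flight check.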

Thank you so much for your response. I used 30 GB of RAM and let the program run for 3 days, but it still did not generate the index. Do you think I should run it for longer?

Did you have 30 cores available for the job? Did you get anything in log/error log?

Alex has pre-made hg19/GRCh37 indexes available at this link, if you can't make them.

I do have 30 cores available. The log file does not show any error. STAR terminates after reading the GTF file. I tried to use the index from the link you provided, but it reports an error in the genome file.

This is what appears in the log file:

 ..... processing annotations GTF
!!!!! WARNING: while processing sjdbGTFfile=/data/shilpia2/gff/gencode.v24.basic.annotation.gtf, line:
chr3    HAVANA  exon    198024658   198024788   .   +   .   gene_id "ENSG00000185621.11"; transcript_id "ENST00000482695.5"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "LMLN"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "LMLN-002"; exon_number 15; exon_id "ENSE00003689636.1"; level 2; protein_id "ENSP00000418324.1"; tag "basic"; transcript_support_level "1"; tag "appris_alternative_2"; havana_gene "OTTHUMG00000155375.2"; havana_transcript "OTTHUMT00000339702.1";
exon end = 198024788 is larger than the chromosome chr3 length = 198022430 , will skip this exon

https://www.gencodegenes.org/human/release_24.html has a file named 'gencode.v24.basic.annotation.gtf'

The files on that page are all for hg38 (GRCh38), not GRCh37/hg19.

The GRCh37/hg19 versions are here: https://www.gencodegenes.org/human/release_24lift37.html
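
A build mismatch like this can be caught before running STAR by comparing GTF end coordinates against the sequence lengths in the FASTA. The function below is a minimal sketch of that idea, not part of STAR; `check_gtf_build` is a hypothetical name, and it assumes a plain (uncompressed) FASTA and a tab-separated GTF.

```shell
# Hypothetical helper: check_gtf_build GENOME.fa ANNOTATION.gtf
# Prints each GTF feature whose end coordinate exceeds the length of its
# chromosome in the FASTA, or whose chromosome is absent from the FASTA.
# Returns non-zero if any such feature is found.
check_gtf_build() {
    awk -F '\t' '
        # First file (FASTA): accumulate sequence length per header name.
        NR == FNR {
            if (substr($0, 1, 1) == ">") {
                name = substr($0, 2)
                sub(/[ \t].*/, "", name)   # keep the ID up to the first whitespace
                len[name] = 0
            } else {
                len[name] += length($0)
            }
            next
        }
        # Second file (GTF): skip comments and short lines, then compare.
        /^#/ || NF < 5 { next }
        !($1 in len) { print "unknown chromosome: " $1; bad = 1; next }
        $5 > len[$1] { print $1 ":" $4 "-" $5 " ends beyond length " len[$1]; bad = 1 }
        END { exit bad }
    ' "$1" "$2"
}
```

On a mismatched human genome and annotation this can print many lines, so piping the output through `head` is usually enough to see the problem.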

From the link you provided, should I download the comprehensive gene annotation GTF file and the "Genome sequence, primary assembly (GRCh37)" file?

It's up to you and depends on the goals of your study. I primarily use annotations from Ensembl and am thus not familiar with the basic vs comprehensive gene annotations. You should probably be fine using the basic set for a standard differential gene expression analysis, but some features could be missing.

I have another question: which is better for alignment? I know people recommend STAR, but what if I use Bowtie? I was trying to compare the two tools to see how much they differ, and would like your suggestion. I only need to do a simple differential gene expression analysis, so is it OK if I use Bowtie?

No. Bowtie is used for genomic (DNA) alignments. For transcriptomic (RNA) alignments, most would recommend a splice-aware aligner like STAR, but you could also use TopHat2 (which uses Bowtie under the hood).

Ok. Thank you so much for your response.

12 months ago
GenoMax 107k

Are you mixing and matching sequences and annotations by any chance? Are they all for the same build?

Hi

I did mix annotation builds, which caused the problem. The index was generated once I used the right GTF file. Thank you so much for your response.