Question: STAR index file for GRCh37
5 months ago by
bioinformatics.queries50 wrote:

Hello everyone

I am trying to generate the index for STAR alignment using the hg19 genome. I used the following command:

STAR  --runThreadN 30    --runMode genomeGenerate  --genomeDir /data/shilpia2/STAR.index/ --genomeFastaFiles /data/shilpia2/STAR.index/GRCh37.primary_assembly.genome.fa --sjdbGTFfile /data/shilpia2/gff/gencode.v24.basic.annotation.gtf  --sjdbOverhang 100 --limitGenomeGenerateRAM 30000000000  --outFileNamePrefix /data/shilpia2/STAR.index/hg19

However, the program stops after a while without reporting any error and without generating the index. Could anyone suggest what the reason might be, or whether there is a problem with my command?

Thanks

star software error
modified 4 months ago • written 5 months ago by bioinformatics.queries (50)

I would drop the --limitGenomeGenerateRAM and --outFileNamePrefix flags. You could also reduce --runThreadN to, say, 8 (it might be a resource issue with your cluster). Also make sure that the --genomeDir directory exists. Let me know how you get on.
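A minimal sketch of that suggestion, assuming Python is used to assemble the command. The function name and its defaults are hypothetical; every STAR flag comes from the original command, minus the two flags suggested for removal. The helper creates the genome directory if it is missing and only returns the argument list, so nothing is run until you pass it to something like subprocess.run.

```python
from pathlib import Path


def build_star_index_cmd(genome_dir, fasta, gtf, threads=8, overhang=100):
    """Return a STAR genomeGenerate command as an argument list.

    Hypothetical helper: it makes sure genome_dir exists, and it leaves out
    --limitGenomeGenerateRAM and --outFileNamePrefix as suggested above.
    """
    genome_dir = Path(genome_dir)
    # Create the index directory up front so a missing path cannot be the problem.
    genome_dir.mkdir(parents=True, exist_ok=True)
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--runMode", "genomeGenerate",
        "--genomeDir", str(genome_dir),
        "--genomeFastaFiles", str(fasta),
        "--sjdbGTFfile", str(gtf),
        "--sjdbOverhang", str(overhang),
    ]
```

To launch it, something like subprocess.run(cmd, check=True) would raise if STAR exits with a non-zero status, which makes a silent stop easier to notice.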

written 5 months ago by Barry Digby (640)

How much memory do you have? You need at least 30 GB of RAM for the index generation.
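A quick way to check this from Python, as a rough sketch: it assumes a POSIX system that exposes the SC_PAGE_SIZE and SC_PHYS_PAGES sysconf names (Linux does), and it reports the machine's total memory, not what a cluster scheduler actually grants your job. The function names are hypothetical, and the 30 GB default is simply the figure quoted above.

```python
import os


def total_ram_bytes():
    """Total physical memory in bytes, via POSIX sysconf."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def has_enough_ram(required_gb=30, total_bytes=None):
    """True if total memory is at least required_gb gigabytes (10**9 bytes each)."""
    if total_bytes is None:
        total_bytes = total_ram_bytes()
    return total_bytes >= required_gb * 10**9
```

On a shared cluster, the memory requested for the job is the number that matters, so treat a True result here as necessary but not sufficient.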

written 5 months ago by GenoMax (96k)

Thank you so much for your response. I used 30 GB of RAM and ran the program for 3 days, but it still did not generate the index. Do you think I should let it run longer?

written 5 months ago by bioinformatics.queries (50)

Did you have 30 cores available for the job? Did you get anything in log/error log?

Alex has pre-made hg19/GRCh37 indexes available at this link, if you can't make them.

written 5 months ago by GenoMax (96k)

I do have 30 cores available. The log file does not show any error. STAR terminates after reading the GTF file. I also tried the index from the link you provided, but it reports an error in the genome file.

written 5 months ago by bioinformatics.queries (50)

This is what appears in the log file:

 ..... processing annotations GTF
!!!!! WARNING: while processing sjdbGTFfile=/data/shilpia2/gff/gencode.v24.basic.annotation.gtf, line:
chr3    HAVANA  exon    198024658   198024788   .   +   .   gene_id "ENSG00000185621.11"; transcript_id "ENST00000482695.5"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "LMLN"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "LMLN-002"; exon_number 15; exon_id "ENSE00003689636.1"; level 2; protein_id "ENSP00000418324.1"; tag "basic"; transcript_support_level "1"; tag "appris_alternative_2"; havana_gene "OTTHUMG00000155375.2"; havana_transcript "OTTHUMT00000339702.1";
 exon end = 198024788 is larger than the chromosome chr3 length = 198022430 , will skip this exon
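A warning like this (a feature ending past the end of its chromosome) can be checked for directly before building an index. The sketch below is one hypothetical way to do it in Python: it computes sequence lengths from the FASTA and lists GTF features whose end coordinate does not fit, or whose chromosome name is absent from the FASTA. Function names are made up for illustration, and the code assumes a plain (uncompressed) FASTA and a tab-separated GTF.

```python
def fasta_lengths(lines):
    """Map each sequence name (first word after '>') to its length in bases."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def out_of_range_features(gtf_lines, lengths):
    """Yield (line_number, chrom, end, reason) for features that cannot fit the genome."""
    for number, line in enumerate(gtf_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a well-formed GTF record
        chrom, end = fields[0], int(fields[4])
        if chrom not in lengths:
            yield number, chrom, end, "chromosome not in FASTA"
        elif end > lengths[chrom]:
            yield number, chrom, end, "end beyond chromosome length"
```

Both functions accept any iterable of lines, so open file handles work: open the FASTA and the GTF, pass the first to fasta_lengths, and pass the second with the resulting dictionary to out_of_range_features. Even a handful of hits usually means the annotation and the genome come from different builds.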
modified 5 months ago by GenoMax (96k) • written 5 months ago by bioinformatics.queries (50)

https://www.gencodegenes.org/human/release_24.html has a file named 'gencode.v24.basic.annotation.gtf'.

These are all for hg38 (GRCh38), not hg19 (GRCh37).

The hg19/GRCh37 versions are here: https://www.gencodegenes.org/human/release_24lift37.html
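If it is unclear which build a downloaded GTF belongs to, the comment lines at the top of the file are a reasonable first place to look. The sketch below assumes the file starts with '#' header lines that name the assembly (for example a string such as GRCh38); that is an assumption about the file, not something this thread confirms, and the function name is hypothetical.

```python
import re


def builds_in_gtf_header(lines):
    """Return the set of GRCh build names mentioned in the leading '#' comment lines."""
    found = set()
    for line in lines:
        if not line.startswith("#"):
            break  # the header ends at the first feature line
        found.update(re.findall(r"GRCh\d+", line))
    return found
```

An empty set, or a set naming more than one build, means the header does not settle the question and the coordinates should be checked against the genome instead.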

written 5 months ago by benformatics (2.0k)

From the link you provided, should I download the 'Comprehensive gene annotation' GTF file and the 'Genome sequence, primary assembly (GRCh37)' file?

written 5 months ago by bioinformatics.queries (50)

It's up to you and depends on the goals of your study. I primarily use annotations from Ensembl and am thus not familiar with the basic vs. comprehensive gene annotations. I think you should be fine running a standard differential gene expression analysis with the basic set, but some features could be missing.

written 5 months ago by benformatics (2.0k)

I have another question: which is better for alignment? I know people recommend STAR, but what if I use Bowtie? I wanted to compare the two tools and see how much they differ. I only need to do a simple differential gene expression analysis, so is it OK if I use Bowtie?

written 4 months ago by bioinformatics.queries (50)

No. Bowtie is used for genomic alignments (i.e. DNA). For transcriptomic alignments (RNA), most would recommend a splice-aware aligner like STAR, but you could also use TopHat2 (which uses Bowtie under the hood).
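To illustrate what "splice-aware" means in practice: in SAM output, a read that spans an exon-exon junction is reported with an N operation in its CIGAR string, covering the skipped intron on the reference. The sketch below (hypothetical helper names, standard CIGAR operation letters from the SAM format) picks out those skipped regions.

```python
import re

# One CIGAR operation: a length followed by an operation letter from the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def skipped_regions(cigar):
    """Return the lengths of all N (skipped reference) operations in a CIGAR string."""
    return [int(length) for length, op in CIGAR_OP.findall(cigar) if op == "N"]


def is_spliced(cigar):
    """True if the alignment skips at least one reference region, e.g. an intron."""
    return bool(skipped_regions(cigar))
```

For example, skipped_regions("40M1200N35M") gives [1200], while an unspliced alignment such as "75M" gives an empty list.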

modified 4 months ago • written 4 months ago by benformatics (2.0k)

Ok. Thank you so much for your response.

written 4 months ago by bioinformatics.queries (50)
5 months ago by
GenoMax96k
United States
GenoMax96k wrote:

Are you mixing and matching sequences and annotations by any chance? Are they all for the same build?

written 5 months ago by GenoMax (96k)

Hi

I did mix up the annotations, which caused the problem. The index was generated once I used the right GTF file. Thank you so much for your response.

written 5 months ago by bioinformatics.queries (50)