Question: STAR genome index with and/or without *.gtf annotation
kirannbishwa011.2k wrote:

STAR needs a genome file (*.fasta, *.fa) to create genome indexes. But is it necessary to supply the GTF annotation file as well, given that index generation works without it?

Details: I have a diploid genome and transcriptome databases (built from the reference genome plus SNP/InDel polymorphisms) for two different populations. The diploid genome is a single file, but the population-level transcriptome databases aren't merged.

I think they can be merged, but I don't know what consequences that might have for the alignment. Any suggestions?

If not, the only choice is to create the genome index alone and align the RNA-seq data to it.

What difference does it make if you make the genome index with or without the gtf file?

bowtie rna-seq star alignment gtf • 8.1k views
ADD COMMENTlink modified 2.2 years ago by drskm70 • written 4.4 years ago by kirannbishwa011.2k

Hi Guys,

I have a quick question about the output of genome indexing. I ran STAR with a genome .fasta file and a GFF file.

The genome is 3 GB in size; these are the output files:

chrLength.txt
chrNameLength.txt
chrName.txt
chrStart.txt
genomeParameters.txt

I have another, smaller genome, 60 MB in size. I ran genome indexing on it and got these output files:

chrLength.txt
chrNameLength.txt
chrName.txt
chrStart.txt
exonGeTrInfo.tab
exonInfo.tab
geneInfo.tab
Genome
genomeParameters.txt
SA
SAindex
sjdbInfo.txt
sjdbList.fromGTF.out.tab
sjdbList.out.tab
transcriptInfo.tab

My question is: why did I get the extra files for the small genome but not for the large one? I applied the same procedure to both.

Here is the script. The only difference for the large genome is the two options --sjdbOverhang 99 and --genomeChrBinNbits 15, which I added to reduce memory use; everything else is the same as for the small genome.

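#!/bin/bash
# Build a STAR genome index, inserting splice junctions from the annotation.
NUM_THREADS=12
mkdir -p DB    # -p: do not fail if DB already exists
STAR --runMode genomeGenerate --genomeDir DB \
    --runThreadN "$NUM_THREADS" \
    --genomeFastaFiles XL9_2.fa \
    --sjdbGTFfile XENLA_Frog.gtf+gff3 \
    --sjdbOverhang 99 \
    --genomeChrBinNbits 15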

Could anyone explain why the outputs differ? I am new to this field, so I am puzzled by the difference.

Thanks in advance.

Cheers, San

ADD REPLYlink modified 13 months ago by genomax91k • written 2.2 years ago by drskm70
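The first listing above has no Genome, SA or SAindex file, which a finished genomeGenerate run writes, so that run may have stopped early (the Log.out written by STAR would show where). A minimal sketch for checking a genome directory; the function name check_star_index is made up for illustration:

```shell
#!/bin/bash
# Report which core STAR index files are missing from a genome directory.
# Genome, SA and SAindex are expected after a completed genomeGenerate run;
# the function name and the usage line below are illustrative only.
check_star_index() {
    local dir="$1" missing=0 f
    for f in Genome SA SAindex genomeParameters.txt; do
        # -s: file exists and is not empty
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check_star_index DB && echo "index looks complete"
```

If the check fails, rerunning the indexing with more memory available is the usual first thing to try, although the cause here is only a guess from the file listing.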
Devon Ryan97k wrote:

You don't need to provide the GTF file beforehand; however, it's certainly convenient to do so if you know ahead of time that you'll be using it. The only consequence of supplying it afterward, at the alignment step, is that the alignment takes longer. If you omit the GTF file completely then you'll just get lower quality spliced alignments.

ADD COMMENTlink written 4.4 years ago by Devon Ryan97k
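For anyone wanting to supply the annotation afterward, here is a rough sketch of passing the GTF at the mapping step instead of at index time. The wrapper name align_with_gtf and all file names are placeholders, and --sjdbOverhang 99 assumes 100-base reads:

```shell
#!/bin/bash
# Sketch: give STAR the annotation at the alignment step rather than when
# building the index. File names are placeholders, not from this thread.
# --sjdbOverhang 99 assumes reads of length 100 (read length minus 1).
align_with_gtf() {
    local index_dir="$1" gtf="$2" reads1="$3" reads2="$4" prefix="$5"
    STAR --runMode alignReads \
        --runThreadN "${NUM_THREADS:-12}" \
        --genomeDir "$index_dir" \
        --sjdbGTFfile "$gtf" \
        --sjdbOverhang 99 \
        --readFilesIn "$reads1" "$reads2" \
        --outSAMtype BAM SortedByCoordinate \
        --outFileNamePrefix "$prefix"
}

# Usage: align_with_gtf DB merged.gtf sample_R1.fastq sample_R2.fastq sample_
```

STAR then inserts the annotated junctions before mapping each sample, which is where the extra run time comes from.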

I think I should be able to merge the two GTF files and then do the alignment. These are basically population-level custom GTFs generated by adding SNPs/InDels to the reference GTF. The chromosomes are named 1_P, 2_P, ... for the paternal strain and 1_M, 2_M, 3_M, ... for the maternal strain.

I hope that won't be a problem.

Thanks for the update.

ADD REPLYlink written 4.4 years ago by kirannbishwa011.2k

If you aligned to a concatenated genome of the maternal and paternal strains then go ahead and merge the GTF files too.

ADD REPLYlink written 4.4 years ago by Devon Ryan97k
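A merge like this only helps if the sequence names in the merged GTF (1_P, 1_M, ...) match the names in the concatenated FASTA. A small sketch of the merge plus a name check; both function names and all file names are made up, and the check assumes the chromosome name is the FASTA header text up to the first whitespace:

```shell
#!/bin/bash
# Concatenate two GTF files, then list every sequence name (GTF column 1)
# that has no matching FASTA header in the genome. Names are hypothetical.
merge_gtf() {
    local gtf_a="$1" gtf_b="$2" out="$3"
    cat "$gtf_a" "$gtf_b" > "$out"
}

gtf_names_missing_from_fasta() {
    local gtf="$1" fasta="$2"
    awk -F'\t' '
        # First file (FASTA): remember the first word of each header line.
        NR == FNR { if (/^>/) { split(substr($0, 2), h, /[ \t]/); seen[h[1]] = 1 } next }
        # Second file (GTF): skip comments and blank lines.
        /^#/ || NF == 0 { next }
        # Print each unknown sequence name once.
        !($1 in seen) && !reported[$1]++ { print $1 }
    ' "$fasta" "$gtf"
}

# Usage:
#   merge_gtf paternal.gtf maternal.gtf merged.gtf
#   gtf_names_missing_from_fasta merged.gtf diploid_genome.fa
```

No output from the second command means every annotated sequence name was found in the genome file.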

Hi Devon, I used STAR to count reads with the option "--quantMode TranscriptomeSAM GeneCounts", without supplying a GTF file for either the annotation or the read counting. Is there anything I should be concerned about?

Thank you

ADD REPLYlink written 13 months ago by Morris_Chair200

If the original index was made including a GTF then this should work fine (I've never tried it).

ADD REPLYlink written 13 months ago by Devon Ryan97k

Do you know if there is a way to see whether or not the annotation file (GTF) was used to index the genome in STAR?

thanks

ADD REPLYlink written 13 months ago by Morris_Chair200

Problem solved: inside the genome directory there is a file, genomeParameters.txt, which contains an sjdbGTFfile entry naming the GTF that was used. Phew!

Thanks anyway, Devon.

ADD REPLYlink written 13 months ago by Morris_Chair200
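That lookup can be scripted. A minimal sketch, assuming genomeParameters.txt stores one "name value" pair per line and that a value of "-" means no GTF was given at index time; the function name index_gtf is made up:

```shell
#!/bin/bash
# Print the annotation recorded in a STAR index directory, or report that
# none was recorded. Assumes a "sjdbGTFfile <value>" line in
# genomeParameters.txt, with "-" standing for "no GTF supplied".
index_gtf() {
    local params="$1/genomeParameters.txt" gtf
    gtf=$(awk '$1 == "sjdbGTFfile" { print $2; exit }' "$params")
    if [ -z "$gtf" ] || [ "$gtf" = "-" ]; then
        echo "no GTF recorded"
        return 1
    fi
    echo "$gtf"
}

# Usage: index_gtf DB
```

The presence of files such as exonInfo.tab and sjdbList.fromGTF.out.tab in the directory, as in the listing earlier in the thread, points the same way.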