Only 35 out of 800 genes in STAR reference assembly
3 months ago
Jasper • 0

Hello all, I'm at a loss here. When I use STAR to make a Mesomycoplasma ovipneumoniae reference genome for a transcriptomics project, STAR does its thing and I end up with a reference. When I look at the gene info after just the reference assembly, I see 35 genes listed. If I go on to mapping and quantification, I naturally only get information for 35 genes.

What is happening to the other 770 genes? What could I be doing wrong?

What I've run:

STAR --runThreadN 6 \
--runMode genomeGenerate \
--genomeDir Reference \
--genomeSAindexNbases 9 \
--genomeFastaFiles /location/path/GCF_028885435.1_ASM2888543v1_genomic.fa \
--sjdbGTFfile /location/path/GCF_028885435.1/genomic.gtf 

The file source: https://www.ncbi.nlm.nih.gov/nuccore/NZ_CP118522.1/

RNAseq transcriptomics RNA_seq STAR RNA_sequencing • 809 views

Upon further inspection, I see that the genome has 35 genes encoding RNA elements (tRNA, rRNA, etc), but this still doesn't really explain why the reference only contains those elements, and not any of the protein coding elements.

3 months ago
GenoMax 154k

If the GTF file does not have information about other genes then this is fully expected. STAR is not generating any information about genes, it is only using information provided in the GTF file.
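A quick way to see what the GTF actually offers STAR is to tally the feature types in its third column. This is a minimal sketch, assuming a tab-separated GTF with `#` comment lines; the function name `count_gtf_features` is made up for illustration.

```shell
# Tally the feature types (column 3) of a GTF file.
# Skips comment lines starting with '#' and lines with fewer than 3 fields.
# Prints "<feature><TAB><count>", sorted by feature name.
count_gtf_features() {
    awk -F'\t' '!/^#/ && NF >= 3 { n[$3]++ }
                END { for (f in n) print f "\t" n[f] }' "$1" | LC_ALL=C sort
}
```

Calling `count_gtf_features /location/path/GCF_028885435.1/genomic.gtf` prints one line per feature type, which shows at a glance how many lines are of each type.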

If you look at the RefSeq entry for this organism there are 805 genes: https://www.ncbi.nlm.nih.gov/datasets/gene/GCF_028885435.1/

Download the genome and the GTF from here and recheck: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/028/885/435/GCF_028885435.1_ASM2888543v1/

That said, you don't necessarily need STAR here: bacterial transcripts are not spliced, so a splice-aware aligner is not required.

Thank you for replying! The GTF file I have does contain all 805 genes, and it's identical to the one you linked. I chose STAR to accommodate potential adapter contamination rather than to handle splicing. I'm giving it a shot with Bowtie2 now as an alternative.

Okay, I believe I figured out my issue. By default, STAR only uses GTF lines that have "exon" in the third column (the feature type). In this particular GTF, only the tRNA and rRNA genes have "exon" lines; the protein-coding genes do not. After I specified "--sjdbGTFfeatureExon gene", STAR added all genes to the reference build.
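For anyone landing here later, the rebuilt index command might look like the sketch below. It is the command from the first post with `--sjdbGTFfeatureExon gene` appended. The wrapper function `build_star_index` and its arguments are hypothetical; the thread count and `--genomeSAindexNbases 9` are simply carried over from the original command.

```shell
# Build a STAR index, treating "gene" lines in the GTF as the exon feature.
# Usage: build_star_index <genome.fa> <annotation.gtf> [output_dir]
build_star_index() {
    fasta=$1
    gtf=$2
    outdir=${3:-Reference}   # defaults to the directory name used in the post
    mkdir -p "$outdir"
    STAR --runThreadN 6 \
        --runMode genomeGenerate \
        --genomeDir "$outdir" \
        --genomeSAindexNbases 9 \
        --genomeFastaFiles "$fasta" \
        --sjdbGTFfile "$gtf" \
        --sjdbGTFfeatureExon gene
}
```

Wrapping the call in a function keeps the paths as arguments, so the same settings can be reused for another assembly by changing only the FASTA and GTF.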
