Question: Read counts of STAR with gff file
2
gravatar for grant.hovhannisyan
3.2 years ago by
grant.hovhannisyan2.0k wrote:

Hi Biostars,

I align RNAseq reads using STAR and my annotation are in gff format.

My STAR command is

STAR --runThreadN 6 --genomeDir dir --sjdbGTFfile gff --sjdbGTFtagExonParentTranscript Parent --readFilesIn reads_1 read2_2 --readFilesCommand zcat --outSAMtype BAM Unsorted --outReadsUnmapped Fastx --quantMode TranscriptomeSAM GeneCounts --outTmpDir ~/TMP/TMPs

Gff file looks like

NC_018292.1 RefSeq  gene    610 1857    .   -   .   ID=gene0;Dbxref=GeneID:14536854;Name=CORT_0A00100;end_range=1857,.;gbkey=Gene;gene_biotype=protein_coding;locus_tag=CORT_0A00100;partial=true;start_range=.,610
NC_018292.1 RefSeq  mRNA    610 1857    .   -   .   ID=rna0;Parent=gene0;Dbxref=Genbank:XM_003865799.1,GeneID:14536854;Name=XM_003865799.1;end_range=1857,.;gbkey=mRNA;partial=true;product=hypothetical protein;start_range=.,610;transcript_id=XM_003865799.1
NC_018292.1 RefSeq  exon    610 1857    .   -   .   ID=id47;Parent=rna0;Dbxref=Genbank:XM_003865799.1,GeneID:14536854;end_range=1857,.;gbkey=mRNA;partial=true;product=hypothetical protein;start_range=.,610;transcript_id=XM_003865799.1
NC_018292.1 RefSeq  CDS 610 1857    .   -   0   ID=cds0;Parent=rna0;Dbxref=GOA:H8WZ57,InterPro:IPR002938,InterPro:IPR003042,UniProtKB/TrEMBL:H8WZ57,Genbank:XP_003865847.1,GeneID:14536854;Name=XP_003865847.1;gbkey=CDS;product=hypothetical protein;protein_id=XP_003865847.1;transl_table=12
NC_018292.1 RefSeq  gene    2012    4213    .   -   .   ID=gene1;Dbxref=GeneID:14536804;Name=CORT_0A00110;end_range=4213,.;gbkey=Gene;gene_biotype=protein_coding;locus_tag=CORT_0A00110;partial=true;start_range=.,2012
NC_018292.1 RefSeq  mRNA    2012    4213    .   -   .   ID=rna1;Parent=gene1;Dbxref=Genbank:XM_003865800.1,GeneID:14536804;Name=XM_003865800.1;end_range=4213,.;gbkey=mRNA;partial=true;product=hypothetical protein;start_range=.,2012;transcript_id=XM_003865800.1
NC_018292.1 RefSeq  exon    2012    4213    .   -   .   ID=id48;Parent=rna1;Dbxref=Genbank:XM_003865800.1,GeneID:14536804;end_range=4213,.;gbkey=mRNA;partial=true;product=hypothetical protein;start_range=.,2012;transcript_id=XM_003865800.1
NC_018292.1 RefSeq  CDS 2012    4213    .   -   0   ID=cds1;Parent=rna1;Dbxref=InterPro:IPR001611,InterPro:IPR025875,UniProtKB/TrEMBL:H8WZ58,Genbank:XP_003865848.1,GeneID:14536804;Name=XP_003865848.1;gbkey=CDS;product=hypothetical protein;protein_id=XP_003865848.1;transl_table=12

Mapping rate is high, ca 92% of unique alignments.

But ReadsPerGene.out.tab looks like

N_unmapped  1239857 1239857 1239857
N_multimapping  24667   24667   24667
N_noFeature 342031  16030329    520585
N_ambiguous 0   0   0

When I convert gff file to gtf using gffread, everything works fine. So I am wondering what might be the problem, because this option --sjdbGTFtagExonParentTranscript Parent is supposed to make STAR work with gff file normally.

Cheers,

star read-counts gff • 6.5k views
ADD COMMENTlink modified 2.7 years ago • written 3.2 years ago by grant.hovhannisyan2.0k

How is your ReadsPerGene.out.tab using the gtf? I may be missing something, but I didn't see anything wrong with gff alignment.

ADD REPLYlink written 3.2 years ago by h.mon31k

So this is with gff

N_unmapped      1239857 1239857 1239857
N_multimapping  24667   24667   24667
N_noFeature     342031  16030329        520585
N_ambiguous     0       0       0

This is with gtf

N_unmapped      1239857 1239857 1239857
N_multimapping  24667   24667   24667
N_noFeature     306146  16030300        484729
N_ambiguous     70354   13862   25782
gene0   17      0       17
gene1   36      0       36
gene2   462     2       460 
gene3   877     2       875
gene4   221     1       220
ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by grant.hovhannisyan2.0k

Ok, now I understand. I never used STAR with a gff annotation, I don't know if this is a bug or not, maybe it is worth opening a ticket on STAR github?

Anyway, you have a simple fix to the issue already.

ADD REPLYlink written 3.2 years ago by h.mon31k

Can you check by removing the option --sjdbGTFfile gff ?

This option defines the path of GTF file, and since there is no gtf-file in the current dir named "gff", STAR might be not be doing anything.

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by Santosh Anand5.1k

If I don't provide the gff file STAR will not count anything, obviously because it does not have info about features. Also this is just a sample command, in real life I specify the path to gff.

ADD REPLYlink written 2.7 years ago by grant.hovhannisyan2.0k
1
gravatar for grant.hovhannisyan
2.7 years ago by
grant.hovhannisyan2.0k wrote:

I have addressed this question to the developer of STAR, here is his answer if somebody is still interested https://groups.google.com/forum/#!searchin/rna-star/grant%7Csort:date/rna-star/yl6JRltAuG4/bbuKHQM4AgAJ

ADD COMMENTlink written 2.7 years ago by grant.hovhannisyan2.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1355 users visited in the last hour