Read counts of STAR with gff file
1
3
Entering edit mode
6.7 years ago

Hi Biostars,

I align RNAseq reads using STAR and my annotation are in gff format.

My STAR command is

STAR --runThreadN 6 --genomeDir dir --sjdbGTFfile gff --sjdbGTFtagExonParentTranscript Parent --readFilesIn reads_1 read2_2 --readFilesCommand zcat --outSAMtype BAM Unsorted --outReadsUnmapped Fastx --quantMode TranscriptomeSAM GeneCounts --outTmpDir ~/TMP/TMPs

Gff file looks like

NC_018292.1 RefSeq  gene    610 1857    .   -   .   ID=gene0;Dbxref=GeneID:14536854;Name=CORT_0A00100;end_range=1857,.;gbkey=Gene;gene_biotype=protein_coding;locus_tag=CORT_0A00100;partial=true;start_range=.,610
NC_018292.1 RefSeq  mRNA    610 1857    .   -   .   ID=rna0;Parent=gene0;Dbxref=Genbank:XM_003865799.1,GeneID:14536854;Name=XM_003865799.1;end_range=1857,.;gbkey=mRNA;partial=true;product=hypothetical protein;start_range=.,610;transcript_id=XM_003865799.1
NC_018292.1 RefSeq  exon    610 1857    .   -   .   ID=id47;Parent=rna0;Dbxref=Genbank:XM_003865799.1,GeneID:14536854;end_range=1857,.;gbkey=mRNA;partial=true;product=hypothetical protein;start_range=.,610;transcript_id=XM_003865799.1
NC_018292.1 RefSeq  CDS 610 1857    .   -   0   ID=cds0;Parent=rna0;Dbxref=GOA:H8WZ57,InterPro:IPR002938,InterPro:IPR003042,UniProtKB/TrEMBL:H8WZ57,Genbank:XP_003865847.1,GeneID:14536854;Name=XP_003865847.1;gbkey=CDS;product=hypothetical protein;protein_id=XP_003865847.1;transl_table=12
NC_018292.1 RefSeq  gene    2012    4213    .   -   .   ID=gene1;Dbxref=GeneID:14536804;Name=CORT_0A00110;end_range=4213,.;gbkey=Gene;gene_biotype=protein_coding;locus_tag=CORT_0A00110;partial=true;start_range=.,2012
NC_018292.1 RefSeq  mRNA    2012    4213    .   -   .   ID=rna1;Parent=gene1;Dbxref=Genbank:XM_003865800.1,GeneID:14536804;Name=XM_003865800.1;end_range=4213,.;gbkey=mRNA;partial=true;product=hypothetical protein;start_range=.,2012;transcript_id=XM_003865800.1
NC_018292.1 RefSeq  exon    2012    4213    .   -   .   ID=id48;Parent=rna1;Dbxref=Genbank:XM_003865800.1,GeneID:14536804;end_range=4213,.;gbkey=mRNA;partial=true;product=hypothetical protein;start_range=.,2012;transcript_id=XM_003865800.1
NC_018292.1 RefSeq  CDS 2012    4213    .   -   0   ID=cds1;Parent=rna1;Dbxref=InterPro:IPR001611,InterPro:IPR025875,UniProtKB/TrEMBL:H8WZ58,Genbank:XP_003865848.1,GeneID:14536804;Name=XP_003865848.1;gbkey=CDS;product=hypothetical protein;protein_id=XP_003865848.1;transl_table=12

Mapping rate is high, ca 92% of unique alignments.

But ReadsPerGene.out.tab looks like

N_unmapped  1239857 1239857 1239857
N_multimapping  24667   24667   24667
N_noFeature 342031  16030329    520585
N_ambiguous 0   0   0

When I convert gff file to gtf using gffread, everything works fine. So I am wondering what might be the problem, because this option --sjdbGTFtagExonParentTranscript Parent is supposed to make STAR work with gff file normally.

Cheers,

read-counts gff STAR • 13k views
ADD COMMENT
0
Entering edit mode

How is your ReadsPerGene.out.tab using the gtf? I may be missing something, but I didn't see anything wrong with gff alignment.

ADD REPLY
0
Entering edit mode

So this is with gff

N_unmapped      1239857 1239857 1239857
N_multimapping  24667   24667   24667
N_noFeature     342031  16030329        520585
N_ambiguous     0       0       0

This is with gtf

N_unmapped      1239857 1239857 1239857
N_multimapping  24667   24667   24667
N_noFeature     306146  16030300        484729
N_ambiguous     70354   13862   25782
gene0   17      0       17
gene1   36      0       36
gene2   462     2       460 
gene3   877     2       875
gene4   221     1       220
ADD REPLY
0
Entering edit mode

Ok, now I understand. I never used STAR with a gff annotation, I don't know if this is a bug or not, maybe it is worth opening a ticket on STAR github?

Anyway, you have a simple fix to the issue already.

ADD REPLY
0
Entering edit mode

Can you check by removing the option --sjdbGTFfile gff ?

This option defines the path of GTF file, and since there is no gtf-file in the current dir named "gff", STAR might be not be doing anything.

ADD REPLY
0
Entering edit mode

If I don't provide the gff file STAR will not count anything, obviously because it does not have info about features. Also this is just a sample command, in real life I specify the path to gff.

ADD REPLY
2
Entering edit mode
6.3 years ago

I have addressed this question to the developer of STAR, here is his answer if somebody is still interested https://groups.google.com/forum/#!searchin/rna-star/grant%7Csort:date/rna-star/yl6JRltAuG4/bbuKHQM4AgAJ

ADD COMMENT

Login before adding your answer.

Traffic: 2438 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6