Question: GTF/GFF file for feature count
#### (200) wrote, 3.6 years ago:

I am using a GFF file with featureCounts to produce counts for an RNA-Seq analysis. The organism is a non-model organism, and I am unable to get proper counts. The assembly is not good, and the GFF looks like this:

  #!genome-build RproC3                                                           
  #!genome-version RproC3                                                         
  #!genome-date 2015-04                                                           
  #!genome-build-accession GCA_000181055.3                                                                
KQ034291        VectorBase      gene    36335   45838   0       +       0       gene_id "RPRC000679";"
KQ034291        VectorBase      transcript      36335   45838   0       +       0       gene_id "RPRC000679"; transcript_id "RPRC000679-RA";"
KQ034291        VectorBase      exon    36335   36356   0       +       0       gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "1";"
KQ034291        VectorBase      CDS     36335   36356   0       +       0       gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "1";"
KQ034291        VectorBase      exon    40565   40684   0       +       0       gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "2";"
KQ034291        VectorBase      CDS     40565   40684   0       +       2       gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "2";"
KQ034291        VectorBase      exon    40763   40941   0       +       0       gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "3";"
KQ034291        VectorBase      CDS     40763   40941   0       +       2       gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "3";"
KQ034291        VectorBase      exon    45833   45838   0       +       0       gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "4";"
KQ034291        VectorBase      CDS     45833   45835   0       +       0       gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "4";"
KQ034291        VectorBase      stop_codon      45836   45838   0       +       0       gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "4";"
KQ034291        VectorBase      gene    48738   55400   0       -       0       gene_id "RPRC003242";"
KQ034291        VectorBase      transcript      48738   55400   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA";"
KQ034291        VectorBase      exon    55216   55400   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "1";"
KQ034291        VectorBase      CDS     55216   55289   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "1";"
KQ034291        VectorBase      start_codon     55287   55289   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "1";"
KQ034291        VectorBase      exon    53297   53592   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "2";"
KQ034291        VectorBase      CDS     53297   53592   0       -       1       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "2";"
KQ034291        VectorBase      exon    52421   52605   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "3";"
KQ034291        VectorBase      CDS     52421   52605   0       -       2       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "3";"
KQ034291        VectorBase      exon    51858   51907   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "4";"
KQ034291        VectorBase      CDS     51858   51907   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "4";"
KQ034291        VectorBase      exon    51146   51248   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "5";"
KQ034291        VectorBase      CDS     51146   51248   0       -       1       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "5";"
KQ034291        VectorBase      exon    50189   50352   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "6";"
KQ034291        VectorBase      CDS     50189   50352   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "6";"
KQ034291        VectorBase      exon    48738   48965   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "7";"
KQ034291        VectorBase      CDS     48884   48965   0       -       1       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "7";

"

Here the first column is the same for all the genes, so the count file contains the ID "KQ034291" repeatedly and nothing else. However, I want a GTF/GFF file with gene names such as RPRC000679, RPRC003242 and so on, so that I can get unique gene counts. Is there a way to do this?

Tags: rna-seq, gff/gtf, genecounts
modified 3.6 years ago • written 3.6 years ago by #### (200)

The first column should hold the chromosome (here, scaffold) name, which in your case seems to be KQ034291. I am not sure why you have (line numbers?) before that name. Where did you get this file from?

written 3.6 years ago by genomax (90k)

I am not sure either, but it was downloaded from a database. I can get rid of it. But can I have the gene name instead of the scaffold ID in the first column?

modified 3.6 years ago • written 3.6 years ago by #### (200)

You can, but then the file will no longer be in GTF/GFF format. featureCounts should understand the gene_id attribute in the file you posted.

modified 3.6 years ago • written 3.6 years ago by genomax (90k)

Yes, it will recognise it, as the sequences used for alignment have the same gene_id. So I want to know how to do that?

written 3.6 years ago by #### (200)

Only after you fix the first column (chromosome names need to match your alignment file). Have you looked at the manual/in-line help for featureCounts? The two options you want to pay attention to are:

 -t <string>         Specify feature type in GTF annotation. `exon' by 
                      default. Features used for read counting will be 
                      extracted from annotation using the provided value.

  -g <string>         Specify attribute type in GTF annotation. `gene_id' by 
                      default. Meta-features used for read counting will be 
                      extracted from annotation using the provided value.

modified 3.6 years ago • written 3.6 years ago by genomax (90k)
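
The point about the first column can be checked before running featureCounts. A minimal sketch, assuming the reference names from the alignment header have been saved one per line in a text file (for example, taken from the @SQ lines of `samtools view -H`). The function name and the file names are made up for illustration:

```shell
# Print sequence names used in column 1 of a GTF that are absent from a
# list of reference names (one name per line).
# Usage (hypothetical file names): gtf_names_missing_from annotation.gtf bam_names.txt
gtf_names_missing_from() {
    awk -F'\t' '
        FILENAME == ARGV[1] { known[$1] = 1; next }   # first file: names from the alignment
        /^[ \t]*#/ { next }                            # skip "#!" header lines in the GTF
        NF >= 9 && !($1 in known) && !seen[$1]++ { print $1 }
    ' "$2" "$1"
}
```

Any name it prints is a contig in the annotation that the alignment does not know about. Once none remain, a run along the lines of `featureCounts -t exon -g gene_id -a annotation.gtf -o counts.txt aligned.bam` uses the two options quoted above with their default values.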

I am aware of the two options you mentioned. I have edited the GTF file shown above, but I get the following warning when running featureCounts, and no output file is produced:

Warning: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id' 
The attributes included in your GTF annotation are 'gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "1";"' 

||    Features : 91569                                                        ||
||    Meta-features : 1                                                       ||
||    Chromosomes/contigs : 16843                                             ||
||

According to this, the 9th column has a problem, but that does not seem to be the case. I also ran cut -f 9 *.gtf and here is the output:

gene_id "RPRC009988";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "1";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "1";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "1";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "2";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "2";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "3";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "3";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "4";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "4";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "5";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "5";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "6";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "6";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "7";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "7";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "7";"

So I have no clue what is going wrong here. Any ideas?

modified 3.6 years ago • written 3.6 years ago by #### (200)
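
One way to cross-check the `Meta-features : 1` figure in the summary above is to count the distinct gene_id values independently. A rough sketch with awk, assuming a tab-separated file with the attributes in column 9; `count_gene_ids` is a hypothetical helper name, not part of any tool:

```shell
# Count distinct gene_id "..." values in column 9 of a GTF file.
# Usage (hypothetical file name): count_gene_ids annotation.gtf
count_gene_ids() {
    awk -F'\t' '
        /^[ \t]*#/ { next }                                   # skip header lines
        match($9, /gene_id "[^"]*"/) { ids[substr($9, RSTART, RLENGTH)] = 1 }
        END { n = 0; for (k in ids) n++; print n }
    ' "$1"
}
```

If this prints a large number while featureCounts reports a single meta-feature, the identifiers are present in the file and the parser is probably tripping over something else on the line.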

Closing a post is not an appropriate action when a question has been answered (generally mods use that action to close posts deemed inappropriate, duplicate, etc.). Instead, accept an answer (green check mark) to indicate that the question has been answered. I have moved @Devon's post to an answer.

modified 3.6 years ago • written 3.6 years ago by genomax (90k)

Devon Ryan (96k), Freiburg, Germany, wrote 3.6 years ago:

All of your lines end with an extra ". Try removing it.

written 3.6 years ago by Devon Ryan (96k)

Thanks, Devon. Along with the " there was also a wide space; after removing both, it worked. Thanks.

written 3.6 years ago by #### (200)
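
A trailing quote followed by spaces is easy to miss in plain `cut` output. One possible way to make it visible is sed's `l` command, which marks the end of each line with `$` and escapes tabs and carriage returns. This is only a sketch: `show_attr_tail` is a made-up name, and the 15-character window and 5-line limit are arbitrary choices:

```shell
# Show the last 15 characters of column 9 for the first 5 feature lines,
# with the end of each line marked by "$" so trailing blanks stand out.
# Usage (hypothetical file name): show_attr_tail annotation.gtf
show_attr_tail() {
    grep -v '^[[:space:]]*#' "$1" |
        awk -F'\t' '{ n = length($9); s = (n > 15) ? n - 14 : 1; print substr($9, s) }' |
        head -n 5 |
        sed -n 'l'
}
```

A clean attribute column ends in `;$`. Anything between the final semicolon and the `$` is extra.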

Hi, I have a similar problem with my GTF file. featureCounts gives the error "failed to find the gene identifier attribute in the 9th column of the provided GTF file." Could you kindly tell me how to remove the " and the wide space? Arumoy.

written 8 days ago by chatterjee.arumoy (0)

sed "s/''$//" foo.gtf > foo.fixed.gtf

Note that the middle ticks are two apostrophes, not a ". Assuming you have EXACTLY the problem faced in the original post then that will fix it.

written 9 hours ago by Devon Ryan (96k)
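
For the combination the original poster reported (a stray " plus a wide space at the end of each line), a variant along these lines may help. It is a sketch under two assumptions: the stray character is a single double quote, and it sits after the final semicolon, optionally surrounded by whitespace. `clean_gtf_line_ends` is a hypothetical name:

```shell
# Remove whitespace and at most one stray double quote that follow the
# final ";" on each line; lines that already end in ";" are left unchanged.
# Usage (hypothetical file names): clean_gtf_line_ends foo.gtf > foo.fixed.gtf
clean_gtf_line_ends() {
    sed 's/;[[:space:]]*"\{0,1\}[[:space:]]*$/;/' "$1"
}
```

The substitution is anchored on the last semicolon so that the closing quote of a real attribute value is never touched. It is worth comparing the two files with `diff` before using the fixed one.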