Question: GTF/GFF file for feature count
2.6 years ago, ####190 wrote:

I am using a GFF file with featureCounts to produce counts for an RNA-Seq analysis of a non-model organism. I am unable to get proper counts; the assembly is not good, and the GFF looks like this:

  #!genome-build RproC3                                                           
  #!genome-version RproC3                                                         
  #!genome-date 2015-04                                                           
  #!genome-build-accession GCA_000181055.3                                                                
KQ034291        VectorBase      gene    36335   45838   0       +       0       gene_id "RPRC000679";"
KQ034291        VectorBase      transcript      36335   45838   0       +       0       gene_id "RPRC000679"; transcript_id "RPRC000679-RA";"
KQ034291        VectorBase      exon    36335   36356   0       +       0       gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "1";"
KQ034291        VectorBase      CDS     36335   36356   0       +       0       gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "1";"
KQ034291        VectorBase      exon    40565   40684   0       +       0       gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "2";"
KQ034291        VectorBase      CDS     40565   40684   0       +       2       gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "2";"
KQ034291        VectorBase      exon    40763   40941   0       +       0       gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "3";"
KQ034291        VectorBase      CDS     40763   40941   0       +       2       gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "3";"
KQ034291        VectorBase      exon    45833   45838   0       +       0       gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "4";"
KQ034291        VectorBase      CDS     45833   45835   0       +       0       gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "4";"
KQ034291        VectorBase      stop_codon      45836   45838   0       +       0       gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "4";"
KQ034291        VectorBase      gene    48738   55400   0       -       0       gene_id "RPRC003242";"
KQ034291        VectorBase      transcript      48738   55400   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA";"
KQ034291        VectorBase      exon    55216   55400   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "1";"
KQ034291        VectorBase      CDS     55216   55289   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "1";"
KQ034291        VectorBase      start_codon     55287   55289   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "1";"
KQ034291        VectorBase      exon    53297   53592   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "2";"
KQ034291        VectorBase      CDS     53297   53592   0       -       1       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "2";"
KQ034291        VectorBase      exon    52421   52605   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "3";"
KQ034291        VectorBase      CDS     52421   52605   0       -       2       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "3";"
KQ034291        VectorBase      exon    51858   51907   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "4";"
KQ034291        VectorBase      CDS     51858   51907   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "4";"
KQ034291        VectorBase      exon    51146   51248   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "5";"
KQ034291        VectorBase      CDS     51146   51248   0       -       1       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "5";"
KQ034291        VectorBase      exon    50189   50352   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "6";"
KQ034291        VectorBase      CDS     50189   50352   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "6";"
KQ034291        VectorBase      exon    48738   48965   0       -       0       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "7";"
KQ034291        VectorBase      CDS     48884   48965   0       -       1       gene_id "RPRC003242"; transcript_id "RPRC003242-RA"; exon_number "7";"

The first column ID is the same for all of these genes, so the count file contains the ID "KQ034291" repeatedly and nothing else. However, I want a GTF/GFF file that uses gene names such as RPRC000679, RPRC003242 and so on, so that I get unique per-gene counts. Is there a way to do this?

rna-seq gff/gtf genecounts • 3.4k views
modified 2.5 years ago • written 2.6 years ago by ####190

First column should refer to chromosome name, which in your case seems to be KQ034291. I am not sure why you have (line numbers?) before that name. Where did you acquire this file from?

written 2.6 years ago by genomax71k
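To illustrate the column layout described above, here is a minimal Python sketch that splits one record from the posted file into its nine columns and pulls gene_id out of column 9. The function name parse_gtf_line and the dictionary keys are illustrative choices, not part of any tool, and the sketch assumes simple key "value"; attributes like those in the post.

```python
import re


def parse_gtf_line(line):
    """Split one GTF record into nine columns and parse the attribute column."""
    # GTF columns are normally tab-separated; splitting on any whitespace run
    # (at most 8 splits) also copes with text that was pasted with spaces.
    fields = line.rstrip("\n").split(None, 8)
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attrs = fields
    # Column 9 holds pairs of the form: key "value";
    attributes = dict(re.findall(r'(\w+) "([^"]*)";', attrs))
    return {
        "seqname": seqname,      # scaffold/chromosome, e.g. KQ034291
        "source": source,
        "feature": feature,      # gene, transcript, exon, CDS, ...
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attributes,
    }


record = parse_gtf_line(
    'KQ034291\tVectorBase\texon\t36335\t36356\t0\t+\t0\t'
    'gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "1";'
)
print(record["seqname"], record["attributes"]["gene_id"])
```

The scaffold name stays in column 1, while the gene name that should label the counts lives in the gene_id attribute of column 9.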

I am not sure either, but the file was downloaded from a database. I can get rid of them, though. But can I have the gene name instead of the scaffold ID in the first column?

modified 2.6 years ago • written 2.6 years ago by ####190

You can, but then the file will no longer be in GTF/GFF format. featureCounts should understand the gene_id attribute in the file you posted.

modified 2.6 years ago • written 2.6 years ago by genomax71k

Yes, it will recognise it, as the sequences used for alignment have the same gene_id... so I want to know how to do that?

written 2.6 years ago by ####190

Only after you fix the first column (chromosome names need to match your alignment file). Have you looked at the manual/in-line help for featureCounts? The two options you want to pay attention to are

 -t <string>         Specify feature type in GTF annotation. `exon' by 
                      default. Features used for read counting will be 
                      extracted from annotation using the provided value.

  -g <string>         Specify attribute type in GTF annotation. `gene_id' by 
                      default. Meta-features used for read counting will be 
                      extracted from annotation using the provided value.
modified 2.6 years ago • written 2.6 years ago by genomax71k
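The first-column requirement mentioned above (chromosome names in the annotation must match those in the alignment file) can be checked before running any counting. The following is a rough Python sketch, not a featureCounts feature: it compares the names in column 1 of a GTF with the SN: tags of the @SQ lines in a SAM/BAM header. All function names are hypothetical, and the LN value in the example header is made up.

```python
def sam_header_contigs(header_text):
    """Collect reference names from the SN: tags of @SQ header lines."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for tag in line.split("\t")[1:]:
                if tag.startswith("SN:"):
                    names.add(tag[3:])
    return names


def gtf_contigs(gtf_lines):
    """Collect the first-column names of all non-comment GTF records."""
    names = set()
    for line in gtf_lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and '#!' header lines
        names.add(line.split()[0])
    return names


def contigs_missing_from_alignment(gtf_lines, header_text):
    """Return GTF contig names that do not occur in the alignment header."""
    return sorted(gtf_contigs(gtf_lines) - sam_header_contigs(header_text))


# Example header text; LN (sequence length) is an invented value.
header = "@HD\tVN:1.6\n@SQ\tSN:KQ034291\tLN:100000\n"
gtf = [
    "#!genome-build RproC3\n",
    'KQ034291\tVectorBase\tgene\t36335\t45838\t0\t+\t0\tgene_id "RPRC000679";\n',
]
print(contigs_missing_from_alignment(gtf, header))
```

In practice the header text could come from `samtools view -H` run on the BAM file. An empty result means every annotation contig is known to the alignment.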

I am aware of the two options you mentioned. I have edited the GTF file shown above, and I get the following warning when running featureCounts, with no output file:

Warning: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id' 
The attributes included in your GTF annotation are 'gene_id "RPRC000679"; transcript_id "RPRC000679-RA"; exon_number "1";"' 

||    Features : 91569                                                        ||
||    Meta-features : 1                                                       ||
||    Chromosomes/contigs : 16843                                             ||
||

According to this, the 9th column has a problem, which does not appear to be the case. I also ran cut -f 9 *.gtf, and here is the output:

gene_id "RPRC009988";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "1";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "1";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "1";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "2";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "2";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "3";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "3";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "4";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "4";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "5";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "5";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "6";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "6";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "7";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "7";"
gene_id "RPRC009988"; transcript_id "RPRC009988-RA"; exon_number "7";"

So I have no clue what is going wrong here. Any ideas?

modified 2.5 years ago • written 2.5 years ago by ####190
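Problems at the very end of a line are easy to miss in plain cut output. As a hedged diagnostic sketch (the function name and the two heuristics are assumptions, not anything featureCounts does), the Python below flags records whose 9th column has an odd number of double quotes or trailing whitespace, and shows the tail of the column with repr() so that invisible characters become visible.

```python
def suspicious_attribute_lines(gtf_lines):
    """Yield (line_number, repr_of_tail) for records with an odd-looking column 9."""
    for number, line in enumerate(gtf_lines, start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and header/comment lines
        # With at most 8 splits, the last element keeps its trailing whitespace.
        attrs = line.rstrip("\n").split(None, 8)[-1]
        odd_quotes = attrs.count('"') % 2 == 1       # unbalanced double quotes
        trailing_space = attrs != attrs.rstrip()     # spaces/tabs at line end
        if odd_quotes or trailing_space:
            yield number, repr(attrs[-12:])


sample = [
    'KQ034291\tVectorBase\texon\t36335\t36356\t0\t+\t0\tgene_id "RPRC000679"; exon_number "1";\n',
    'KQ034291\tVectorBase\texon\t40565\t40684\t0\t+\t0\tgene_id "RPRC000679"; exon_number "2";"\n',
]
for number, tail in suspicious_attribute_lines(sample):
    print(number, tail)
```

Only the second sample line is reported, because it ends with an unmatched double quote.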

Closing a post is not an appropriate action when a question has been answered (generally mods use that action to close posts deemed inappropriate, duplicate, etc.). You should accept an answer (green check mark) to indicate that this question has been answered. (I moved @Devon's post to an answer.)

modified 2.5 years ago • written 2.5 years ago by genomax71k
2.5 years ago, Devon Ryan (Freiburg, Germany) wrote:

All of your lines end with an extra ". Try removing it.

written 2.5 years ago by Devon Ryan92k

Thanks, Devon. Along with the " there was also a wide space; after removing both, it worked. Thanks.

written 2.5 years ago by ####190
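For anyone hitting the same warning, the fix described above (removing the extra " and the stray whitespace at the end of every line) might be scripted roughly as follows. This is a sketch under assumptions: the function names and the file paths in the usage note are hypothetical, and the pattern only targets a lone double quote that follows the final semicolon.

```python
import re

# A semicolon, optional whitespace, one stray double quote, optional whitespace, end of line.
_STRAY_TAIL = re.compile(r';\s*"\s*$')


def strip_stray_quote(line):
    """Remove a trailing stray double quote and trailing whitespace from one line."""
    body = line.rstrip("\r\n")
    return _STRAY_TAIL.sub(";", body).rstrip()


def clean_gtf_file(src_path, dst_path):
    """Write a cleaned copy of src_path to dst_path, one record per line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(strip_stray_quote(line) + "\n")


broken = 'KQ034291\tVectorBase\tgene\t36335\t45838\t0\t+\t0\tgene_id "RPRC000679";"   \n'
print(repr(strip_stray_quote(broken)))
```

A call such as clean_gtf_file("annotation.gtf", "annotation.clean.gtf") would then produce a file to pass to featureCounts. Lines that already end correctly are left unchanged apart from trailing whitespace.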