Question: GTF file errors when using easyRNASeq to produce count tables
e.karasmani wrote, 5.8 years ago:

Dear All,

Hello, as I am not an expert, I would like your help with a problem.

I used TopHat to align a FASTQ file from an RNA-seq experiment, and I then wanted to build a count table to use in DESeq. I tried easyRNASeq for this, but I ran into a problem.

Here is my code:

library(easyRNASeq)
library(BSgenome.Mmusculus.UCSC.mm9)

# directory holding the TopHat output
dataDir <- "/data/lena/m"
file.exists(dataDir)
stopifnot(file.exists(dataDir))
list.files(dataDir)

# gene annotation (GTF) and aligned reads (BAM)
gtf <- "/home/lena/mm9_IlluminaAnnotation_genes.gtf"
rna <- "accepted_hits.bam"

c_ers <- easyRNASeq(organism = "Mmusculus",
                    chr.sizes = as.list(seqlengths(Mmusculus)),
                    annotationMethod = "gtf", annotationFile = gtf,
                    format = "bam", count = "genes",
                    summarization = "geneModels", outputFormat = "DESeq",
                    filenames = rna, filesDirectory = dataDir)

and here is what I get:

Checking arguments...
Fetching annotations...
Read 566539 records
Error in .getGtfRange(organismName(obj), filename = filename, ignoreWarnings = ignoreWarnings,  :
  Your gtf file: /home/lena/mm9_IlluminaAnnotation_genes.gtf does not contain all the required fields: gene_id, transcript_id, exon_number, gene_name.
In addition: Warning messages:
1: The use of the list for providing chromosome sizes has been deprecated. Use a named numeric vector instead.
2: In .Method(..., deparse.level = deparse.level) :
  number of columns of result is not a multiple of vector length (arg 17)
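The first warning concerns the chr.sizes argument. seqlengths() on a BSgenome object already returns a named integer vector, so it is the as.list() wrapper that triggers the deprecation notice; passing seqlengths(Mmusculus) directly should avoid it. The base R sketch below uses hypothetical chromosome names and sizes (not real mm9 values) to show the two forms and how unlist() turns the list back into a named numeric vector.

```r
# Hypothetical chromosome sizes, for illustration only (not real mm9 values)
sizes_list <- list(chrA = 1000L, chrB = 2500L)   # deprecated form: a list

# unlist() gives back a named integer (and therefore numeric) vector,
# which is the form the warning asks for
sizes_vec <- unlist(sizes_list)

str(sizes_list)
str(sizes_vec)
```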

What am I doing wrong, and what should I do? Could you please advise me?

This is what my GTF file looks like:

chr1    unknown    exon    3204563    3207049    .    -    .    gene_id "Xkr4"; gene_name "Xkr4"; p_id "P2671"; transcript_id "NM_001011874"; tss_id "TSS1758";
chr1    unknown    stop_codon    3206103    3206105    .    -    .    gene_id "Xkr4"; gene_name "Xkr4"; p_id "P2671"; transcript_id "NM_001011874"; tss_id "TSS1758";
chr1    unknown    CDS    3206106    3207049    .    -    2    gene_id "Xkr4"; gene_name "Xkr4"; p_id "P2671"; transcript_id "NM_001011874"; tss_id "TSS1758";
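One way to see which of the required attributes are missing is to parse the ninth (attribute) column of the exon records and compare its keys against the list in the error message. The base R sketch below does that; the function name missing_gtf_attributes is hypothetical, it assumes well-formed tab-separated GTF lines, and easyRNASeq's own check may differ in detail. On the three records above it reports exon_number as the missing field.

```r
# Attributes easyRNASeq reported as required in the error message
required <- c("gene_id", "transcript_id", "exon_number", "gene_name")

# Hypothetical helper: which required attribute keys are absent from at
# least one record of the given feature type (default "exon")?
# Assumes tab-separated, 9-column GTF lines.
missing_gtf_attributes <- function(lines, required, feature = "exon") {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  type   <- vapply(fields, `[`, character(1), 3)
  attrs  <- vapply(fields, `[`, character(1), 9)
  attrs  <- attrs[type %in% feature]   # comment and short lines drop out here
  if (length(attrs) == 0) return(required)

  # Each attribute is written as: key "value"; keep only the keys
  keys <- lapply(strsplit(attrs, ";", fixed = TRUE), function(pairs) {
    pairs <- trimws(pairs)
    sub(" .*$", "", pairs[nzchar(pairs)])
  })
  # Required keys that are not present on every selected record
  setdiff(required, Reduce(intersect, keys))
}
```

With gtf set to the file path as in the code above, missing_gtf_attributes(readLines(gtf), required) would run the same check on the whole file.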

I also tried the biomaRt approach; this is what I did:

library(biomaRt)
ensembl <- useMart(host = 'may2009.archive.ensembl.org',
                   biomart = 'ENSEMBL_MART_ENSEMBL',
                   dataset = 'mmusculus_gene_ensembl')
ensembl.genes <- getBM(attributes = c('chromosome_name', 'start_position',
                                      'end_position', 'ensembl_gene_id',
                                      'external_gene_id', 'strand'),
                       mart = ensembl)


c_ers <- easyRNASeq(organism = "Mmusculus",
                    chr.sizes = as.list(seqlengths(Mmusculus)),
                    annotationMethod = "biomaRt", annotationFile = ensembl.genes,
                    format = "bam", count = "genes",
                    summarization = "geneModels", outputFormat = "DESeq",
                    filenames = rna, filesDirectory = dataDir)


Checking arguments...
Fetching annotations...
Error in easyRNASeq(organism = "Mmusculus", chr.sizes = as.list(seqlengths(Mmusculus)),  :
  The number of conditions: 0 did not correspond to the number of samples: 1
In addition: Warning message:
The use of the list for providing chromosome sizes has been deprecated. Use a named numeric vector instead.
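This second error is about the sample description rather than the annotation: with outputFormat = "DESeq", easyRNASeq apparently expects one condition label per input file, and none were supplied. Below is a minimal base R sketch of such a vector. The label "control" is hypothetical, and naming the vector by BAM file and passing it as conditions = conditions is an assumption based on the easyRNASeq vignettes of that period, so it should be checked against the documentation of the installed version.

```r
# Hypothetical sample sheet: one condition label per BAM file.
# The label "control" is made up; use your own experimental groups.
files <- c("accepted_hits.bam")
conditions <- c("control")
# Name each label by its file (assumed to be what easyRNASeq matches on)
names(conditions) <- files

# Mirror the check in the error message: one condition per sample
stopifnot(length(conditions) == length(files),
          identical(names(conditions), files))
conditions
```

Under that assumption, the vector would then go into the easyRNASeq() call as conditions = conditions, next to outputFormat = "DESeq".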

Could you please help me? I am stuck and don't know what to do.

Thank you in advance!

I would appreciate your answers.

Best regards Lena

Istvan Albert (University Park, USA) wrote, 5.8 years ago:

The error message appears to be self-explanatory:

Your gtf file: /home/lena/mm9_IlluminaAnnotation_genes.gtf does not contain all the required fields: gene_id, transcript_id, exon_number, gene_name

And indeed, the GTF excerpt in your question does not contain all of these fields: its records carry gene_id, gene_name and transcript_id, but no exon_number.
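A minimal base R sketch for adding the missing exon_number attribute to each exon record follows. It numbers the exons of each transcript from 5' to 3' (ascending start on the + strand, descending start on the - strand). The function name add_exon_number is hypothetical; it assumes a tab-separated GTF with a transcript_id on every exon line, and it assumes the field check is the only thing easyRNASeq objects to in this file.

```r
# Hypothetical helper: append an exon_number attribute to every exon record
# that lacks one, numbering the exons of each transcript 5' -> 3'.
# Assumes tab-separated, 9-column GTF lines with transcript_id on exons.
add_exon_number <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  col    <- function(i) vapply(fields, `[`, character(1), i)
  type   <- col(3)
  start  <- suppressWarnings(as.integer(col(4)))
  strand <- col(7)
  attrs  <- col(9)

  # Pull the transcript_id value out of column 9 (NA when absent)
  has_tx <- grepl('transcript_id "', attrs, fixed = TRUE)
  tx <- ifelse(has_tx, sub('.*transcript_id "([^"]*)".*', "\\1", attrs), NA)

  # Only exon records that have a transcript and no exon_number yet
  todo <- type %in% "exon" & !is.na(tx) &
          !grepl("exon_number", attrs, fixed = TRUE)
  num <- rep(NA_integer_, length(lines))
  for (t in unique(tx[todo])) {
    idx <- which(todo & tx == t)
    # 5' -> 3': ascending start on "+", descending start on "-"
    ord <- order(start[idx], decreasing = identical(strand[idx[1]], "-"))
    num[idx[ord]] <- seq_along(idx)
  }

  out <- lines
  # Make sure each line ends with ";" before appending the new attribute
  out[todo] <- paste0(sub("\\s*;?\\s*$", ";", lines[todo]),
                      ' exon_number "', num[todo], '";')
  out
}
```

With gtf set to the path from the question, something like writeLines(add_exon_number(readLines(gtf)), "mm9_genes_with_exon_number.gtf") would write a patched copy (the output file name is hypothetical). CDS, start_codon and stop_codon records are left unchanged.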
