Question: Error reading a .gtf file in R
2.7 years ago, modarzi wrote:

Hi. For my TCGA data set I need the gencode.v22.annotation.gtf.gz annotation file. To read it, I installed the "refGenome" package and ran the code below in R on both Windows and Linux:

setwd("E:/GTF")
library(refGenome)
# create ensemblGenome object for storing Ensembl genomic annotation data
ens <- ensemblGenome()
# read GTF file into ensemblGenome object
read.gtf(ens, "gencode.v22.annotation.gtf")

When I try to read the .gtf file in RStudio on Windows, R crashes and I have to restart RStudio. On Linux I get this message:

The application R has closed unexpectedly. Clicking the "Show details" button shows this message:
R crashed with SIGABRT in __gnu_cxx::__verbose_terminate_handler

I also downloaded gencode.v22.annotation.gtf.gz from two sources:

1- https://api.gdc.cancer.gov/data/fe1750e4-fc2d-4a2c-ba21-5fc969a24f27

2- https://www.encodeproject.org/files/gencode.v22.annotation/@@download/gencode.v22.annotation.gtf.gz

I would appreciate any comments.

Best regards,

Mohammad

modified 2.7 years ago by kristoffer.vittingseerup • written 2.7 years ago by modarzi

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.

written 2.7 years ago by WouterDeCoster

I think this is the same issue as the post "read .gtf in R". Instead of RStudio, try running the code in the plain R console.

written 2.7 years ago by cpad0112

Thanks, but the "read .gtf in R" post was also mine. Anyway, can I use another version of the annotation, for example the latest one, or do I have to use exactly this one? I would appreciate your comments. Best regards, Mohammad

written 2.7 years ago by modarzi


It seems the package has trouble reading GTF files from GENCODE (both primary and main). I checked both v22 and v28 (the latest). An alternative is to use the GTF from Ensembl. If you don't want to use Ensembl annotations, I describe another way to use the GENCODE GTF further down. For the Ensembl annotation, download the GTF file from Ensembl (ftp://ftp.ensembl.org/pub/release-92/gtf/homo_sapiens), file Homo_sapiens.GRCh38.92.gtf.gz, and unzip it before you load it into the package. It worked on my machine:

> library(refGenome)
Loading required package: doBy
Loading required package: RSQLite

> ens <- ensemblGenome()      

> read.gtf(ens, "Homo_sapiens.GRCh38.92.gtf")
[read.gtf.refGenome] Reading file 'Homo_sapiens.GRCh38.92.gtf'.
[GTF]  2689571 lines processed.
[read.gtf.refGenome] Extracting genes table.
[read.gtf.refGenome] Found 58,395 gene records.
[read.gtf.refGenome] Finished.
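
For the "unzip before you load" step, here is a minimal base-R sketch, in case no external unzip tool is at hand. The helper names gunzip_file and looks_like_gtf are made up for this example, and the demo runs on a tiny stand-in file with invented gene IDs rather than the real annotation:

```r
# Decompress a .gz file in chunks using base R only (hypothetical helper).
gunzip_file <- function(gz_path, out_path = sub("\\.gz$", "", gz_path)) {
  in_con  <- gzfile(gz_path, open = "rb")
  out_con <- file(out_path, open = "wb")
  on.exit({ close(in_con); close(out_con) })
  repeat {
    chunk <- readBin(in_con, what = raw(), n = 1e6)  # read up to 1e6 bytes
    if (length(chunk) == 0) break
    writeBin(chunk, out_con)
  }
  invisible(out_path)
}

# Cheap sanity check: non-comment lines should have 9 tab-separated fields.
looks_like_gtf <- function(path, n = 100) {
  lines <- readLines(path, n = n)
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  length(lines) > 0 &&
    all(lengths(strsplit(lines, "\t", fixed = TRUE)) == 9)
}

# Demo on a toy gzipped file (invented content, not real annotation).
demo_gz <- file.path(tempdir(), "demo.gtf.gz")
con <- gzfile(demo_gz, open = "wt")
writeLines(c(
  "##description: toy file, not real annotation",
  'chr1\ttoy\tgene\t100\t900\t.\t+\t.\tgene_id "GENE1"; gene_name "ToyGene";'
), con)
close(con)

demo_gtf <- gunzip_file(demo_gz)
print(looks_like_gtf(demo_gtf))
```

With the real download you would call gunzip_file("Homo_sapiens.GRCh38.92.gtf.gz") and then pass the resulting .gtf to read.gtf() as shown above. The check only confirms that the file is plain text with the expected column count; it says nothing about whether refGenome will parse it.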

If you want to use GENCODE annotations only, there is a roundabout way. R/refGenome crashes while loading the GTF file from GENCODE. Instead, you can:

  1. Download the GFF3 file from GENCODE (https://www.gencodegenes.org/releases/current.html)
  2. Install gffread from bioconda (current version 0.99)
  3. Run the following command: gffread gencode.v22.annotation.gff3 -T -o my.gtf (Note: for this test I did not set any parameters to control which features end up in the GTF. Please use the parameters that give you a GTF with the features you need; gffread -h prints the options. I named the output file my.gtf.)
  4. Now you can load this GTF into R without any issues. The following is the test output:
>  library(refGenome)
Loading required package: doBy
Loading required package: RSQLite
>  ens <- ensemblGenome()   
> read.gtf(ens, "my.gtf")
[read.gtf.refGenome] Reading file 'my.gtf'.
[GTF]  1872314 lines processed.
[read.gtf.refGenome] Extracting genes table.
[read.gtf.refGenome] Finished.
> q()
  
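
If you prefer to stay inside an R session, the gffread call from step 3 can be launched with system2(). This is only a sketch: it assumes gffread is already installed and on your PATH, the wrapper name gff3_to_gtf is made up here, and it uses only the -T and -o options from the command above:

```r
# Hypothetical wrapper around the command-line tool gffread.
gff3_to_gtf <- function(gff3, gtf = "my.gtf") {
  exe <- unname(Sys.which("gffread"))
  if (!nzchar(exe)) stop("gffread not found on PATH (it is not an R package)")
  if (!file.exists(gff3)) stop("input file not found: ", gff3)
  # Equivalent to: gffread <gff3> -T -o <gtf>
  status <- system2(exe, args = c(shQuote(gff3), "-T", "-o", shQuote(gtf)))
  if (status != 0) stop("gffread exited with status ", status)
  invisible(gtf)
}

# Run the conversion only when both the tool and the input file are present.
if (nzchar(Sys.which("gffread")) &&
    file.exists("gencode.v22.annotation.gff3")) {
  gff3_to_gtf("gencode.v22.annotation.gff3", "my.gtf")
}
```

The guard at the end keeps the script from failing on machines where gffread or the GFF3 file is missing; the resulting my.gtf can then be passed to read.gtf() as in the test output above.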
modified 2.7 years ago • written 2.7 years ago by cpad0112

Thanks. I downloaded the GFF3 file from GENCODE (https://www.gencodegenes.org/releases/current.html), but I can't install gffread in R:

> biocLite("gffread")
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.7 (BiocInstaller 1.30.0), R 3.5.0 (2018-04-23).
Installing package(s) ‘gffread’
Warning message:
package ‘gffread’ is not available (for R version 3.5.0)

Could you please tell me how I can install that package in R?

Best Regards, Mohammad

written 2.7 years ago by modarzi

gffread is not an R package. It is a standalone command-line program. There are several ways to install it:

  1. From source: https://github.com/gpertea/gffread. Follow the installation instructions there.
  2. As a precompiled binary: http://ccb.jhu.edu/software/stringtie/dl/gffread-0.9.12.Linux_x86_64.tar.gz. Put it somewhere on your PATH and test it.
  3. If you are using Ubuntu, try sudo apt install cufflinks (this requires sudo permissions). This installs gffread along with cufflinks.
  4. If you have conda/miniconda installed on your machine, conda install gffread (with the bioconda channel configured) installs gffread.
written 2.7 years ago by cpad0112

2.7 years ago, kristoffer.vittingseerup (European Union) wrote:

Try rtracklayer::import()
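
A minimal sketch of that suggestion, assuming the Bioconductor package rtracklayer is installed. import() returns a GRanges object with the GTF attributes as metadata columns. The two-line toy GTF with invented gene IDs is there only to keep the example self-contained; for the real data, point import() at gencode.v22.annotation.gtf instead:

```r
library(rtracklayer)

# Toy GTF with invented content, standing in for the real annotation file.
toy_gtf <- tempfile(fileext = ".gtf")
writeLines(c(
  'chr1\ttoy\tgene\t100\t900\t.\t+\t.\tgene_id "GENE1"; gene_name "ToyGene";',
  'chr1\ttoy\texon\t100\t300\t.\t+\t.\tgene_id "GENE1"; gene_name "ToyGene"; transcript_id "TX1";'
), toy_gtf)

# Parse the file into a GRanges object.
gtf <- import(toy_gtf, format = "gtf")

# Keep only gene records and pull out an ID-to-name table.
genes <- gtf[gtf$type == "gene"]
gene_table <- as.data.frame(mcols(genes)[, c("gene_id", "gene_name")])
print(gene_table)
```

Because this route does not go through refGenome at all, it avoids the read.gtf() call that crashes in the question. Whether it handles the full GENCODE v22 file without problems on your machine is something to verify yourself.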

written 2.7 years ago by kristoffer.vittingseerup