How to add gap line in lncRNA graph by using ggplot
1
0
Entering edit mode
9.4 years ago
zm20074970 • 0

Hello, everyone, it's my first time to ask a question here. I really appreciate who can give me any advice, thank you.

I try to use ggbio to create a lncRNA database(http://www.lncipedia.org/downloads/lncipedia_3_0.gtf), but I got a warning like that ""gap" is not matching to following arbitrary model terms"cds CDS Cds exon EXON Exon utr UTR Utr"", so there is no gap line in my transcripts, how to add the gaps? Besides, I don't know how to change the name of the transcripts, I want the name like "ENST00000458525" and I use the code names.expr = "transcript_alias_1", It can't work, my command is below:

txdblnc <- makeTranscriptDbFromGFF(file="lncipedia_3_0.gtf", format = 'gtf')
wh <- GRanges("chr10", IRanges(93525000, 93656000))
p4 <- autoplot(txdblnc, wh, color = "Dark Green", fill = "Dark Green")

![< image not found >1

lncRNA makeTranscriptDbFromGFF ggbio • 3.1k views
ADD COMMENT
1
Entering edit mode

You might try to contact the ggbio maintainer, either via the bioconductor support site or directly (instructions for that are shown when you load the ggbio package).

ADD REPLY
1
Entering edit mode
9.4 years ago
komal.rathi ★ 4.1k

I looked at the code of makeTranscriptDbFromGFF and it imports a lot of other functions. I tried to modify it, but couldn't find the most updated code for all the functions that makeTranscriptDbFromGFF depends on. Instead of going deeper and deeper into the code, what I observed was that the function automatically picks up columns that are marked by the name gene_id, so I modified the gtf file like this:

sed -e 's/gene/entity/g' -e 's/transcript_alias_1/gene_id/g' lncipedia_3_0.gtf > lncipedia_3_0_1.gtf

Here I have replaced every gene with entity and transcript_alias_1 with gene_id, so that now it skips the original gene_id column and looks at transcript_alias_1 column (which is now renamed as gene_id) instead. Then you can use this new gtf file as shown in the the code below, although now you have to specify names.expr='gene_id':

txdblnc = makeTranscriptDbFromGFF(file = "lncipedia_3_0_1.gtf",format = "gtf")
wh = GRanges("chr10", IRanges(93525000, 93656000))
p4 = autoplot(txdblnc, wh, color = "Dark Green", fill = "Dark Green", names.expr="gene_id")

This will generate the kind of plot you want but I would advise against it. Because a lot of values that are associated with transcript_alias_1 column, in the original GTF file, are NON-ENST values, for e.g. :

chr4    lncipedia.org    exon    9680180    9680546    .    -    .    gene_id lnc-SLC2A9-6 ; transcript_id lnc-SLC2A9-6:1 ; transcript_alias_1 l_2603_chr4:9680179-9682164_kidney ;
chrY    lncipedia.org   exon    58883380        58883510        .       +       .       gene_id lnc-PRYP4-3 ; transcript_id lnc-PRYP4-3:1 ; gene_alias_1 XLOC_008304 ; gene_alias_2 linc-SPRY3-3 ; transcript_alias_1 TCONS_00017658 ; transcript_alias_2 NONHSAT139812 ;
chr1    lncipedia.org    exon    78217436    78217681    .    +    .    gene_id lnc-FAM73A-1 ; transcript_id lnc-FAM73A-1:1 ; transcript_alias_1 expReg_chr1_10783_+ ;

And you should still contact the maintainer so that they can update their code for more flexibility.

ADD COMMENT
0
Entering edit mode

Thank you very much!!! do you know how to add the line in the gap like the graph below. I am trying to connect the maintainer, thank you your advice

< image not found >

ADD REPLY
0
Entering edit mode

zm20074970 did you read this manual? The best thing would be to contact the maintainer. I could figure out how to change the labels from Gene Names to Transcript ID but I don't know why the gaps are not working. The example data set given in the manual fills the gaps automatically. Probably you have to specify strand in the GRanges object?

ADD REPLY
0
Entering edit mode

Thank for your reply! I read the manual but I cannot find anything about it, I have specified "ignore.strand = FALSE" in the GRanges object, but it didn't work, I have to wait for the reply from the ggbio mantainer.

ADD REPLY
1
Entering edit mode

Let me know if you hear back from them. I am really curious to see what's wrong here. Anyway, I tried this to see what's wrong with your data:

# your code
txdblnc = makeTranscriptDbFromGFF(file = "lncipedia_3_0_1.gtf",format = "gtf")
wh = GRanges("chr10", IRanges(93525000, 93656000))
p4 = autoplot(txdblnc, wh, color = "Dark Green", fill = "Dark Green", names.expr="gene_id")

# from the manual
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
data(genesymbol, package = "biovizBase")
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
p1 <- autoplot(txdb, which = genesymbol["ALDOA"], names.expr = "tx_name:::gene_id")
p2 <- autoplot(txdb, which = genesymbol["ALDOA"], stat = "reduce", color = "brown",fill = "brown")
tracks(full = p1, reduce = p2, heights = c(5, 1)) + ylab("")

# difference in txdb and txdblnc
txdb.l = as.list(txdb)
txdblnc.l = as.list(txdblnc)

summary(txdb.l)
Length Class      Mode
transcripts  6     data.frame list
splicings   11     data.frame list
genes        2     data.frame list
chrominfo    3     data.frame list

summary(txdblnc.l)
Length Class      Mode
transcripts  6     data.frame list
splicings   11     data.frame list
genes        2     data.frame list
chrominfo    3     data.frame list

The only difference I could find in your data in the manual is that you don't provide a chrom.info file in the makeTranscriptDbFromGFF function which you can easily provide. Other than that, your splicings do not have any data for cds_id, cds_start and cds_end.

head(txdb.l$splicings)
tx_id exon_rank exon_id exon_name exon_chrom exon_strand exon_start exon_end cds_id cds_start cds_end
   1         1       1      <NA>       chr1           +      11874    12227     NA        NA      NA
   1         2       3      <NA>       chr1           +      12613    12721     NA        NA      NA
   1         3       5      <NA>       chr1           +      13221    14409     NA        NA      NA
   2         1       1      <NA>       chr1           +      11874    12227      1     12190   12227
   2         2       2      <NA>       chr1           +      12595    12721      2     12595   12721
   2         3       6      <NA>       chr1           +      13403    14409      3     13403   13639

head(txdblnc.l$splicings)
tx_id exon_rank exon_id exon_name exon_chrom exon_strand exon_start exon_end cds_id cds_start cds_end
   1         1       1      <NA>      chr19           +      68403    69146     NA        NA      NA
   2         1       2      <NA>      chr19           +      71161    71646     NA        NA      NA
   2         2       4      <NA>      chr19           +      72171    72274     NA        NA      NA
   2         3       5      <NA>      chr19           +      72585    72706     NA        NA      NA
   3         1       2      <NA>      chr19           +      71161    71646     NA        NA      NA
   3         2       4      <NA>      chr19           +      72171    72274     NA        NA      NA
ADD REPLY
0
Entering edit mode

Thanks for your help, I will tell you if I receive their reply, but I don't hear any back after two days╮(╯▽╰)╭

ADD REPLY
0
Entering edit mode

Hi, komal, I don't get any back from the maintainer of ggbio (yintengfei@gmail.com) nearly one week, do you know whether the email address is work or not.

ADD REPLY
0
Entering edit mode

How about creating a 'new issue' on ggbio's github page? I think you will get a quicker response there.

ADD REPLY

Login before adding your answer.

Traffic: 1524 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6