Annotations with biomart for ballgown
16 months ago
schlogl ▴ 80

Hi guys, sorry to be here again, but I do try to look for an answer before coming here to bother you all. I am following a tutorial to learn to work with R and some RNA-Seq data, and from time to time I run into a new problem. So far I have been dealing with the biomaRt annotation and got some errors while finding the exact attributes and so on, but I sorted everything out by searching on Google.

But in the last part of the annotation I ended up with no annotated genes at all.

This is the head of the merged GTF file that I got with StringTie2:

Chr1    StringTie   transcript  3631    5899    .   +   .   transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; gene_name "AT1G01010.TAIR10"; xloc "XLOC_000001"; ref_gene_id "AT1G01010.TAIR10"; cmp_ref "AT1G01010.1.TAIR10"; class_code "="; tss_id "TSS1";
Chr1    StringTie   exon    3631    3913    .   +   .   transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; exon_number "1";
Chr1    StringTie   exon    3996    4276    .   +   .   transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; exon_number "2";
Chr1    StringTie   exon    4486    4605    .   +   .   transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; exon_number "3";
Chr1    StringTie   exon    4706    5095    .   +   .   transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; exon_number "4";
Chr1    StringTie   exon    5174    5326    .   +   .   transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; exon_number "5";
Chr1    StringTie   exon    5439    5899    .   +   .   transcript_id "AT1G01010.1.TAIR10"; gene_id "MSTRG.1"; exon_number "6";

It looks very similar to the file from the people who published the tutorial. But at the end I got no annotated genes.

g_sign<-subset(results_genes_no_filter,pval<0.01 & abs(log2fc)>0.584)

> head(g_sign)
     feature               id           fc         pval      qval    log2fc
125     gene AT1G04610.TAIR10 5.611861e+01 0.0001284833 0.5662515  5.810407
128     gene AT1G04660.TAIR10 3.523963e+01 0.0021592230 0.5662515  5.139127
205     gene AT1G07450.TAIR10 2.100165e+00 0.0021455935 0.5662515  1.070503
220     gene AT1G07985.TAIR10 1.037975e+02 0.0002824740 0.5662515  6.697627
676     gene AT1G21830.TAIR10 4.241355e+00 0.0024521123 0.5662515  2.084525
1003    gene AT1G31360.TAIR10 4.043344e+05 0.0057775911 0.5662515 18.625189

I got this working after looking up some adjustments:

    #### ANNOTATION - BioMart ########


    mart=useMart("plants_mart", host="")

    mart <- useMart("plants_mart", dataset="athaliana_eg_gene", host="")
    searchAttributes(mart = mart, pattern = "ensembl_gene_id")
    listAttributes(mart = mart, page="feature_page")
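    # Fetch the annotation table: straight quotes, the `mart` object created above,
    # and the result stored in `thale_data_frame`, which the next step uses
    thale_data_frame <- getBM(attributes=c("ensembl_gene_id","ensembl_transcript_id","ensembl_peptide_id","ensembl_exon_id","description"), mart=mart)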

    ## Now match the genes from our list to this dataset
    annotated_genes = subset(thale_data_frame, ensembl_gene_id %in% g_sign$id)
    dim(annotated_genes)
    [1] 0 3

Is there any way to fix it?

I followed the steps and got similar results. The one thing I couldn't fix is that my biomaRt package (2.40.5) is old for my R version; I tried to update it, but without success.

> version
platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
major          3                           
minor          6.1                         
year           2019                        
month          07                          
day            05                          
svn rev        76782                       
language       R                           
version.string R version 3.6.1 (2019-07-05)
nickname       Action of the Toes

Any suggestions or directions?

Thank you for your time! Paulo

rna-seq R • 400 views
No one can share any ideas, suggestions, ... 8(

16 months ago
schlogl ▴ 80

For anyone who needs an answer to the same question in the future.
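
One likely cause, judging only from the output shown in the question: the `id` column of `g_sign` carries a `.TAIR10` suffix (e.g. `AT1G04610.TAIR10`), whereas the Ensembl Plants gene IDs for Arabidopsis are normally the bare locus ID (`AT1G04610`), so `%in%` finds no overlap. Below is a minimal sketch of that idea. It uses small toy data frames in place of the real objects: the `id` values are copied from `head(g_sign)`, while the `description` values and the `gene_id_clean` column name are made up for illustration, and no BioMart query is run.

```r
# Toy stand-in for g_sign (ids and log2fc copied from the question's output).
g_sign <- data.frame(
  id     = c("AT1G04610.TAIR10", "AT1G04660.TAIR10", "AT1G07450.TAIR10"),
  log2fc = c(5.810407, 5.139127, 1.070503),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the table returned by getBM(); the assumption here
# is that ensembl_gene_id holds bare locus IDs without the ".TAIR10" suffix.
thale_data_frame <- data.frame(
  ensembl_gene_id = c("AT1G04610", "AT1G07450", "AT1G01010"),
  description     = c("placeholder A", "placeholder B", "placeholder C"),
  stringsAsFactors = FALSE
)

# Matching as in the question: the suffixed ids never equal the bare ids.
n_before <- nrow(subset(thale_data_frame, ensembl_gene_id %in% g_sign$id))

# Strip a trailing ".TAIR10" into a new column, keeping the original id intact.
g_sign$gene_id_clean <- sub("\\.TAIR10$", "", g_sign$id)

# Match again on the cleaned ids.
annotated_genes <- subset(thale_data_frame,
                          ensembl_gene_id %in% g_sign$gene_id_clean)

# Optionally keep the statistics next to the annotation in one table.
annotated_stats <- merge(g_sign, thale_data_frame,
                         by.x = "gene_id_clean", by.y = "ensembl_gene_id")

print(n_before)
print(dim(annotated_genes))
```

Before applying this to real data, it is worth comparing `head(g_sign$id)` with `head(thale_data_frame$ensembl_gene_id)` to confirm that the suffix really is the only difference between the two ID formats.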


