differential expression- coding genes
3
0
Entering edit mode
3.2 years ago
Rob ▴ 170

hello friends,

I performed differential expression analysis with DESeq2 package. my data is HT-seq count data and protein coding genes. in the result TUNAR gene is highly significantly expressed. TUNAR is a non-coding genes. I do not understand why this is in my result because I analyzed only coding genes. for sanity check, I repeated the analysis again and again and the result is the same.

How can I interpret this?

Thanks

rna-seq • 2.3k views
ADD COMMENT
1
Entering edit mode

So Rob... Now that you know that it is hard to define TUNAR as coding or not you have to tell us why it is so important for you to know!!!

ADD REPLY
0
Entering edit mode

I assume you are working on human. TUNAR is a long non-coding RNA which means that the gene is transcribed as RNA, but not translated to protein, so it is perfectly normal that you identify it.

EDIT: I just read that you stated that your data are protein coding genes. So, you should double check the input file, TUNAR is classified as a non-coding RNA.

You may also find useful to consult with biologists in your lab.

ADD REPLY
0
Entering edit mode

Thanks Fabio Marroni Actually I checked many times, but TUNAR is the top gene every time.

ADD REPLY
4
Entering edit mode
3.2 years ago
ATpoint 81k

If it is in the results and annotated as lncRNA then you did not properly filter for protein-coding genes, simple as that. How did you do the filtering?

ADD COMMENT
0
Entering edit mode

Hi ATpoint Thanks for responding.

From Biomart the TUNAR gene annotated as "protein_coding" that is weird. I used this code to filter and just keep the coding genes:

mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
mrna_attributes <- getBM(attributes=c("external_gene_name",
                                      "ensembl_gene_id",
                                      "gene_biotype"),
                         filters = c("ensembl_gene_id"),
                         values = rownames(rna),
                         mart = mart)
mrna_attributes <- mrna_attributes[which(mrna_attributes$gene_biotype == "protein_coding"),]
rna <- rna[which(rownames(rna) %in% mrna_attributes$ensembl_gene_id),]

write.csv(rna,"rnaCodinggenes.csv")

Is there any problem with this?

ADD REPLY
0
Entering edit mode

No, this looks good indeed. TUNAR here is indeed annotated as a protein_coding gene so either the annotation is wrong (unlikely) or you would need to check your source stating that this gene was non-coding.

ADD REPLY
4
Entering edit mode
3.2 years ago

Hey,

TUNAR is a protein coding gene:

biomaRt

require(biomaRt)
mart <- useMart(biomart = 'ensembl', dataset = 'hsapiens_gene_ensembl')
mrna_attributes <- getBM(
  attributes = c(
    'external_gene_name',
    'ensembl_gene_id,
    'gene_biotype'),
  mart = mart)

grep('TUNAR', mrna_attributes$external_gene_name)
[1] 34387

mrna_attributes[grep('TUNAR', mrna_attributes$external_gene_name),]
      external_gene_name ensembl_gene_id   gene_biotype
34387              TUNAR ENSG00000250366 protein_coding

Ensembl's record:

http://useast.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000250366;r=14:95876392-95925663

The latest GENCODE:

zcat gencode.v36.transcripts.fa.gz | grep -e "ENSG00000250366"
>ENST00000504119.1|ENSG00000250366.3|OTTHUMG00000171392|OTTHUMT00000413256.1|TUNAR-202|TUNAR|2930|protein_coding|
>ENST00000678517.1|ENSG00000250366.3|OTTHUMG00000171392|-|TUNAR-204|TUNAR|3373|protein_coding|
>ENST00000503525.2|ENSG00000250366.3|OTTHUMG00000171392|OTTHUMT00000413257.1|TUNAR-201|TUNAR|3197|protein_coding|
>ENST00000554321.1|ENSG00000250366.3|OTTHUMG00000171392|OTTHUMT00000413258.1|TUNAR-203|TUNAR|767|protein_coding|

zcat gencode.v36.transcripts.fa.gz | grep -e "TUNAR"
>ENST00000504119.1|ENSG00000250366.3|OTTHUMG00000171392|OTTHUMT00000413256.1|TUNAR-202|TUNAR|2930|protein_coding|
>ENST00000678517.1|ENSG00000250366.3|OTTHUMG00000171392|-|TUNAR-204|TUNAR|3373|protein_coding|
>ENST00000503525.2|ENSG00000250366.3|OTTHUMG00000171392|OTTHUMT00000413257.1|TUNAR-201|TUNAR|3197|protein_coding|
>ENST00000554321.1|ENSG00000250366.3|OTTHUMG00000171392|OTTHUMT00000413258.1|TUNAR-203|TUNAR|767|protein_coding|

Kevin

ADD COMMENT
2
Entering edit mode

It is true!!! It is simply that genecards and ENTREZ list the gene as non-coding, probably as a consequence of the first papers listing it as a non-coding.

ADD REPLY
0
Entering edit mode

Thanks Kevin, but in gene cards it is stated as: Long Intergenic Non-Protein Coding RNA ???

https://www.genecards.org/cgi-bin/carddisp.pl?gene=TUNAR

ADD REPLY
2
Entering edit mode

Given the complexity of the genome and how evolution works, every possible combination of transcription and translation will exist; thus, assuredly, some transcripts are non-coding under some conditions, but protein coding under others. It is difficult for the annotation databases to 'capture' all of these nuances of the genome.

ADD REPLY
3
Entering edit mode
3.2 years ago
EagleEye 7.5k

Hi,

So far all the studies (publications) are reporting it as long noncoding RNA. I just made a quick analysis to check TUNAR's coding probability and here are the results.

ADD COMMENT
0
Entering edit mode

Nice, EagleEye, Can you show me how you did the analysis? then I can do it with other genes. I have no idea about what you did.

ADD REPLY
3
Entering edit mode

You just have to input the sequence of RNA to the CPC tool. It is just quick and easy way to do it. I always use two or three tools like CPC/phyloCF to confirm the coding potential of the transcripts/isoforms after DE analysis (before proceeding to further downstream analysis). Because the genome annotation is becoming complicated day-by-day. Even some protein coding genes have noncoding isoforms and in rare case other way around too.

ADD REPLY
0
Entering edit mode

Thanks EagleEye, Then for pathway and other analyses, how you select the genes from DE results? what percent of being coding gene is satisfying to put it into downstream analyses? Also, how can have access to the sequence of RNA to put that into cpc tool?

ADD REPLY
1
Entering edit mode

Then for pathway and other analyses, how you select the genes from DE results? what percent of being coding gene is satisfying to put it into downstream analyses?

Let's say particular gene X has three isoforms/transcripts, Y.1, Y.2 and Y.3. From your differential expression analysis you got particular isoform/transcript Y.2 is differentially expressed but when you check the coding potential it falls in noncoding category according to CPC or phyloCF etc.,. I would eliminate gene X from downstream analysis. In other case if you get at least one of the isoform or transcript Y.1 to be protein coding, I would use that gene X for downstream analysis.

Also, how can have access to the sequence of RNA to put that into cpc tool?

You can extract the transcript sequences using Ensembl BioMart.

ADD REPLY
0
Entering edit mode

Great! Thanks a lot.

ADD REPLY

Login before adding your answer.

Traffic: 846 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6