Question: How to separate htseq-count table to coding RNAs and non-coding RNAs
3.4 years ago, Naresh D J (Turku/BTK) wrote:

Hi,

I have generated raw read counts for genes from RNA-seq data using htseq-count. Now I want to separate this table into coding RNAs and non-coding RNAs.

I am new to the NGS data analysis.

Can anyone help me or suggest how to do it?

Thank you,

Naresh 

modified 3.4 years ago by tiago211287 • written 3.4 years ago by Naresh D J

What do you mean by coding and non-coding RNAs? Do you mean separating counts for coding and non-coding transcripts? Or do you mean separating counts for coding (exonic) and non-coding (intronic, UTR) regions of a given transcript?

Reply written 3.4 years ago by Ashutosh Pandey

@Ashutosh Pandey, yes I want to separate the counts for coding and non-coding transcripts.

For separating coding and non-coding regions there is a tool, RSeQC.

Reply written 3.4 years ago by Naresh D J

Well, RSeQC will give you the numbers or fractions of reads aligned to different genic features, but it won't separate them. Anyway, what you need is an annotation of the transcripts (genes) by biotype. If these are Ensembl genes or gene IDs, you can use BioMart (http://www.ensembl.org/biomart) to download the "Biotype" for each gene and then annotate the Ensembl genes in the count file as protein-coding, rRNA, tRNA, snoRNA, miRNA, etc.
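
The annotation step described here can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a definitive implementation. It assumes the count file is plain htseq-count output (gene ID, tab, count, plus the `__` summary rows) and that the BioMart export is a TSV whose header uses the column names "Gene stable ID" and "Gene type"; check these against your own export. The gene IDs and counts in the demo are made up.

```python
import csv
import io


def read_htseq_counts(handle):
    """Read htseq-count output (gene_id<TAB>count), skipping '__' summary rows."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("__"):
            continue  # e.g. __no_feature, __ambiguous
        counts[row[0]] = int(row[1])
    return counts


def read_biotypes(handle, id_col="Gene stable ID", type_col="Gene type"):
    """Read a BioMart TSV export. The column names are assumptions."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_col]: row[type_col] for row in reader}


def split_by_biotype(counts, biotypes, coding_label="protein_coding"):
    """Split counts into coding, non-coding, and genes with no biotype."""
    coding, noncoding, unannotated = {}, {}, {}
    for gene_id, n in counts.items():
        biotype = biotypes.get(gene_id)
        if biotype is None:
            unannotated[gene_id] = n
        elif biotype == coding_label:
            coding[gene_id] = n
        else:
            noncoding[gene_id] = n
    return coding, noncoding, unannotated


# Made-up demo data standing in for real files opened with open(path).
demo_counts = io.StringIO(
    "GENE_A\t120\n"
    "GENE_B\t7\n"
    "GENE_C\t0\n"
    "__no_feature\t500\n"
    "__ambiguous\t12\n"
)
demo_biotypes = io.StringIO(
    "Gene stable ID\tGene type\n"
    "GENE_A\tprotein_coding\n"
    "GENE_B\tlncRNA\n"
)

counts = read_htseq_counts(demo_counts)
biotypes = read_biotypes(demo_biotypes)
coding, noncoding, unannotated = split_by_biotype(counts, biotypes)
print("coding:", coding)
print("non-coding:", noncoding)
print("no biotype found:", unannotated)
```

Note that treating every biotype other than protein_coding as "non-coding" is a simplification: that group will also contain pseudogenes and similar categories, so you may want to filter on specific biotypes instead.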

Reply written 3.4 years ago by Ashutosh Pandey

Thank you. I will try your suggestion and let you know.

Reply written 3.4 years ago by Naresh D J
3.4 years ago, tiago211287 (USA) wrote:

What is your model organism?

If you are using a genome from Ensembl and used its GTF annotation file with htseq-count, you can import all the count tables (txt files) into a data.frame in R.

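With Bioconductor, install the biomaRt package:

biocLite("biomaRt")  # on Bioconductor 3.8 or later, use BiocManager::install("biomaRt") instead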

With this package you can retrieve a dataset from Ensembl based on several filters and attributes, for example the biotype (whether a gene is coding or non-coding).

Then you can simply merge the two tables on the Ensembl IDs and separate them according to your criteria. If you do not want to use R, Ensembl has a graphical web interface at http://www.ensembl.org/biomart, although I recommend R because it will be easier later to create better graphics and statistics.
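
For readers who are not working in R, the "import all the tables, then merge on the Ensembl IDs" step can be sketched with Python's standard library. This is only an illustration under stated assumptions: each sample is a plain htseq-count output file, the `biotypes` mapping (gene ID to biotype) has already been obtained from BioMart, and the sample names, gene IDs, and counts below are made up.

```python
import csv
import io


def read_htseq_counts(handle):
    """Read htseq-count output (gene_id<TAB>count), skipping '__' summary rows."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("__"):
            continue
        counts[row[0]] = int(row[1])
    return counts


def merge_samples(per_sample):
    """Combine {sample: {gene_id: count}} into {gene_id: [count per sample]}.

    Columns follow the insertion order of per_sample; a gene missing
    from a sample gets a count of 0.
    """
    gene_ids = sorted(set().union(*per_sample.values()))
    return {
        gene_id: [counts.get(gene_id, 0) for counts in per_sample.values()]
        for gene_id in gene_ids
    }


# Made-up demo data; in practice each value would come from open(path).
per_sample = {
    "sample1": read_htseq_counts(io.StringIO("GENE_A\t10\nGENE_B\t3\n__no_feature\t99\n")),
    "sample2": read_htseq_counts(io.StringIO("GENE_A\t20\nGENE_B\t0\n__no_feature\t80\n")),
}
# Hypothetical biotype table, e.g. loaded from a BioMart export.
biotypes = {"GENE_A": "protein_coding", "GENE_B": "snoRNA"}

matrix = merge_samples(per_sample)
coding = {g: row for g, row in matrix.items()
          if biotypes.get(g) == "protein_coding"}
noncoding = {g: row for g, row in matrix.items()
             if biotypes.get(g) not in (None, "protein_coding")}

print("samples:", list(per_sample))
print("coding:", coding)
print("non-coding:", noncoding)
```

Keeping all samples in one table before splitting means both resulting tables share the same column order, which is convenient for downstream statistics.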

links:

biomaRt manual

bioconductor website

R website

PS: biomaRt can also handle UniProt and HapMap databases.

modified 3.4 years ago • written 3.4 years ago by tiago211287

Thank you. I will try your suggestion and let you know.

Reply written 3.4 years ago by Naresh D J
3.4 years ago, Antonio R. Franco (Spain, Universidad de Córdoba) wrote:

To do so, you need a file listing the base ranges of the sequences that are coding and non-coding. Mapping reads to the reference genome or transcriptome does not, by itself, carry this information.
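
One such file may already be at hand: Ensembl GTF files carry a `gene_biotype` attribute on each record (GENCODE files use `gene_type`), so the GTF given to htseq-count can often serve as the coding/non-coding lookup. The following is a rough sketch of extracting that mapping with the Python standard library; it assumes a standard nine-column GTF with quoted attribute values, and the demo records are made up.

```python
import re

# Matches quoted GTF attributes such as: gene_id "X";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_biotypes_from_gtf(lines, biotype_keys=("gene_biotype", "gene_type")):
    """Return {gene_id: biotype} from GTF lines.

    biotype_keys lists the attribute names to try, in order.
    """
    biotypes = {}
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        for key in biotype_keys:
            if key in attrs:
                biotypes[gene_id] = attrs[key]
                break
    return biotypes


# Made-up demo records; in practice pass an open file handle instead.
demo_gtf = [
    "#!genome-build DEMO",
    "\t".join(["1", "demo", "gene", "100", "900", ".", "+", ".",
               'gene_id "GENE_A"; gene_name "A"; gene_biotype "protein_coding";']),
    "\t".join(["1", "demo", "gene", "2000", "2600", ".", "-", ".",
               'gene_id "GENE_B"; gene_type "lncRNA";']),
]

gtf_biotypes = gene_biotypes_from_gtf(demo_gtf)
print(gtf_biotypes)
```

If your GTF comes from another source and has no biotype attribute, this lookup will simply come back empty, and a BioMart download remains the alternative.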

written 3.4 years ago by Antonio R. Franco

@Antonio R. Franco, could you kindly elaborate?

Reply written 3.4 years ago by Naresh D J

What other information would you want?

Reply written 10 weeks ago by SmallChess
