Question: which file to use for analysis
Learner (220) wrote, 2.4 years ago:


I am trying to analyse RNA-seq data. After downloading the data, I have three types of file:




Are these different? Should I use only one type, or can I use all three together in the analysis?

For example, please have a look at this

Is there someone who can tell me which type of RNA-seq data this is? (I mean, how is it acquired and how should I interpret it?)



An update (6th October 2018):

You should abandon RPKM / FPKM. They are not ideal where cross-sample differential expression analysis is your aim; indeed, they render samples incomparable via differential expression analysis:

Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

The Total Count and RPKM [FPKM] normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis.

Also, by Harold Pimentel: What the FPKM? A review of RNA-Seq expression units

The first thing one should remember is that without between sample normalization (a topic for a later post), NONE of these units are comparable across experiments. This is a result of RNA-Seq being a relative measurement, not an absolute one.

written 21 months ago by Kevin Blighe (60k) • modified 20 months ago

Kevin Blighe (60k) wrote, 2.4 years ago:

The htseq.counts files contain raw counts and therefore provide you with maximum flexibility in terms of analysis.

FPKM and FPKM-UQ are both normalised counts, but the method of normalisation used in both has been slowly falling out of fashion. Most likely, both of these types of normalised counts would have been derived from the htseq.counts raw counts.
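To make the link between the raw counts and the two normalised types concrete, here is a minimal sketch in base R. It assumes the commonly described definitions: FPKM divides each count by gene length and by the sample's total count, while FPKM-UQ divides by the sample's upper-quartile (75th percentile) count instead of the total. The counts and gene lengths are made-up toy values, and which genes enter the quantile in the real pipeline is not something this sketch tries to reproduce.

```r
# Toy raw counts for one sample and gene lengths in base pairs (hypothetical values)
counts  <- c(geneA = 100, geneB = 400, geneC = 0, geneD = 1500)
lengths <- c(geneA = 1000, geneB = 2000, geneC = 500, geneD = 3000)

# FPKM: fragments per kilobase of gene per million fragments in the sample
fpkm <- counts * 1e9 / (lengths * sum(counts))

# FPKM-UQ: same scaling, but the per-sample denominator is the 75th percentile
# of the counts rather than their sum (assumption: quantile taken over all genes here)
uq      <- unname(quantile(counts, 0.75))
fpkm_uq <- counts * 1e9 / (lengths * uq)

print(cbind(counts, fpkm, fpkm_uq))
```

Both values are per-sample rescalings of the same raw counts, which is why starting from the raw counts loses nothing.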

If you want me to simply give you advice on which to use, then my answer is htseq.counts. Read these counts into edgeR or DESeq2 and then Bob's your uncle.
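As a rough sketch of the first step, the per-sample count files can be combined into one genes-by-samples matrix in base R. This assumes each file has two tab-separated columns (gene ID, count) with htseq-count's summary rows (those starting with "__") at the end, and that every file lists the same genes in the same order. The helper names `read_htseq` and `build_count_matrix` are hypothetical, not part of any package.

```r
# Read one htseq-count file into a named vector of counts,
# dropping the summary rows such as "__no_feature" and "__ambiguous"
read_htseq <- function(path) {
  x <- read.delim(path, header = FALSE, col.names = c("gene", "count"),
                  stringsAsFactors = FALSE)
  x <- x[!grepl("^__", x$gene), ]
  setNames(x$count, x$gene)
}

# Combine several files into a genes x samples matrix.
# 'paths' is a named character vector: names are sample IDs, values are file paths.
build_count_matrix <- function(paths) {
  per_sample <- lapply(paths, read_htseq)
  genes <- names(per_sample[[1]])
  # Assumption: all files contain identical gene IDs in identical order
  same <- vapply(per_sample, function(v) identical(names(v), genes), logical(1))
  stopifnot(all(same))
  do.call(cbind, per_sample)
}
```

The resulting matrix is the kind of input that `DESeqDataSetFromMatrix()` in DESeq2 or `DGEList()` in edgeR expects, together with a table describing the samples.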

Further information straight from TCGA's web domain:

Further information on processing htseq raw (and other) counts with DESeq2: Analyzing RNA-seq data with DESeq2


PS - the exact file to which you've linked is the FPKM-UQ counts for a breast cancer primary tumour sample from the TCGA-BRCA study.


@Kevin Blighe do you know how to annotate them too? Is there a package in Python, Perl, R or another programming language? If you also know of a paper, that would help a lot. Thanks.

written 2.4 years ago by Learner (220)

You can do gene annotation conversions using the biomaRt package in R, but it's rarely straightforward due to some genes only being annotated in one database, or due to the existence of duplicate or redundant IDs, etc.

If you want to try this yourself, then do something like:

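library(biomaRt)

# Connect to the Ensembl BioMart and select the human gene dataset
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)

# Map the annotations for the IDs held in ensembl.gene
annots <- getBM(mart=mart,
  attributes=c("ensembl_gene_id", "hgnc_symbol", "gene_biotype", "external_gene_name", "refseq_mrna", "refseq_ncrna"),
  filters="ensembl_gene_id",
  values=ensembl.gene,
  uniqueRows=TRUE)

ensembl.gene contains your Ensembl Gene IDs to convert.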
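Two of the practical snags mentioned above can be sketched in base R. This assumes that the gene IDs in the count files carry a version suffix (for example ".13") that has to be removed before the lookup, and that the lookup returns more than one row for some genes. The version numbers and RefSeq values below are placeholders, and the `annots` table is a toy stand-in for a real `getBM()` result.

```r
# Hypothetical versioned IDs as they might appear in a count file
ids <- c("ENSG00000000003.13", "ENSG00000000005.5", "ENSG00000000419.11")

# Strip the trailing ".<version>" so the IDs match unversioned Ensembl gene IDs
ensembl.gene <- sub("\\.[0-9]+$", "", ids)

# Toy stand-in for a getBM() result: one gene appears twice (two RefSeq
# transcripts, placeholder values) and one gene has no annotation at all
annots <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000000003", "ENSG00000000005"),
  hgnc_symbol     = c("TSPAN6", "TSPAN6", "TNMD"),
  refseq_mrna     = c("NM_A", "NM_B", "NM_C"),
  stringsAsFactors = FALSE
)

# Keep the first row per gene (one simple choice; others may suit your analysis better)
annots_unique <- annots[!duplicated(annots$ensembl_gene_id), ]

# Re-align to the original ID order; genes without an annotation become NA rows
annots_aligned <- annots_unique[match(ensembl.gene, annots_unique$ensembl_gene_id), ]
```

Keeping the first row per gene is only one way to resolve duplicates; collapsing the extra values into a single delimited field is another.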

written 2.4 years ago by Kevin Blighe (60k) • modified 16 months ago

@Kevin Blighe I have a few questions. First, can you explain your code above, starting with the first lines? I would also like to know what you did in your own recent analysis: did you also check for mutations? If not, do you know how to find mutations across several samples?

written 2.4 years ago by Learner (220) • modified 2.4 years ago