7.8 years ago
ybling2008
- I want to run a survival analysis of lncRNAs in breast cancer based on TCGA data, but I don't know how to get the lncRNA information.
- Do I have to download the raw RNA-seq data from the TCGA database one patient at a time?
- How do I identify the lncRNAs in the raw data?
Thank you
My final goal is to build a model like the one in the paper below. So I want to get the lncRNA information and then divide the data into two groups: a training group and a validation group. Then, based on Cox regression, I want to construct a model as an equation of several lncRNAs.
"Transcriptome sequencing uncovers a three–long noncoding RNA signature in predicting breast cancer survival"
You can still follow the tutorial mentioned above. Divide the RNA-seq patient samples into two groups: a training set and a validation set.
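As a rough illustration of that split, here is a minimal Python sketch. The function name `split_patients`, the 50/50 ratio, the fixed seed, and the placeholder barcodes are all assumptions made for the example; they are not taken from the tutorial or the paper.

```python
import random

def split_patients(patient_ids, train_fraction=0.5, seed=42):
    """Randomly split unique patient IDs into training and validation lists."""
    # Deduplicate and sort first so the result depends only on the seed,
    # not on the input order.
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_train = int(round(len(unique_ids) * train_fraction))
    return unique_ids[:n_train], unique_ids[n_train:]

# Placeholder barcodes for illustration only (not real TCGA patients).
patients = [f"TCGA-XX-{i:04d}" for i in range(10)]
train, validation = split_patients(patients)
```

Splitting by patient rather than by sample keeps multiple samples from the same patient out of both groups at once. The Cox model would then be fitted on `train` only and checked on `validation`.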
But my question is that the dataset from the tutorial you gave doesn't cover lncRNAs.
The RNA-seq data you download from TCGA will contain all classes of RNA; you just have to separate them by type using available annotations such as GENCODE, Ensembl, or UCSC.
For example, as discussed in these posts, you can obtain an annotation file and match the Ensembl IDs from the TCGA RNA-seq data against it. The annotation file contains the classification of the RNAs (protein_coding, lincRNA, etc.).
A: Identify lncRNA in list of Ensembl ID's
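The matching step could look something like the sketch below. It assumes you already have a mapping from unversioned Ensembl gene ID to biotype (built from whichever annotation you chose), and that the expression IDs carry a version suffix such as `.13`. The set of biotype labels treated as lncRNA is an assumption: older GENCODE releases use labels like `lincRNA` and `antisense`, while newer ones use a single `lncRNA` label, so check it against your annotation version. The IDs and biotypes in the example are toy values.

```python
# Biotype labels counted as lncRNA. ASSUMPTION: adjust to the labels used
# by the annotation release you actually downloaded.
LNCRNA_BIOTYPES = {
    "lncRNA", "lincRNA", "antisense", "sense_intronic", "sense_overlapping",
}

def strip_version(ensembl_id):
    """Drop the version suffix: 'ENSG00000229807.9' -> 'ENSG00000229807'."""
    return ensembl_id.split(".", 1)[0]

def select_lncrna_ids(expression_ids, biotype_by_gene,
                      lncrna_biotypes=LNCRNA_BIOTYPES):
    """Return the expression-matrix IDs whose annotated biotype is a lncRNA class."""
    selected = []
    for ensembl_id in expression_ids:
        biotype = biotype_by_gene.get(strip_version(ensembl_id))
        if biotype in lncrna_biotypes:
            selected.append(ensembl_id)
    return selected

# Toy annotation and toy expression row names, for illustration only.
toy_annotation = {
    "ENSG00000000001": "protein_coding",
    "ENSG00000000002": "lincRNA",
    "ENSG00000000003": "antisense",
}
toy_expression_ids = [
    "ENSG00000000001.4",
    "ENSG00000000002.7",
    "ENSG00000000003.1",
    "ENSG00000000004.2",  # not in the annotation, so it is dropped
]
lncrna_ids = select_lncrna_ids(toy_expression_ids, toy_annotation)
```

The returned IDs keep their original version suffix, so they can be used directly to subset the rows of the expression matrix.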
The easy way is to download the GTF file from GENCODE and follow these steps to get the annotation in table format:
A: Converting gtf format to bed format
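If you prefer to do the GTF-to-table step yourself, a minimal Python sketch is shown below; it is not the method from the linked post. It assumes the standard nine-column, tab-separated GTF layout and reads the biotype from the `gene_type` attribute (GENCODE style), falling back to `gene_biotype` (Ensembl style). The function names and the toy GTF lines are made up for the example.

```python
import csv
import io
import re

# Matches quoted attributes such as: gene_id "ENSG00000000002.7";
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')
COLUMNS = ["gene_id", "gene_name", "gene_type", "chrom", "start", "end", "strand"]

def gtf_genes_to_rows(gtf_lines):
    """Collect one row per 'gene' feature from an iterable of GTF lines."""
    rows = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue  # skip transcripts, exons, malformed lines
        attrs = dict(ATTRIBUTE_PATTERN.findall(fields[8]))
        rows.append({
            "gene_id": attrs.get("gene_id", ""),
            "gene_name": attrs.get("gene_name", ""),
            # GENCODE uses gene_type; Ensembl GTFs use gene_biotype.
            "gene_type": attrs.get("gene_type", attrs.get("gene_biotype", "")),
            "chrom": fields[0],
            "start": fields[3],
            "end": fields[4],
            "strand": fields[6],
        })
    return rows

def rows_to_tsv(rows):
    """Render the rows as a tab-separated table with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Toy GTF content (made-up coordinates and IDs), for illustration only.
toy_gtf = [
    "##description: toy example\n",
    "\t".join(["chr1", "TOY", "gene", "100", "900", ".", "+", ".",
               'gene_id "ENSG00000000002.7"; gene_type "lincRNA"; gene_name "TOYLNC1"; level 2;']) + "\n",
    "\t".join(["chr1", "TOY", "transcript", "100", "900", ".", "+", ".",
               'gene_id "ENSG00000000002.7"; transcript_id "ENST00000000002.1";']) + "\n",
    "\t".join(["chr2", "TOY", "gene", "50", "400", ".", "-", ".",
               'gene_id "ENSG00000000001.4"; gene_type "protein_coding"; gene_name "TOYPC1";']) + "\n",
]
gene_rows = gtf_genes_to_rows(toy_gtf)
gene_table = rows_to_tsv(gene_rows)
```

For a real file you would pass an open file handle (for example from `gzip.open(path, "rt")`) instead of `toy_gtf`, since the function only needs an iterable of lines.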
Or else, if you are familiar with the UCSC Table Browser, follow these steps:
A: I need to download a list of all human genes with their respective Esemble gene