I am trying to run an interaction analysis between methylation and lncRNA expression using TCGA data. However, the TCGA project does not provide a lncRNA expression dataset. Fortunately, TANRIC provides lncRNA expression for TCGA cancer samples, but its sample size is quite limited and it has not been updated as the number of samples in TCGA has grown.
My question is: is there an existing pipeline to quantify lncRNA expression levels from the TCGA RNA-seq BAM files (aligned with BWA)?
TANRIC contains read counts for Ensembl-defined lncRNAs, and also allows users to define their own lncRNAs by entering genomic coordinates. It additionally offers various analyses, including survival analysis, and allows its data to be downloaded.
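To make the "define your own lncRNA by genomic coordinates" idea concrete, here is a minimal sketch of counting reads over a user-supplied interval and normalizing the count to RPKM. This is not TANRIC's code: the `Region` class, the function names, and the toy coordinates are all made up for illustration, and RPKM is just one common normalization that I am assuming here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A hypothetical user-defined lncRNA locus (1-based, inclusive)."""
    name: str
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(region, chrom, start, end):
    """True if an alignment spanning [start, end] on chrom touches the region."""
    return chrom == region.chrom and start <= region.end and end >= region.start


def count_reads(region, alignments):
    """Count alignments, given as (chrom, start, end) tuples, overlapping the region."""
    return sum(1 for chrom, start, end in alignments
               if overlaps(region, chrom, start, end))


def rpkm(read_count, region_length, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    return read_count * 1e9 / (region_length * total_mapped_reads)


# Toy data (assumed, not from TCGA): two of the four alignments hit the region.
region = Region("my_lncRNA", "1", 1000, 1999)
alignments = [
    ("1", 950, 1025),    # overlaps the left edge
    ("1", 1500, 1575),   # fully inside
    ("1", 2500, 2575),   # same chromosome, outside
    ("2", 1500, 1575),   # different chromosome
]
n = count_reads(region, alignments)
print(n, rpkm(n, region.length, total_mapped_reads=1_000_000))
```

With real data the alignment tuples would come from the BAM file rather than a hand-written list, and the total mapped read count from the library as a whole.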
BAM files of RNA-seq in TCGA
Illumina paired-end RNA sequencing reads were aligned to the GRCh37-lite genome-plus-junctions reference using BWA version 0.5.7. This reference combined the genomic sequences in the GRCh37-lite assembly with exon-exon junction sequences whose coordinates were defined from the annotations of any transcripts in Ensembl (v59), RefSeq, and known genes from the UCSC genome browser, downloaded on August 19, 2010, August 8, 2010, and August 19, 2010, respectively. Reads that mapped to junction regions were then repositioned back to the genome and marked with 'ZJ:Z' tags. BWA was run with default parameters, except that the option (-s) was included to disable Smith-Waterman alignment. Finally, reads failing the Illumina chastity filter were flagged with a custom script, and duplicated reads were flagged with Picard's MarkDuplicates.
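Since the description above says chastity-failed and duplicated reads are only flagged, not removed, any counting step probably needs to skip them itself. Below is a rough sketch that works on SAM text lines (for example, the output of `samtools view -h`). I am assuming the custom script sets the standard SAM QC-fail bit (0x200) for chastity failures, which the description does not actually state; the duplicate bit (0x400) is what MarkDuplicates sets. The helper names, the toy records, and the ZJ tag value are invented for illustration.

```python
import re

# Standard SAM FLAG bits.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_QC_FAIL = 0x200     # assumed to carry the chastity-filter failure
FLAG_DUPLICATE = 0x400   # set by Picard MarkDuplicates
SKIP_MASK = FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_QC_FAIL | FLAG_DUPLICATE

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_end(pos, cigar):
    """Last reference base (1-based, inclusive) covered by the alignment."""
    span = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1


def usable_reads(sam_lines):
    """Yield (chrom, start, end, has_junction_tag) for reads passing the flag filter."""
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & SKIP_MASK:
            continue
        chrom, pos, cigar = fields[2], int(fields[3]), fields[5]
        has_junction_tag = any(f.startswith("ZJ:Z:") for f in fields[11:])
        yield chrom, pos, reference_end(pos, cigar), has_junction_tag


def toy_record(name, flag, chrom, pos, cigar, *tags):
    """Build a minimal SAM line; sequence and quality are left as '*'."""
    core = [name, str(flag), chrom, str(pos), "60", cigar, "*", "0", "0", "*", "*"]
    return "\t".join(core + list(tags))


sam = [
    "@HD\tVN:1.0",
    toy_record("r1", 0, "1", 1000, "76M"),
    toy_record("r2", FLAG_DUPLICATE, "1", 1100, "76M"),
    toy_record("r3", FLAG_QC_FAIL, "1", 1200, "76M"),
    # Invented spliced read; the tag value is a placeholder, not the real format.
    toy_record("r4", 0, "1", 1300, "40M500N36M", "ZJ:Z:placeholder"),
]
kept = list(usable_reads(sam))
print(kept)
```

Only `r1` and `r4` survive the filter. The resulting intervals could then be intersected with lncRNA coordinates from whichever annotation you choose. For full-size BAM files a dedicated counting tool or a BAM library would be far faster than parsing text, so treat this only as an illustration of the filtering logic.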