Question

How can I get a count of mRNA + lncRNAs from Gencode + lncipedia?

3.2 years ago

njk639 • 0

Hi all,

I'm working with some high-depth, 100bp PE RNA-Seq data and we'd like to look at both mRNA and lncRNA.

Right now my workflow looks like the following:

Align reads to human genome (GRCh38.primary_assembly.genome.fa) via STAR.

STAR --runMode alignReads --runThreadN 8 --genomeDir INDEX_DIR_HERE --outSAMtype BAM Unsorted --readFilesIn FASTQ_FILEPATHS_HERE

Generate count tables via featureCounts. I run this twice, once with the GENCODE v36 primary assembly annotation and once with the LNCipedia 5.2 annotation, so I end up with two separate count tables.

featureCounts -T 8 -a GENCODE_OR_LNCIPEDIA_GTF -t exon -s 2 -p -g gene_id -o Counts.txt BAM_FILES

I then use DESeq2 to find differentially expressed genes.

The issue I'm running into is redundancy between the GENCODE and LNCipedia annotations. Some of the lncRNAs are present in both, so they get counted twice. I've tried using biomaRt to convert Ensembl gene IDs to HGNC symbols, but this has not been very effective because not all of the Ensembl lncRNA IDs in GENCODE have HGNC symbols.

What would be the easiest way for me to ensure I get accurate counts of mRNA and lncRNA in one table?

rna-seq lncRNA • 720 views

ADD COMMENT • link 3.2 years ago by njk639 • 0

The GENCODE GTF does have lncRNAs in it. Are you excluding those during counting?

ADD REPLY • link 3.2 years ago by GenoMax 141k
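
To see how many lncRNA genes the GENCODE GTF actually contains before deciding what to exclude, one option is to tally gene records by biotype. The following is a minimal sketch, assuming the GTF is tab-delimited with the feature type in column 3 and a `gene_type "..."` attribute in column 9 (as in GENCODE releases); the function name `count_gene_types` is made up for illustration.

```shell
# Count "gene" records per gene_type in a GENCODE-style GTF.
# Assumption: column 3 is the feature type and column 9 holds gene_type "...".
count_gene_types() {
    awk -F'\t' '$3 == "gene" {
        if (match($9, /gene_type "[^"]+"/)) {
            # Skip the 11 characters of `gene_type "` and drop the closing quote.
            type = substr($9, RSTART + 11, RLENGTH - 12)
            n[type]++
        }
    }
    END { for (t in n) print n[t] "\t" t }' "$1" | sort -k1,1nr
}

# Example usage (the file name is the one mentioned in this thread):
# count_gene_types gencode.v36.primary_assembly.annotation.gtf
```

The output is one line per biotype, largest count first, which gives a quick idea of how many genes a biotype-based filter would remove.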

No, I'm not sure how to exclude those from GENCODE. That's sort of the problem. I want the more extensive listing of lncRNAs provided by LNCipedia while still getting all the "standard" genes from GENCODE.

ADD REPLY • link 3.2 years ago by njk639 • 0

You could simply filter those entries out with grep -v:

$ grep -v lncRNA gencode.v36.primary_assembly.annotation.gtf > gencode_minus_lncRNA.gtf

ADD REPLY • link 3.2 years ago by GenoMax 141k
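
Following on from the grep suggestion, one way to end up with a single count table might be to append the LNCipedia records to the filtered GENCODE GTF and run featureCounts once on the combined file. This is only a sketch under assumptions not confirmed in the thread: that both GTFs use the same chromosome naming and the same genome build, and that the LNCipedia GTF carries `exon` features with a `gene_id` attribute. The function name `merge_annotations` and the file names in the usage comments are placeholders.

```shell
# Build one annotation: GENCODE without lncRNA lines, plus all LNCipedia records.
# Arguments: $1 = GENCODE GTF, $2 = LNCipedia GTF, $3 = output GTF.
merge_annotations() {
    {
        # Same filter as the grep above: drop every line that mentions lncRNA.
        grep -v 'lncRNA' "$1"
        # Append the LNCipedia records, skipping comment/header lines.
        grep -v '^#' "$2"
    } > "$3"
}

# Example usage (paths are placeholders; featureCounts options copied from the question):
# merge_annotations gencode.v36.primary_assembly.annotation.gtf lncipedia_5_2_hg38.gtf combined.gtf
# featureCounts -T 8 -a combined.gtf -t exon -s 2 -p -g gene_id -o Counts.txt BAM_FILES
```

Two caveats apply. The grep filter removes any line containing the string lncRNA, not only lines whose gene_type is lncRNA. Also, by default featureCounts does not assign reads that overlap more than one gene, so LNCipedia lncRNAs that overlap GENCODE genes on the same strand can still lose counts in the combined table.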