Question: CLC Bio Genomics Workbench - protein-coding genes only
3.0 years ago
Copenhagen, Denmark
I map RNA-seq FASTQ files to the mouse genome (GRCm38) using the RNA-Seq Analysis tool in CLC Genomics Workbench 9. As a result, I get mapping results for ~46K genes, including non-coding genes. I would like to map only to protein-coding genes, because my libraries were prepared with the Illumina TruSeq kit, which captures mRNA with oligo-dT beads.

Does anyone have experience with RNA-seq analysis in CLC Genomics? Is there a way to map only to protein-coding genes in CLC Genomics?

Best, Annika

Since you are using a commercial product, the company should support you with problems like this. But perhaps you're lucky and someone here can answer it too.

9 months ago
Rockville, MD
Annika, you can use a track file of annotation types for your reference genome to mask in or out only what you want to see. Alternatively, you can subset the expression table results to only the feature types you want to keep ("CDS" in this case) and then do all your downstream analysis on the subset.
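If you export the expression table (e.g. as tab-separated text), the same subsetting can also be done outside CLC. Below is a minimal Python sketch, assuming an Ensembl-style GTF for GRCm38 whose `gene` lines carry `gene_id` and `gene_biotype` attributes, and an exported table with one identifier column. The column name `Feature ID` and both helper functions are hypothetical, so check them against the headers and identifiers in your own files.

```python
import csv
import io
import re

# Matches key "value" pairs in the GTF attribute column (9th column).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def protein_coding_ids(gtf_lines, id_attr="gene_id"):
    """Collect identifiers of genes annotated with gene_biotype "protein_coding".

    Assumes Ensembl-style attributes; use id_attr="gene_name" if the exported
    expression table is keyed by gene symbols instead of Ensembl IDs.
    """
    keep = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level records: one entry per gene
        attrs = dict(ATTR_RE.findall(fields[8]))
        if attrs.get("gene_biotype") == "protein_coding" and id_attr in attrs:
            keep.add(attrs[id_attr])
    return keep


def subset_expression_table(table_text, keep, id_column="Feature ID"):
    """Return rows of a tab-separated table whose id_column value is in keep.

    "Feature ID" is a hypothetical column name for the exported table.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return [row for row in reader if row[id_column] in keep]


if __name__ == "__main__":
    # Toy GTF lines and table, not real annotation or data.
    gtf = [
        "#!genome-build GRCm38\n",
        '1\tensembl\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";\n',
        '1\tensembl\tgene\t1000\t1500\t.\t-\t.\tgene_id "G2"; gene_biotype "lincRNA";\n',
    ]
    table = "Feature ID\tTotal counts\nG1\t120\nG2\t35\n"
    rows = subset_expression_table(table, protein_coding_ids(gtf))
    print([row["Feature ID"] for row in rows])  # ['G1']
```

With a real annotation you would pass an open file handle instead of the toy list, e.g. `protein_coding_ids(open(path))` for a GTF path of your choosing. The mapping itself is untouched; only the table used downstream is filtered.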

3.0 years ago
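There are several polyadenylated ncRNAs, so the kit would capture them as well. In any case, it is best to map against the whole genome and only later restrict the analysis to the regions of interest, to avoid mapping biases: reads from loci that are missing from the reference can otherwise be forced onto similar sequences that are present.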
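Because poly(A) selection also picks up polyadenylated ncRNAs, it can be informative to see how the whole-genome counts split across biotypes before restricting to protein-coding genes. A minimal sketch, assuming you already have per-gene counts and a gene-to-biotype map as plain dicts (the map could, for instance, be built from the `gene_biotype` attribute of an Ensembl GTF). The function name is hypothetical.

```python
from collections import Counter


def biotype_fractions(gene_counts, gene_biotypes):
    """Fraction of total counts per biotype; unannotated genes go to "unknown"."""
    totals = Counter()
    for gene_id, count in gene_counts.items():
        totals[gene_biotypes.get(gene_id, "unknown")] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # avoid dividing by zero for an empty library
    return {biotype: n / grand_total for biotype, n in totals.items()}


if __name__ == "__main__":
    counts = {"G1": 900, "G2": 80, "G3": 20}  # toy numbers, not real data
    biotypes = {"G1": "protein_coding", "G2": "lincRNA"}
    for biotype, frac in sorted(biotype_fractions(counts, biotypes).items()):
        print(f"{biotype}\t{frac:.1%}")
```

Restricting the analysis afterwards then just means keeping the `protein_coding` genes, while the mapping itself was done against the full genome.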

As WouterDeCoster pointed out, few people here use CLC software, so chances are you won't get help specific to it. But CLC support is attentive; if you contact them, I am sure they will help you quickly.

