Question: Should cDNAs provided by Ensembl be filtered by CCDS or biotype prior to running kallisto?
22 months ago by holgerbrandl

I typically download cDNAs directly from Ensembl (for example with wget ftp://ftp.ensembl.org/pub/release-93/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz), build a kallisto index from them, and run kallisto quant to estimate isoform abundance.
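For reference, the workflow I mean can be sketched roughly as below. This only assembles the two kallisto command lines and does not run them; the index name, output directory, and read file names are placeholders I made up, and paired-end reads are assumed (single-end data would need extra kallisto quant options).

```python
import shlex
from typing import List, Sequence


def kallisto_commands(
    cdna_fasta: str,
    index_path: str,
    out_dir: str,
    reads: Sequence[str],
) -> List[List[str]]:
    """Return the argv lists for building an index and quantifying reads.

    Assumes paired-end FASTQ files are passed in `reads`.
    """
    index_cmd = ["kallisto", "index", "-i", index_path, cdna_fasta]
    quant_cmd = ["kallisto", "quant", "-i", index_path, "-o", out_dir, *reads]
    return [index_cmd, quant_cmd]


# Placeholder paths, for illustration only.
commands = kallisto_commands(
    "Homo_sapiens.GRCh38.cdna.all.fa.gz",
    "GRCh38.cdna.all.idx",
    "quant_out",
    ["sample_1.fastq.gz", "sample_2.fastq.gz"],
)
for cmd in commands:
    # Print shell-quoted commands; pass `cmd` to subprocess.run to execute.
    print(shlex.join(cmd))
```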

However, Ensembl's transcript models tend to be very detailed, and the cDNA files it provides also contain many transcripts whose biotype is not protein coding, ranging from nonsense-mediated decay (NMD) to retained intron.

So I was wondering whether it would be better practice to filter the provided cDNA FASTA down to just those transcripts with a CCDS ID, or to filter by biotype (such as "protein coding").
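To make the biotype option concrete, here is a minimal sketch of such a filter. It assumes the FASTA headers carry a `transcript_biotype:` field, as the Ensembl cDNA files do; the demo records and the function names are made up for illustration. As far as I can tell the headers do not carry CCDS IDs, so a CCDS filter would instead need a list of transcript IDs obtained elsewhere.

```python
import gzip
import io
import re
from typing import FrozenSet, Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)

BIOTYPE_RE = re.compile(r"\btranscript_biotype:(\S+)")


def read_fasta(handle: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, seq = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def filter_by_biotype(
    records: Iterable[Record],
    keep: FrozenSet[str] = frozenset({"protein_coding"}),
) -> Iterator[Record]:
    """Keep only records whose transcript_biotype is in `keep`."""
    for header, seq in records:
        match = BIOTYPE_RE.search(header)
        if match and match.group(1) in keep:
            yield header, seq


def write_fasta(records: Iterable[Record], handle: TextIO, width: int = 60) -> None:
    """Write records as FASTA, wrapping sequences at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


def filter_file(in_path: str, out_path: str) -> None:
    """Filter a gzipped cDNA FASTA into a plain-text FASTA."""
    with gzip.open(in_path, "rt") as src, open(out_path, "w") as dst:
        write_fasta(filter_by_biotype(read_fasta(src)), dst)


# Made-up demo records that only mimic the header layout.
DEMO = (
    ">ENST_DEMO_1.1 cdna gene:ENSG_DEMO_1.1 gene_biotype:protein_coding "
    "transcript_biotype:protein_coding\nATGGCC\nAAGTAA\n"
    ">ENST_DEMO_2.1 cdna gene:ENSG_DEMO_1.1 gene_biotype:protein_coding "
    "transcript_biotype:retained_intron\nATGCCC\n"
)
kept = list(filter_by_biotype(read_fasta(io.StringIO(DEMO))))
```

On the real file this would be called as `filter_file("Homo_sapiens.GRCh38.cdna.all.fa.gz", "protein_coding.fa")`, where the output name is a placeholder, and the kallisto index would then be built from the filtered FASTA.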

As an example, a CCDS filter would cut the number of cDNAs for https://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000077782 from 41 to 9.
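To check how much a biotype filter would shrink a given gene, the headers can be tallied per gene. This is a rough sketch that assumes `gene:` and `transcript_biotype:` fields in each header; the demo headers and the function name are invented, and it counts biotypes only, not CCDS membership.

```python
import re
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable

GENE_RE = re.compile(r"\bgene:(\S+)")
BIOTYPE_RE = re.compile(r"\btranscript_biotype:(\S+)")


def biotype_counts_per_gene(headers: Iterable[str]) -> DefaultDict[str, Counter]:
    """Count transcripts per gene, broken down by transcript biotype."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for header in headers:
        gene = GENE_RE.search(header)
        biotype = BIOTYPE_RE.search(header)
        if gene and biotype:
            # Drop the version suffix so "ENSG_DEMO_1.1" becomes "ENSG_DEMO_1".
            gene_id = gene.group(1).split(".")[0]
            counts[gene_id][biotype.group(1)] += 1
    return counts


# Made-up headers that only mimic the layout of the cDNA FASTA headers.
demo_headers = [
    "ENST_DEMO_1.1 cdna gene:ENSG_DEMO_1.1 gene_biotype:protein_coding transcript_biotype:protein_coding",
    "ENST_DEMO_2.1 cdna gene:ENSG_DEMO_1.1 gene_biotype:protein_coding transcript_biotype:protein_coding",
    "ENST_DEMO_3.1 cdna gene:ENSG_DEMO_1.1 gene_biotype:protein_coding transcript_biotype:retained_intron",
    "ENST_DEMO_4.1 cdna gene:ENSG_DEMO_2.1 gene_biotype:protein_coding transcript_biotype:nonsense_mediated_decay",
]
counts = biotype_counts_per_gene(demo_headers)
for gene_id, by_biotype in counts.items():
    total = sum(by_biotype.values())
    print(gene_id, total, "total,", by_biotype["protein_coding"], "protein_coding")
```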

How sensitive is kallisto to overly complex or redundant gene architectures?

kallisto isoforms • 368 views