Question: Full GTF file vs Subset GTF file
Arindam Ghosh (Finland) wrote, 7 months ago:

I have aligned raw RNA-seq reads to the Ensembl reference genome, and I intend to use featureCounts to quantify the expression of only one RNA class, say lincRNAs. Which approach is better: using the full GTF file containing all RNA types, or creating a GTF that contains only lincRNAs and using that as input to featureCounts?

I tried both approaches. For protein-coding genes and lncRNAs the results were similar, but for miRNAs there was a huge difference.
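One possible explanation for the miRNA discrepancy, offered as an assumption rather than a diagnosis: by default featureCounts does not count a read that overlaps more than one gene (it is reported as `Unassigned_Ambiguity` in the `.summary` file) unless `-O` is given. With the full GTF, a read on a miRNA that lies inside an exon of another gene overlaps two genes and is dropped; with the miRNA-only GTF the same read is assigned. The sketch below lists such miRNAs. The function name `mirna_exon_overlaps` is hypothetical, it assumes an Ensembl-style GTF whose exon lines carry `gene_id` and `gene_biotype`, and it ignores strand, as featureCounts does in its default unstranded mode.

```shell
# Hypothetical helper: report miRNA genes whose exons overlap exons of genes
# with another biotype. Assumes an Ensembl-style GTF where exon lines carry
# gene_id and gene_biotype attributes. Strand is ignored (featureCounts'
# default is unstranded). All-pairs comparison: fine for a sketch, slow on a
# whole-genome annotation.
mirna_exon_overlaps() {
  awk -F'\t' '
    /^#/ || $3 != "exon" { next }
    {
      id = ""; bt = ""
      # Pull the quoted gene_id and gene_biotype values out of column 9.
      if (match($9, /gene_id "[^"]+"/))      id = substr($9, RSTART + 9, RLENGTH - 10)
      if (match($9, /gene_biotype "[^"]+"/)) bt = substr($9, RSTART + 14, RLENGTH - 15)
      n++; chr[n] = $1; s[n] = $4 + 0; e[n] = $5 + 0; gid[n] = id; bio[n] = bt
    }
    END {
      for (i = 1; i <= n; i++) {
        if (bio[i] != "miRNA") continue
        for (j = 1; j <= n; j++) {
          if (bio[j] == "miRNA" || chr[j] != chr[i]) continue
          # Closed intervals [s, e] overlap when each starts before the other ends.
          if (s[i] <= e[j] && s[j] <= e[i]) {
            pair = gid[i] "\t" gid[j]
            # Print each miRNA/other-gene pair once, with the other gene biotype.
            if (!(pair in seen)) { seen[pair] = 1; print pair "\t" bio[j] }
          }
        }
      }
    }' "$1"
}
```

Usage: `mirna_exon_overlaps Homo_sapiens.GRCh38.84.gtf | cut -f1 | sort -u | wc -l` counts the affected miRNAs. Comparing the `Unassigned_Ambiguity` rows of the two featureCounts summaries would show whether this effect is actually behind the difference; for a whole-genome GTF, `bedtools intersect` on exon intervals is the usual faster route.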


the results were similar but a huge difference in case of miRNA.

What do you mean by that? Because miRNAs are small, their reads are likely to multi-map. If you have miRNA data, you should use a pipeline designed specifically for miRNA. Standard mRNA library protocols generally do not capture miRNAs.

(Reply by GenoMax, 7 months ago)

Actually, I tried this with miRNA-seq data: I aligned the reads to the reference genome and then ran featureCounts once with the full GTF (containing protein-coding genes, lncRNAs, etc.) and once with a miRNA-only GTF.

The miRNA GTF was created using:

grep -E '#|gene_biotype "miRNA"' Homo_sapiens.GRCh38.84.gtf > Homo_sapiens.GRCh38.84.miRNA.gtf

I also suspect the difference might be due to multi-mapping.
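One quick way to test the multi-mapping suspicion, sketched under the assumption that the BAM came from Bowtie2: Bowtie2 reports one alignment per read by default and adds an `XS:i` tag when it found another valid alignment, so the share of primary alignments carrying `XS:i` is a rough proxy for multi-mapping. The function name `multimap_fraction` is hypothetical; it reads SAM text on stdin and prints the multi-mapped count, the aligned count and their ratio.

```shell
# Hypothetical helper: fraction of aligned primary reads that Bowtie2 flagged
# as having another valid alignment (XS:i tag present). Reads SAM on stdin.
multimap_fraction() {
  awk -F'\t' '
    /^@/ { next }                      # skip SAM header lines
    {
      # FLAG bits: 4 = unmapped, 256 = secondary, 2048 = supplementary.
      f = $2 + 0
      if (int(f / 4) % 2 || int(f / 256) % 2 || int(f / 2048) % 2) next
      aligned++
      # Optional tags start at column 12.
      for (k = 12; k <= NF; k++)
        if ($k ~ /^XS:i:/) { multi++; break }
    }
    END { printf "%d %d %.4f\n", multi, aligned, (aligned ? multi / aligned : 0) }'
}
```

Usage: `samtools view sample.bam | multimap_fraction`, where `sample.bam` is a placeholder for your alignment file.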

Most papers I came across use the miRBase reference and annotation for miRNA-seq analysis. However, I wanted to use the Ensembl GTF file because it includes the miRBase annotations for miRNAs.

(Reply by Arindam Ghosh, 7 months ago)

aligned them to the reference genome

Which program did you use? miRNAs need ungapped alignments.

(Reply by GenoMax, 7 months ago)

Bowtie2 with vsl (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4931105/)
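Assuming "vsl" refers to Bowtie2's `--very-sensitive-local` preset, alignments may contain gaps (Bowtie2 allows them in every mode, subject to its gap penalties) as well as soft-clipped read ends (local mode). A rough check of how often that actually happens, sketched with a hypothetical function name `cigar_summary`, is to scan the CIGAR strings of primary alignments:

```shell
# Hypothetical helper: count primary aligned reads whose CIGAR contains
# insertions/deletions (I or D) or soft clipping (S). Reads SAM on stdin.
cigar_summary() {
  awk -F'\t' '
    /^@/ { next }                      # skip SAM header lines
    {
      # Skip unmapped (4), secondary (256) and supplementary (2048) records.
      f = $2 + 0
      if (int(f / 4) % 2 || int(f / 256) % 2 || int(f / 2048) % 2) next
      aligned++
      if ($6 ~ /[ID]/) gapped++        # CIGAR is column 6
      if ($6 ~ /S/)    clipped++
    }
    END { printf "aligned=%d gapped=%d softclipped=%d\n", aligned, gapped, clipped }'
}
```

Usage: `samtools view sample.bam | cigar_summary`. A large `gapped` count would suggest rerunning with settings that effectively forbid gaps.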

(Reply by Arindam Ghosh, 7 months ago)
Shalu Jhanwar (Switzerland) wrote, 7 months ago:

I think the difference between the full and subset GTF results depends on how the subset is extracted from the full GTF, not on the featureCounts step. Do you filter on biotype to get a GTF of all miRNAs? More information on biotypes is at http://www.ensembl.org/info/genome/genebuild/biotypes.html.
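To see which biotype labels a given GTF actually uses before filtering on one, a short tally of `gene` lines per biotype helps. This is a sketch with a hypothetical function name, `biotype_counts`; it reads `gene_biotype` (Ensembl) or `gene_type` (GENCODE) from column 9 and prints each label with its gene count, most frequent first.

```shell
# Hypothetical helper: tally "gene" lines per gene_biotype (Ensembl) or
# gene_type (GENCODE) value, most frequent first.
biotype_counts() {
  awk -F'\t' '
    /^#/ || $3 != "gene" { next }
    match($9, /gene_(bio)?type "[^"]+"/) {
      v = substr($9, RSTART, RLENGTH)
      sub(/^[^"]*"/, "", v); sub(/"$/, "", v)   # keep only the quoted value
      n[v]++
    }
    END { for (v in n) print v "\t" n[v] }' "$1" | sort -k2,2nr
}
```

Usage: `biotype_counts Homo_sapiens.GRCh38.84.gtf`.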


I created the subset of the Ensembl GTF file based on gene biotype:

grep -E '#|gene_biotype "miRNA"' Homo_sapiens.GRCh38.84.gtf > Homo_sapiens.GRCh38.84.miRNA.gtf
(Reply by Arindam Ghosh, 7 months ago)

I'd recommend extracting the miRNA entries by filtering on the specific column (e.g. using awk) instead of using grep on entire lines. For example, for a GENCODE GTF, extract a miRNA GTF as:

zcat gencode.v19.long_noncoding_RNAs.gtf.gz | awk '$20 ~ /miRNA/' | sort | uniq > miRNA.gtf
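A more general version of the column-based filter, sketched here as a hypothetical shell function `gtf_by_biotype`, parses the biotype out of column 9 instead of relying on a fixed whitespace-separated field such as `$20`, whose position depends on attribute order. It keeps `#` header lines and prints every feature whose `gene_biotype` (Ensembl) or `gene_type` (GENCODE) equals the requested value exactly, so a `gene_name` or a longer label that merely contains "miRNA" is not picked up.

```shell
# Hypothetical helper: keep header lines plus every feature whose gene-level
# biotype (gene_biotype in Ensembl, gene_type in GENCODE) equals $1 exactly.
gtf_by_biotype() {
  awk -F'\t' -v want="$1" '
    /^#/ { print; next }                        # keep header lines
    match($9, /gene_(bio)?type "[^"]+"/) {
      v = substr($9, RSTART, RLENGTH)
      sub(/^[^"]*"/, "", v); sub(/"$/, "", v)   # keep only the quoted value
      if (v == want) print                      # exact, not substring, match
    }' "$2"
}
```

Usage: `gtf_by_biotype miRNA Homo_sapiens.GRCh38.84.gtf > Homo_sapiens.GRCh38.84.miRNA.gtf`.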
(Reply by Shalu Jhanwar, 7 months ago)

In any case, is this a logical approach? Should this miRNA.gtf be used for read quantification?

(Reply by Arindam Ghosh, 7 months ago)
Powered by Biostar version 2.3.0