Full GTF file vs Subset GTF file
3.8 years ago
Arindam Ghosh ▴ 510

I have aligned raw RNA-seq reads to the Ensembl reference genome and want to use featureCounts to quantify the expression of only one class of RNA, say lincRNAs. Which approach is better: using the full GTF file containing all RNA types, or creating a GTF that contains only lincRNAs and using that as the input to featureCounts?

I tried both approaches. For protein-coding genes and lncRNAs the results were similar, but for miRNAs there was a huge difference.
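A third option, sketched here under stated assumptions, is to count once against the full GTF and filter the count table by biotype afterwards, so that every gene competes for reads exactly as in a whole-transcriptome run. The toy GTF, the gene IDs `ENSG_TOY1`/`ENSG_TOY2` and the file names are hypothetical; the toy count table only mimics the layout featureCounts writes with `-g gene_id` (a comment line, a `Geneid` header row, then one row per gene).

```shell
#!/usr/bin/env bash
# Sketch: keep only genes of one biotype from a full-GTF featureCounts table.
# All inputs are hypothetical toy stand-ins; real counts would come from e.g.
#   featureCounts -t exon -g gene_id -a Homo_sapiens.GRCh38.84.gtf -o counts_full.txt sample.bam
set -euo pipefail
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Toy Ensembl-style GTF with one lincRNA gene and one protein-coding gene
printf '#!genome-build GRCh38.p5\n' > "$workdir/toy.gtf"
printf '1\tensembl\tgene\t100\t900\t.\t+\t.\tgene_id "ENSG_TOY1"; gene_biotype "lincRNA";\n' >> "$workdir/toy.gtf"
printf '1\tensembl\tgene\t2000\t5000\t.\t-\t.\tgene_id "ENSG_TOY2"; gene_biotype "protein_coding";\n' >> "$workdir/toy.gtf"

# Toy count table in the featureCounts output layout
printf '# Program:featureCounts (toy)\nGeneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n' > "$workdir/counts_full.txt"
printf 'ENSG_TOY1\t1\t100\t900\t+\t801\t42\nENSG_TOY2\t1\t2000\t5000\t-\t3001\t310\n' >> "$workdir/counts_full.txt"

awk -F'\t' -v biotype="lincRNA" '
  # First file (GTF): collect gene_ids whose gene_biotype equals the biotype variable exactly
  FNR == NR {
    if ($3 == "gene" && match($9, /gene_biotype "[^"]*"/)) {
      bt = substr($9, RSTART + 14, RLENGTH - 15)
      if (bt == biotype && match($9, /gene_id "[^"]*"/))
        keep[substr($9, RSTART + 9, RLENGTH - 10)] = 1
    }
    next
  }
  # Second file (counts): keep comment and header lines plus the selected genes
  /^#/ || $1 == "Geneid" || ($1 in keep)
' "$workdir/toy.gtf" "$workdir/counts_full.txt" > "$workdir/counts_lincRNA.txt"

cat "$workdir/counts_lincRNA.txt"
```

Whether this is preferable to a subset GTF depends on the question: here a read overlapping both a lincRNA and another gene is treated as it would be in a full-annotation run, rather than being handed to the lincRNA by default.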

featurecounts RNA-Seq miRNA-seq Ensembl

the results were similar but a huge difference in case of miRNA.

What do you mean by that? Because miRNAs are short, they are likely to multi-map. If you have miRNA data, you should use a pipeline designed specifically for miRNAs; standard mRNA library protocols generally do not capture miRNAs.
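To see how much multi-mapping there actually is, the per-read alignment count can be read from the `NH:i:<n>` tag. This minimal sketch assumes the aligner writes that standard tag (many RNA-seq aligners do, but check yours); the toy records are hypothetical, and on real data the same awk command would read from `samtools view sample.bam`. featureCounts also relies on this tag and, by default, does not count multi-mapping reads unless `-M` is given.

```shell
#!/usr/bin/env bash
# Sketch: count reads as unique (NH == 1) or multi-mapped (NH > 1).
# Assumes every mapped record carries the NH:i:<n> tag.
set -euo pipefail
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical toy alignments (r2 has a primary and a secondary alignment)
printf '@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr5\tLN:1000\n' > "$workdir/toy.sam"
printf 'r1\t0\tchr1\t100\t255\t22M\t*\t0\t0\t*\t*\tNH:i:1\n' >> "$workdir/toy.sam"
printf 'r2\t0\tchr1\t500\t1\t22M\t*\t0\t0\t*\t*\tNH:i:2\n' >> "$workdir/toy.sam"
printf 'r2\t256\tchr5\t800\t1\t22M\t*\t0\t0\t*\t*\tNH:i:2\n' >> "$workdir/toy.sam"
printf 'r3\t16\tchr1\t300\t255\t21M\t*\t0\t0\t*\t*\tNH:i:1\n' >> "$workdir/toy.sam"

summary=$(awk -F'\t' '
  /^@/ { next }                      # header lines
  int($2 / 4) % 2 == 1 { next }      # FLAG bit 0x4: read unmapped
  !($1 in seen) {                    # look at each read name once
    seen[$1] = 1
    nh = -1
    for (i = 12; i <= NF; i++)
      if ($i ~ /^NH:i:/) nh = substr($i, 6) + 0
    if (nh < 0) none++
    else if (nh > 1) multi++
    else uniq++
  }
  END { printf "unique=%d multi=%d no_NH=%d\n", uniq, multi, none }
' "$workdir/toy.sam")
echo "$summary"
```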


I actually tried this with miRNA-seq data: I aligned the reads to the reference genome and then ran featureCounts once with the full GTF (containing protein-coding genes, lncRNAs, etc.) and once with a miRNA-only GTF.

The miRNA GTF was created using:

grep -E '^#|gene_biotype "miRNA"' Homo_sapiens.GRCh38.84.gtf > Homo_sapiens.GRCh38.84.miRNA.gtf
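Before counting with such a subset, it may be worth confirming that it still contains the feature type featureCounts counts by default (`-t exon`, grouped by `-g gene_id`). A minimal sketch that tallies GTF column 3; the toy records and the gene ID `ENSG_TOY3` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: tally feature types (GTF column 3) in a biotype subset.
set -euo pipefail
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical toy stand-in for Homo_sapiens.GRCh38.84.miRNA.gtf
printf '#!genome-build GRCh38.p5\n' > "$workdir/miRNA.gtf"
for feature in gene transcript exon; do
  printf '1\tensembl\t%s\t300\t370\t.\t+\t.\tgene_id "ENSG_TOY3"; gene_biotype "miRNA";\n' "$feature" >> "$workdir/miRNA.gtf"
done

# Count non-header records per feature type, printed as "type=count" pairs
feature_counts=$(awk -F'\t' '!/^#/ { n[$3]++ } END { for (f in n) print f "=" n[f] }' "$workdir/miRNA.gtf" | sort | paste -sd' ' -)
echo "$feature_counts"
```

On the real subset, a result without an `exon` entry would mean featureCounts has nothing to count under its default settings.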

I also suspect the difference might be due to multi-mapping.
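One way to test that suspicion is to compare the `.summary` file featureCounts writes next to each count table (e.g. `counts_full.txt.summary` for `-o counts_full.txt`). By default featureCounts does not count multi-mapping reads (`Unassigned_MultiMapping`), and it leaves reads that overlap more than one meta-feature unassigned (`Unassigned_Ambiguity`) unless `-O` is given. Since both runs use the same BAM, the multi-mapping figure should be the same in both; a read on a miRNA lying inside an exon of another annotated gene, however, is ambiguous with the full GTF but assignable with the miRNA-only GTF. A minimal sketch; the file names are assumptions and the numbers are made up purely to exercise the script.

```shell
#!/usr/bin/env bash
# Sketch: put two featureCounts .summary files side by side.
# The numbers below are made up for illustration only.
set -euo pipefail
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf 'Status\tsample.bam\nAssigned\t700\nUnassigned_MultiMapping\t300\nUnassigned_NoFeatures\t50\nUnassigned_Ambiguity\t150\n' > "$workdir/counts_full.txt.summary"
printf 'Status\tsample.bam\nAssigned\t500\nUnassigned_MultiMapping\t300\nUnassigned_NoFeatures\t400\nUnassigned_Ambiguity\t0\n' > "$workdir/counts_miRNA.txt.summary"

awk -F'\t' '
  BEGIN { print "Status\tfull_gtf\tmiRNA_gtf" }
  FNR == 1 { next }                    # skip the header row of each file
  FNR == NR { full[$1] = $2; next }    # first file: remember full-GTF numbers
  { print $1 "\t" full[$1] "\t" $2 }   # second file: print both side by side
' "$workdir/counts_full.txt.summary" "$workdir/counts_miRNA.txt.summary" > "$workdir/compare.tsv"

cat "$workdir/compare.tsv"
```

If most of the change is in `Unassigned_Ambiguity` rather than `Unassigned_MultiMapping`, overlapping annotation is a more likely explanation than multi-mapping.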

Most papers I have come across use the miRBase reference and annotation for miRNA-seq analysis. However, I wanted to use the Ensembl GTF file because it contains the miRBase annotations for miRNAs.


aligned them to the reference genome

Which program did you use? miRNAs need ungapped alignments.
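A quick way to check whether an existing alignment file contains gapped or spliced alignments is to scan the CIGAR column (SAM field 6) for `I` (insertion), `D` (deletion) or `N` (skipped region) operations. A minimal sketch on hypothetical toy records; on real data, `samtools view sample.bam` could be piped into the same awk command.

```shell
#!/usr/bin/env bash
# Sketch: count alignments whose CIGAR contains I, D or N operations.
set -euo pipefail
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical toy records: r1 ungapped, r2 has a deletion, r3 is split by an N gap
printf 'r1\t0\tchr1\t100\t255\t22M\t*\t0\t0\t*\t*\n' > "$workdir/toy.sam"
printf 'r2\t0\tchr1\t500\t255\t10M1D12M\t*\t0\t0\t*\t*\n' >> "$workdir/toy.sam"
printf 'r3\t16\tchr1\t900\t255\t11M300N11M\t*\t0\t0\t*\t*\n' >> "$workdir/toy.sam"

gapped=$(awk -F'\t' '!/^@/ && $6 ~ /[IDN]/ { n++ } END { print n + 0 }' "$workdir/toy.sam")
echo "gapped or spliced alignments: $gapped"
```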

3.8 years ago
Shalu Jhanwar ▴ 520

I think the difference between the full and subset GTF results depends on how the subset was extracted from the full GTF, not on the featureCounts step. Did you filter on biotype to get a GTF of all miRNAs? More information on biotypes is at http://www.ensembl.org/info/genome/genebuild/biotypes.html.
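To see which biotype labels a given GTF actually uses (and therefore the exact string to filter on), the `gene_biotype` values of the gene records can be tallied. A minimal sketch; the toy records and IDs are hypothetical stand-ins for `Homo_sapiens.GRCh38.84.gtf`, and running it on both the full file and the subset shows what the filter kept.

```shell
#!/usr/bin/env bash
# Sketch: count gene records per gene_biotype in an Ensembl-style GTF.
set -euo pipefail
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical toy GTF (gene records only)
printf '#!genome-build GRCh38.p5\n' > "$workdir/toy.gtf"
printf '1\tensembl\tgene\t100\t900\t.\t+\t.\tgene_id "ENSG_A"; gene_biotype "protein_coding";\n' >> "$workdir/toy.gtf"
printf '1\tensembl\tgene\t300\t370\t.\t+\t.\tgene_id "ENSG_B"; gene_biotype "miRNA";\n' >> "$workdir/toy.gtf"
printf '2\tensembl\tgene\t50\t120\t.\t-\t.\tgene_id "ENSG_C"; gene_biotype "miRNA";\n' >> "$workdir/toy.gtf"
printf '2\tensembl\tgene\t400\t2000\t.\t+\t.\tgene_id "ENSG_D"; gene_biotype "lincRNA";\n' >> "$workdir/toy.gtf"

biotypes=$(awk -F'\t' '
  # Extract the quoted value after gene_biotype on each gene record
  $3 == "gene" && match($9, /gene_biotype "[^"]*"/) {
    n[substr($9, RSTART + 14, RLENGTH - 15)]++
  }
  END { for (b in n) print b "=" n[b] }
' "$workdir/toy.gtf" | sort | paste -sd' ' -)
echo "$biotypes"
```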


Create subset of Ensembl GTF file based on gene biotype

grep -E '^#|gene_biotype "miRNA"' Homo_sapiens.GRCh38.84.gtf > Homo_sapiens.GRCh38.84.miRNA.gtf

I'd recommend extracting the miRNA-biotype records by filtering on the specific column (e.g. with awk) instead of running grep on entire lines. For example, for a GENCODE GTF, extract a miRNA GTF as:

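# GENCODE stores the biotype as gene_type in column 9; miRNAs are in the main annotation, not the lncRNA-only file
zcat gencode.v19.annotation.gtf.gz | awk -F'\t' '/^#/ || $9 ~ /gene_type "miRNA"/' > miRNA.gtf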
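A stricter variant of the column-based filter parses the attributes in column 9 into key/value pairs and compares the value for equality, so it does not depend on attribute order or on a regex matching part of a longer value. This is a minimal sketch: `filter_by_attr` and `get_attr` are hypothetical helper names, the key is a parameter because GENCODE uses `gene_type` while Ensembl uses `gene_biotype`, and the toy records are made up.

```shell
#!/usr/bin/env bash
# Sketch: keep header lines plus records whose attribute KEY equals VALUE exactly.
set -euo pipefail
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical toy GENCODE-style records
printf '##description: toy GENCODE-style file\n' > "$workdir/toy.gtf"
printf 'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "TOY_LNC"; transcript_id "TOY_LNC"; gene_type "lincRNA"; gene_name "TOY1";\n' >> "$workdir/toy.gtf"
printf 'chr1\tENSEMBL\tgene\t300\t370\t.\t+\t.\tgene_id "TOY_MIR"; transcript_id "TOY_MIR"; gene_type "miRNA"; gene_name "TOY2";\n' >> "$workdir/toy.gtf"

filter_by_attr() {  # usage: filter_by_attr KEY VALUE FILE (hypothetical helper)
  awk -F'\t' -v key="$1" -v want="$2" '
    # Value of attribute k in a column-9 string, or "" if absent.
    # Assumes values contain no spaces, which holds for the biotype fields used here.
    function get_attr(attrs, k,    n, i, parts, kv) {
      n = split(attrs, parts, "; *")
      for (i = 1; i <= n; i++) {
        if (split(parts[i], kv, " ") >= 2 && kv[1] == k) {
          gsub(/"/, "", kv[2])
          return kv[2]
        }
      }
      return ""
    }
    /^#/ || get_attr($9, key) == want
  ' "$3"
}

filter_by_attr gene_type miRNA "$workdir/toy.gtf" > "$workdir/miRNA.gtf"
cat "$workdir/miRNA.gtf"
```

For an Ensembl GTF the same helper would be called as `filter_by_attr gene_biotype miRNA Homo_sapiens.GRCh38.84.gtf`.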

Is this a sensible approach anyway? Should this miRNA.gtf be used for read quantification?
