Question: GTF file problem
gravatar for johnsonn573
2.6 years ago by
johnsonn5730 wrote:

I'm performing an RNA-seq analysis. I'm making a count matrix using bam files and a gtf file to make a count matrix.

I wanted to compare my results using a gtf file from the hg19 ( build of the genome and the hg38 ( build of the genome.

I used GenomicFeatures and GenomicAlignments in R to create a count matrix of raw reads using the following code.

filenames=list.files() bamfiles <- BamFileList(filenames, yieldSize=2000000) se <- summarizeOverlaps(features=ebg, reads=bamfiles, mode="Union", singleEnd=TRUE, ignore.strand=TRUE) counts=assay(se)

My count matrix is completely different depending on which gtf file I use.

When I plot the statistics from DSS or DESeq results in the downstream analysis based on the gene id (i.e. ENSG00000223972), I get no correlation.

Why would I get completely different count matrices and results when I use different gtf files?

Can someone point me to good gtf files for hg19 and hg38 builds of the genome?

hg19 rna-seq hg38 gtf • 1.1k views
ADD COMMENTlink written 2.6 years ago by johnsonn5730

No wonder. You can't swap GTF files alone. You will need to realign the data against the other genome if you want to get the right results. Your chromosome names also need to match so make sure you use compatible genome builds and GTF files that go with them. You can find bundled sequence/annotation/indexes at iGenomes site.

Use featureCounts to do your counting.

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by genomax78k

I think I spotted your problem right here: you are using a GTF file.

ADD REPLYlink written 2.6 years ago by Matt Shirley9.2k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2220 users visited in the last hour