Question: GTF file problem
21 months ago, johnsonn573 wrote:

I'm performing an RNA-seq analysis and building a count matrix from BAM files and a GTF file.

I wanted to compare my results using a GTF file from the hg19 build of the genome and one from the hg38 build.

I used GenomicFeatures and GenomicAlignments in R to create a matrix of raw read counts using the following code.

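library(GenomicAlignments)  # attaches Rsamtools (BamFileList) and SummarizedExperiment (assay)
# ebg: exons grouped by gene, built from the GTF with GenomicFeatures (construction not shown)
filenames <- list.files(pattern = "\\.bam$")  # restrict to BAM files so index files are skipped
bamfiles <- BamFileList(filenames, yieldSize = 2000000)
se <- summarizeOverlaps(features = ebg, reads = bamfiles, mode = "Union",
                        singleEnd = TRUE, ignore.strand = TRUE)
counts <- assay(se)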

My count matrix is completely different depending on which gtf file I use.

When I plot the statistics from DSS or DESeq results in the downstream analysis based on the gene id (i.e. ENSG00000223972), I get no correlation.

Why would I get completely different count matrices and results when I use different gtf files?

Can someone point me to good gtf files for hg19 and hg38 builds of the genome?

Tags: hg19, rna-seq, hg38, gtf

No wonder. You can't swap GTF files alone. The coordinates in your BAM files are tied to the genome build the reads were aligned to, so an annotation from a different build will not line up with them. You will need to realign the data against the other genome if you want to get the right results. Your chromosome names also need to match, so make sure you use genome builds and GTF files that go together. You can find bundled sequence, annotation, and index files at the iGenomes site.
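A quick way to spot a chromosome-name mismatch is to compare the sequence names in the GTF with the reference names in the BAM header. The following is a minimal base-R sketch under stated assumptions: the GTF lines and BAM reference names are toy values typed in by hand, and the helper names `gtf_seqnames` and `check_seqnames` are hypothetical. Note that matching names are necessary but not sufficient, since hg19 and hg38 share chromosome names while their coordinates differ.

```r
# Toy GTF lines (tab-separated; column 1 is the sequence name).
# In practice these would be read from the GTF file itself.
gtf_lines <- c(
  "1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id \"ENSG00000223972\";",
  "MT\tinsdc\tgene\t577\t647\t.\t+\t.\tgene_id \"ENSG00000210049\";"
)

# Toy reference names as they might appear in a BAM header.
# With Rsamtools they could be obtained with something like:
#   names(Rsamtools::scanBamHeader("sample.bam")[[1]]$targets)
# ("sample.bam" is a placeholder file name).
bam_seqnames <- c("chr1", "chr2", "chrM")

# Extract the unique values of the first column of each non-comment GTF line.
gtf_seqnames <- function(lines) {
  lines <- lines[!grepl("^#", lines)]
  unique(vapply(strsplit(lines, "\t", fixed = TRUE), `[`, character(1), 1))
}

# Report which annotation sequence names are present in or absent from the BAM header.
check_seqnames <- function(gtf_names, bam_names) {
  list(shared   = intersect(gtf_names, bam_names),
       gtf_only = setdiff(gtf_names, bam_names))
}

res <- check_seqnames(gtf_seqnames(gtf_lines), bam_seqnames)
print(res)
```

If `shared` comes back empty, as it does for this toy input, no feature in the annotation can overlap any read, and the naming conventions (for example "1" versus "chr1") need to be reconciled before counting.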

Use featureCounts to do your counting.
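For anyone who wants to stay in R, featureCounts is also available through the Bioconductor package Rsubread. Below is a rough sketch, assuming Rsubread is installed and that the BAM files were aligned to the same build the GTF describes. The wrapper name `count_reads` and the file names in the usage comment are hypothetical; the unstranded, single-end settings mirror the `ignore.strand=TRUE` and `singleEnd=TRUE` options in the question.

```r
# Sketch only: calling this function requires the Bioconductor package Rsubread.
count_reads <- function(bam_files, gtf_file) {
  fc <- Rsubread::featureCounts(
    files               = bam_files,
    annot.ext           = gtf_file,   # external annotation file
    isGTFAnnotationFile = TRUE,       # annotation is in GTF format
    GTF.featureType     = "exon",     # count reads overlapping exons
    GTF.attrType        = "gene_id",  # summarize exon counts per gene
    isPairedEnd         = FALSE,      # single-end reads
    strandSpecific      = 0           # 0 = unstranded counting
  )
  fc$counts  # matrix with genes in rows and BAM files in columns
}

# Hypothetical usage (file names are placeholders):
# counts <- count_reads(list.files(pattern = "\\.bam$"), "annotation.gtf")
```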

modified 21 months ago • written 21 months ago by genomax

I think I spotted your problem right here: you are using a GTF file.

written 21 months ago by Matt Shirley