TSS metaprofile using Deeptools
3.7 years ago
mickey_95 ▴ 110

Hi,

I am trying to use Deeptools' computeMatrix and plotHeatmap functions to create TSS-centered metaprofiles over all genes. Ultimately, I would like to apply some additional filters (e.g. protein-coding genes only, non-overlapping TSSs, genes above a certain size). But for an initial trial, I decided to just filter Ensembl's GTF annotation file for gene records:

awk 'BEGIN{FS=OFS="\t"} $3 == "gene"' GRCm38.99.gtf > test1.gtf
head -n 5 GRCm38.99.gtf | cat - test1.gtf > test.gtf # for re-adding the GTF "header"
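
The additional filters mentioned above (protein-coding, minimum size) could be layered onto the same awk approach. The following is only a minimal sketch, assuming an Ensembl-style attribute column that contains gene_biotype "protein_coding". The demo file names, the min_len threshold of 1000 bp, and the second and third records (DEMO_GENE_1, DEMO_TX_1) are made up for illustration and are not from the real annotation.

```shell
# Build a tiny tab-separated GTF for the demo (hypothetical file name).
# The first record is taken from the thread; the other two are invented.
printf 'chr1\tensembl_havana\tgene\t5588466\t5606131\t.\t+\t.\tgene_id "ENSMUSG00000025905"; gene_name "Oprk1"; gene_biotype "protein_coding";\n' > demo_genes.gtf
printf 'chr1\thavana\tgene\t100\t900\t.\t-\t.\tgene_id "DEMO_GENE_1"; gene_name "DemoLnc"; gene_biotype "lncRNA";\n' >> demo_genes.gtf
printf 'chr1\thavana\ttranscript\t100\t900\t.\t-\t.\tgene_id "DEMO_GENE_1"; transcript_id "DEMO_TX_1";\n' >> demo_genes.gtf

# Keep gene records that are protein-coding and at least min_len bp long.
# GTF coordinates are 1-based and inclusive, so length = end - start + 1.
awk -v min_len=1000 'BEGIN{FS=OFS="\t"}
  $3 == "gene" && $9 ~ /gene_biotype "protein_coding"/ && ($5 - $4 + 1) >= min_len
' demo_genes.gtf > demo_pc_genes.gtf

cat demo_pc_genes.gtf
```

Filtering for non-overlapping TSSs needs a comparison between records and is not covered by this sketch.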

I then use computeMatrix:

computeMatrix reference-point \
  --referencePoint TSS \
  --scoreFileName input.bw \
  --regionsFileName test.gtf \
  -out TSSmeta.gz \
  --beforeRegionStartLength 2000 --afterRegionStartLength 2000 \
  --binSize 20 \
  --missingDataAsZero \
  --sortRegions no

This resulted in the following error:

RuntimeError: None of the input BED/GTF files had valid regions

From what I understand, the problem stems from the absence of transcript features in the 3rd column of the GTF file. Hence, I tried running the above computeMatrix command, but including --metagene, which resulted in the same error. I also tried setting --transcriptID gene --exonID gene with no success.
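
A quick way to check which feature types a GTF file actually contains is to tabulate its third column. This is a small sketch on a made-up demo file (the file names and the header comment line are assumptions; the two gene records are the ones posted further down in this thread):

```shell
# Hypothetical demo file: one comment header line plus two gene records.
printf '# demo header line\n' > demo_features.gtf
printf 'chr1\tensembl_havana\tgene\t5588466\t5606131\t.\t+\t.\tgene_id "ENSMUSG00000025905"; gene_name "Oprk1";\n' >> demo_features.gtf
printf 'chr1\tensembl_havana\tgene\t6206197\t6276648\t.\t+\t.\tgene_id "ENSMUSG00000025907"; gene_name "Rb1cc1";\n' >> demo_features.gtf

# Skip comment lines, then count how often each feature type occurs in column 3.
grep -v '^#' demo_features.gtf | cut -f3 | sort | uniq -c > demo_feature_counts.txt

cat demo_feature_counts.txt
```

If "transcript" does not show up in the counts, the file has no transcript records, which fits the explanation above.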

I would really appreciate help on this!

Deeptools computeMatrix GTF metaprofile • 4.0k views
3.7 years ago
2nelly ▴ 310

If you can post the first lines of your test.gtf, we can help you. Alternatively, you can try converting the GTF to BED format.

For me, the BED format below works like a charm:

chr1    2985742 3355185 PRDM16  369443  +
chr1    6845384 7829766 CAMTA1  984382  +
chr1    8412464 8877699 RERE    465235  -
chr1    10696661    10856733    CASZ1   160072  -

Columns: chromosome, start, end, gene_symbol, length, strand
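
A GTF-to-BED conversion along these lines could look roughly like the sketch below. It assumes Ensembl-style gene records with a gene_name attribute, and it writes a placeholder 0 in the fifth column instead of the gene length used above. The demo file names are made up. GTF starts are 1-based while BED starts are 0-based, so only the start is shifted by one and the end stays the same.

```shell
# Hypothetical demo input: two gene records as posted in this thread
# (attributes shortened to gene_id and gene_name).
printf 'chr1\tensembl_havana\tgene\t5588466\t5606131\t.\t+\t.\tgene_id "ENSMUSG00000025905"; gene_name "Oprk1";\n' > demo_convert.gtf
printf 'chr1\tensembl_havana\tgene\t6206197\t6276648\t.\t+\t.\tgene_id "ENSMUSG00000025907"; gene_name "Rb1cc1";\n' >> demo_convert.gtf

# Convert gene records to BED6: chrom, start (0-based), end, name, score, strand.
awk 'BEGIN{FS=OFS="\t"}
  $3 == "gene" {
    name = "."                                      # fallback if no gene_name is present
    if (match($9, /gene_name "[^"]+"/))
      name = substr($9, RSTART + 11, RLENGTH - 12)  # strip the key and both quotes
    print $1, $4 - 1, $5, name, 0, $7
  }' demo_convert.gtf > demo_convert.bed

cat demo_convert.bed
```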


Here are the first lines of the gtf file:

chr1    ensembl_havana  gene    5588466 5606131 .   +   .   gene_id "ENSMUSG00000025905"; gene_version "14"; gene_name "Oprk1"; gene_source "ensembl_havana"; gene_biotype "protein_coding";
chr1    ensembl_havana  gene    6206197 6276648 .   +   .   gene_id "ENSMUSG00000025907"; gene_version "14"; gene_name "Rb1cc1"; gene_source "ensembl_havana"; gene_biotype "protein_coding";
chr1    ensembl_havana  gene    6359218 6394731 .   +   .   gene_id "ENSMUSG00000087247"; gene_version "3"; gene_name "Alkal1"; gene_source "ensembl_havana"; gene_biotype "protein_coding";

I was hoping to directly use the gtf file without having to convert to a bed file to avoid having to switch between 1- and 0-based coordinates.

I have now tried:

computeMatrix reference-point \
  --referencePoint TSS \
  --scoreFileName input.bw \
  --regionsFileName test.gtf \
  -out TSSmeta.gz \
  --beforeRegionStartLength 2000 --afterRegionStartLength 2000 \
  --binSize 20 \
  --missingDataAsZero \
  --sortRegions no \
  --transcriptID gene \
  --transcript_id_designator gene_id

It ran through without errors and the result from plotHeatmap looks reasonable. But now I am doubting whether setting --transcriptID gene and --transcript_id_designator gene_id is correct in this specific case (I am trying to get the signal over entire genes instead of transcripts).


The error most likely occurred because you didn't include any entry with "transcript" in the 3rd column in your test.gtf.

But now I am doubting whether setting --transcriptID gene and --transcript_id_designator gene_id is correct in this specific case (I am trying to get the signal over entire genes instead of transcripts).

Why are you in doubt? Seems like you want to focus on genes rather than transcripts.


If you want genes, then that will work fine. When looking at the results, just remember that a gene is just a group of transcripts.

3.7 years ago
zhuobaowen ▴ 40

You can just download a GTF file from Ensembl or UCSC and use that. computeMatrix will then figure out where the TSS and TES of each transcript are. (Written 3.3 years ago by Devon Ryan.)
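
To illustrate what "figuring out the TSS" means in terms of coordinates, here is a minimal sketch that derives a 1-bp TSS interval from BED6 records. It is not how computeMatrix is implemented, only the usual convention: on the + strand the TSS is the start of the feature, on the - strand it is the end. The file names and the two records (geneA, geneB) are invented for the demo.

```shell
# Hypothetical BED6 input with one gene per strand.
printf 'chr1\t100\t500\tgeneA\t0\t+\n' > demo_tss_in.bed
printf 'chr1\t1000\t2000\tgeneB\t0\t-\n' >> demo_tss_in.bed

# Emit a 1-bp BED interval at the TSS of each record.
# + strand: the first base of the feature; - strand: the last base.
awk 'BEGIN{FS=OFS="\t"}
  $6 == "+" { print $1, $2, $2 + 1, $4, $5, $6 }
  $6 == "-" { print $1, $3 - 1, $3, $4, $5, $6 }
' demo_tss_in.bed > demo_tss.bed

cat demo_tss.bed
```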
