Question: gene_bodyCoverage.py No usable output...why?
0
gravatar for crcarroll
3 months ago by
crcarroll10
crcarroll10 wrote:

I am trying to use geneBody_coverage.py -r Homo_sapiens.GRCh38.97.bed -i my_file.sorted.bam -o testOutput but it generates no results in the output. As soon as I run it, I get a wall of text that looks like this:

[NOTE:input bed must be 12-column] skipped this line: Y        26600890        26601022        ENSG00000237917 .       -       havana  exon    .       gene_id "ENSG00000237917"; gene_version "1"; transcript_id "ENST00000435945"; transcript_version "1"; exon_number "12"; gene_name "PARP4P1"; gene_source "havana"; gene_biotype "unprocessed_pseudogene"; transcript_name "PARP4P1-201"; transcript_source "havana"; transcript_biotype "unprocessed_pseudogene"; exon_id "ENSE00001744948"; exon_version "1"; tag "basic"; transcript_support_level "NA";

And then at the end I get this:

Total 0 transcripts loaded
Cannot get coverage signal from S10nM_r1.sorted.bam ! Skip

To start with, I downloaded Homo_sapiens.GRCh38.97.gff3 from Ensembl and used gff2bed (2.4.36) to convert to a bed file which I used for reference. The first thing to do is get a 12-column bed file but I can't seem to find how to do that. But I also think I have multiple issues here and I just can't diagnose it. Any solutions?

ADD COMMENTlink modified 3 months ago • written 3 months ago by crcarroll10

well your input file doesn't fit the required BED format, check it out here. Also, if I remember correctly, the program requires tab-separated input.

ADD REPLYlink written 3 months ago by Martombo2.6k

Why should the bed file not be formatted correctly? I simply downloaded the gfff from Ensemble and used BEDOPS gff2bed < my_file.gff > my_file.bed.

ADD REPLYlink written 3 months ago by crcarroll10
1

And yet, here we are ;)

Just a general tip for bioinformatics: always check the results you get, you cannot simply trust any program blindly. Even popular formats like BED, GTF can have slightly different properties / requirements depending on the tool you are using. The error is telling you that the input should have 12 columns. Does your file look like it has 12 columns?

ADD REPLYlink modified 3 months ago • written 3 months ago by Martombo2.6k

Right, I knew that. Part of my question included how to convert my gff file from Ensembl to a 12-column bed file. I thought you were then referring to some other format issue.

ADD REPLYlink written 3 months ago by crcarroll10

Alright, are you using bash? If so you can run an awk one liner to get the 12 column you need. Something along these lines should work:

awk 'BEGIN{OFS="\t"}{$7=$2; $8=$3; $9="0,0,0"; $10=1; $11=$3-$2","; $12="0,"; print}'
ADD REPLYlink written 3 months ago by Martombo2.6k

I figured something out. I used gff3ToGenePred followed by genePredToBed tools from UCSC utilities . This outputs a 12-column .bed.

ADD REPLYlink written 3 months ago by crcarroll10
0
gravatar for crcarroll
3 months ago by
crcarroll10
crcarroll10 wrote:

I used gff3ToGenePred followed by genePredToBed tools from UCSC utilities. This outputs a 12-column .bed. geneBody_coverage.py is running so far without error.

ADD COMMENTlink written 3 months ago by crcarroll10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 961 users visited in the last hour