Fastest way to convert BED to GTF/GFF with gene_ids?
4 months ago
alejandrogzi ▴ 120

Given a BED file (BED12), what is the fastest tool (or available tools) to convert it to GTF or GFF3?

4 months ago
alejandrogzi ▴ 120

This is probably a duplicate of:

How To Convert Bed Format To Gtf?

How to convert original BED file to a GTF ?

Converting different annotation file formats (GTF/GFF/BED) to each other

How to change scaffold.fasta file or scaffold.bed file to GTF file?

Convert bed12 to GFF

convert bed12 to sorted gtf

Converting from BED to SAF/GFF

However, these threads 1) are all outdated, 2) do not produce a complete GTF/GFF (with gene_id attributes), 3) include no benchmarks, and 4) do not give an ordered list of options. Here, I provide such a list:

bed2gtf

A high-performance BED-to-GTF converter written in Rust from https://github.com/alejandrogzi/bed2gtf.

Usage: bed2gtf[EXE] --bed/-b <BED> --isoforms/-i <ISOFORMS> --output/-o <OUTPUT> --threads/-t <THREADS>

where:
--bed <BED>: a .bed file
--isoforms <ISOFORMS>: a tab-delimited file
--output <OUTPUT>: path to output file (*.gtf)

The isoforms file specification:

a tab-delimited .txt/.tsv/.csv/... file with genes/isoforms (all the transcripts in .bed file should appear in the isoforms file):

> cat isoforms.txt

ENSG00000198888 ENST00000361390
ENSG00000198763 ENST00000361453
ENSG00000198804 ENST00000361624
ENSG00000188868 ENST00000595977
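Since every transcript in the .bed file should appear in the isoforms file, a quick pre-flight check can save a failed run. The following is a minimal sketch, not part of bed2gtf: the function name `missing_transcripts` and the toy BED12 records are hypothetical, and it assumes the transcript name sits in BED column 4 and the transcript ID in the second column of the isoforms file.

```python
def missing_transcripts(bed_lines, isoform_lines):
    """Return BED transcript names (column 4) that are absent from the isoforms table."""
    # Collect transcript IDs from the second column of the isoforms table.
    # split() accepts any whitespace; the real file is expected to be tab-delimited.
    known = set()
    for line in isoform_lines:
        fields = line.split()
        if len(fields) >= 2:
            known.add(fields[1])

    missing = []
    for line in bed_lines:
        # Skip blank lines and BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        name = line.rstrip("\n").split("\t")[3]
        if name not in known:
            missing.append(name)
    return missing


# Toy input (illustrative records, not taken from a real annotation file).
bed = [
    "chrM\t3306\t4262\tENST00000361390\t0\t+\t3306\t4262\t0\t1\t956,\t0,\n",
    "chrM\t4469\t5511\tENST00000361453\t0\t+\t4469\t5511\t0\t1\t1042,\t0,\n",
]
isoforms = ["ENSG00000198888\tENST00000361390\n"]

print(missing_transcripts(bed, isoforms))  # ['ENST00000361453']
```

An empty list means the isoforms file covers every transcript in the BED file.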

Converts

  • Homo sapiens GRCh38 GENCODE 44 (252,835 transcripts) in 3.25 seconds.
  • Mus musculus GRCm39 GENCODE 44 (149,547 transcripts) in 1.99 seconds.
  • Canis lupus familiaris ROS_Cfam_1.0 Ensembl 110 (55,335 transcripts) in 1.20 seconds.
  • Gallus gallus bGalGal1 Ensembl 110 (72,689 transcripts) in 1.36 seconds.

bed2gff

A Rust BED-to-GFF3 translator that runs in parallel from https://github.com/alejandrogzi/bed2gff.

Usage: bed2gff[EXE] --bed/-b <BED> --isoforms/-i <ISOFORMS> --output/-o <OUTPUT> --threads/-t <THREADS>

where:
--bed <BED>: a .bed file
--isoforms <ISOFORMS>: a tab-delimited file
--output <OUTPUT>: path to output file (*.gff)

The isoforms file specification:

a tab-delimited .txt/.tsv/.csv/... file with genes/isoforms (all the transcripts in .bed file should appear in the isoforms file):

> cat isoforms.txt

ENSG00000198888 ENST00000361390
ENSG00000198763 ENST00000361453
ENSG00000198804 ENST00000361624
ENSG00000188868 ENST00000595977
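If you do not already have an isoforms file, one way to obtain it is to extract the gene/transcript pairs from a reference GTF. This is a minimal sketch under that assumption (that a GTF carrying `gene_id` and `transcript_id` attributes for the same transcripts is available); the function name `isoform_pairs` and the sample lines are hypothetical and not part of bed2gff.

```python
import re

# GTF attributes look like: gene_id "X"; transcript_id "Y"; ...
GENE_RE = re.compile(r'gene_id "([^"]+)"')
TX_RE = re.compile(r'transcript_id "([^"]+)"')


def isoform_pairs(gtf_lines):
    """Yield unique (gene_id, transcript_id) pairs in first-seen order."""
    seen = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        gene = GENE_RE.search(fields[8])
        tx = TX_RE.search(fields[8])
        # Gene-level records have no transcript_id and are skipped.
        if gene and tx:
            pair = (gene.group(1), tx.group(1))
            if pair not in seen:
                seen.add(pair)
                yield pair


# Toy GTF records (illustrative only).
gtf = [
    "#!genome-build GRCh38\n",
    'chrM\tsrc\tgene\t3307\t4262\t.\t+\t.\tgene_id "ENSG00000198888";\n',
    'chrM\tsrc\ttranscript\t3307\t4262\t.\t+\t.\tgene_id "ENSG00000198888"; transcript_id "ENST00000361390";\n',
    'chrM\tsrc\texon\t3307\t4262\t.\t+\t.\tgene_id "ENSG00000198888"; transcript_id "ENST00000361390";\n',
]

for gene_id, transcript_id in isoform_pairs(gtf):
    print(f"{gene_id}\t{transcript_id}")
```

Redirecting the printed lines to a file gives a two-column, tab-delimited table in the gene/isoform layout shown above.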

Converts

  • Homo sapiens GRCh38 GENCODE 44 (252,835 transcripts) in 4.16 seconds.
  • Mus musculus GRCm39 GENCODE 44 (149,547 transcripts) in 2.15 seconds.
  • Canis lupus familiaris ROS_Cfam_1.0 Ensembl 110 (55,335 transcripts) in 1.30 seconds.
  • Gallus gallus bGalGal1 Ensembl 110 (72,689 transcripts) in 1.51 seconds.

bedToGenePred + genePredToGtf + refTable

UCSC offers a fast way to convert BED into GTF files through KentUtils or specific binaries using:

bedToGenePred in.bed /dev/stdout | genePredToGtf file /dev/stdin out.gtf

You can install these tools with bioconda, or download them here. The gene_id attribute is only filled in correctly when using refTables (a format specified in UCSC's web browser); a more elaborate answer is in "Obtaining Ucsc Tables Via Ftp And Converting Them To Proper Gff3 Via Genepredtogtf?".
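To illustrate why an external gene table is needed at all: a BED12 record carries only one name (the transcript), so the gene_id has to come from somewhere else. The sketch below expands one BED12 record into GTF exon lines using a transcript-to-gene dictionary. It is not how the UCSC binaries, bed2gtf, or bed2gff are implemented; the function name `bed12_to_gtf_exons`, the `source` value, and the sample record are hypothetical, and only exon features are emitted (real converters also write transcript, CDS, and codon features).

```python
def bed12_to_gtf_exons(bed_line, gene_of, source="sketch"):
    """Expand one BED12 record into GTF exon lines with gene_id and transcript_id."""
    f = bed_line.rstrip("\n").split("\t")
    chrom, start, name, strand = f[0], int(f[1]), f[3], f[5]
    # blockSizes and blockStarts are comma-separated, often with a trailing comma.
    sizes = [int(x) for x in f[10].split(",") if x]
    offsets = [int(x) for x in f[11].split(",") if x]

    # The gene_id is looked up externally: BED12 itself has no gene column.
    attrs = f'gene_id "{gene_of[name]}"; transcript_id "{name}";'

    lines = []
    for size, offset in zip(sizes, offsets):
        exon_start = start + offset + 1   # BED is 0-based, GTF is 1-based
        exon_end = start + offset + size  # BED end is exclusive, GTF end is inclusive
        lines.append("\t".join([
            chrom, source, "exon", str(exon_start), str(exon_end),
            ".", strand, ".", attrs,
        ]))
    return lines


# Toy record: a two-exon transcript (illustrative only).
bed_line = "chr1\t100\t500\tTX1\t0\t+\t100\t500\t0\t2\t50,100,\t0,300,\n"
for gtf_line in bed12_to_gtf_exons(bed_line, {"TX1": "GENE1"}):
    print(gtf_line)
```

For the toy record this yields exons at 101-150 and 401-500, both tagged with gene_id "GENE1" and transcript_id "TX1".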

Other options

Other scripts/tools that DO NOT produce a complete GTF file (they lack gene_id attributes) are:

  • gtf2bed

$ gtf2bed < foo.gtf | sort-bed - > foo.bed
$ awk '{print $1"\t"$7"\t"$8"\t"($2+1)"\t"$3"\t"$5"\t"$6"\t"$9"\t"(substr($0, index($0,$10)))}' foo.bed > foo_from_gtf2bed.gtf

  • kscript

from https://github.com/holgerbrandl/kscript:

kscript https://git.io/vbJ4B my.bed > my.gtf

  • pfurio/bed2gtf

from https://github.com/pfurio/bed2gtf:

python bed2gtf [options] <mandatory>

  • AGAT

Considering only the options that produce gene_id attributes, bed2gtf and bed2gff are ~3-4 seconds faster than UCSC's C binaries. More detailed instructions for these tools are given in the linked sources.

thanks for making these great tools
