Question

gff2bed.py stops


11.3 years ago

GPR ▴ 390

Hello,

I am trying to convert a genes.gtf file into a BED file. For this I am using the BEDOPS script gff2bed. My command line is:

    ./gff2bed genes.gtf | ./sort-bed -> genes.bed

It starts, but right after that I get a "stopped" message. The same happens if I execute only:

    ./gff2bed genes.gtf > genes.bed

I have also converted a VCF file into a BED file with the command line:

    ./vcf2bed.py < input.vcf | ./sort-bed -> vcf.bed

This works fine, but when I run:

    ./vcf2bed.py < input.vcf -> vcf.bed

I get the "stopped" message again. Does anyone know what's wrong with this?

Thanks, G.


Ram · Answer 1 · 2013-01-15

The BEDOPS gff2bed script converts GFF3 files, not GTF files. I haven't verified that gff2bed can work on generic GTF files. While the two formats are related, they do differ and you should use (for example) Bill Noble's gtf2bed conversion script to convert GTF files.
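If you are not sure which of the two formats a file is in, the ninth (attributes) column usually gives it away: GFF3 writes tag=value pairs such as ID=gene1;Name=foo, while GTF writes tag "value"; pairs such as gene_id "g1"; transcript_id "t1";. Here is a minimal sketch of that check. The function name guess_annotation_format is hypothetical and not part of BEDOPS, and it only looks at the first feature line, so treat the result as a hint rather than a validation.

```python
import re

# GFF3 attributes look like: ID=gene1;Name=foo
GFF3_ATTR = re.compile(r'^[^\s=;"]+=')
# GTF attributes look like: gene_id "g1"; transcript_id "t1";
GTF_ATTR = re.compile(r'^[^\s=;"]+ "[^"]*"\s*;')


def guess_annotation_format(lines):
    """Return 'gff3', 'gtf', or 'unknown' based on the first feature line.

    Hypothetical helper for illustration; it is not part of BEDOPS.
    """
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines such as ##gff-version 3.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Both formats use nine tab-separated columns.
        if len(fields) != 9:
            return "unknown"
        attrs = fields[8].strip()
        if GTF_ATTR.match(attrs):
            return "gtf"
        if GFF3_ATTR.match(attrs):
            return "gff3"
        return "unknown"
    return "unknown"
```

You could call it as guess_annotation_format(open("genes.gtf")) before picking a conversion script.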

Are you running vcf2bed.py < input.vcf -> vcf.bed? The hyphen before the > is probably a typo; I'm not sure what that would do. You would want to run:

    vcf2bed.py < input.vcf | sort-bed - > vcf.bed

to ensure the BED output is sorted.

If you really want to skip sorting (let's say that your VCF elements are guaranteed to be lexicographically sorted), then use:

    vcf2bed.py < input.vcf > vcf.bed

Note that BEDOPS tools are not guaranteed to give a correct answer on unsorted BED inputs, so it is safer to run the BEDOPS sort-bed application first and prevent GIGO (garbage in, garbage out) errors. Alternatively, add the --ec option when using bedmap or other BEDOPS applications, which enables error checking. It is usually much faster to just pre-sort the input with sort-bed, though: you only have to sort once, as BEDOPS apps read in and write out sorted data.
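To see whether an existing BED file is already in order before deciding to skip sort-bed, something like the following sketch may help. It assumes the order is chromosome by plain string comparison, then start and end compared as integers, and that the file has no header lines. The function name first_unsorted_line is made up for this example, and the check is no substitute for sort-bed or --ec.

```python
def first_unsorted_line(lines):
    """Return the 1-based number of the first out-of-order BED line, or None.

    Assumed order (an assumption of this sketch): chromosome name by string
    comparison, then start, then end, both compared as integers.
    """
    previous = None
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], int(fields[1]), int(fields[2]))
        if previous is not None and key < previous:
            return number
        previous = key
    return None
```

For example, first_unsorted_line(open("vcf.bed")) returns None when every line is in the assumed order.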

If you still get errors, feel free to post a transcript of what you're doing on the BEDOPS forum, along with links to your GTF and/or VCF inputs.

Also consider running your data through validation scripts — we've uncovered problems in the past with labs releasing data that do not follow specification, which requires extra pre-processing steps to clean up input before it is consumed by our scripts and applications. We've also uncovered problems with our assumptions about specifications and how that translates to a script or application. Biology is messy, but bioinformatics is messier.