Hello,

I am trying to convert a genes.gtf file into a BED file. For this I am using the BEDOPS script gff2bed. My command line is:

./gff2bed genes.gtf | ./sort-bed -> genes.bed

It does start, but right after I get a "stopped" message. The same happens if I execute only:

./gff2bed genes.gtf > genes.bed

I have also converted a VCF file into a BED file with the command line:

./vcf2bed.py < input.vcf | ./sort-bed -> vcf.bed

This works fine, but when I run:

./vcf2bed.py < input.vcf -> vcf.bed

I get the "stopped" message again. Does anyone know what's wrong with this?

Thanks, G.
The gff2bed script converts GFF3 files, not GTF files, and I haven't verified that gff2bed works on generic GTF files. While the two formats are related, they do differ, so you should use (for example) Bill Noble's gtf2bed conversion script to convert GTF files.
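If you are not sure which of the two formats a file is in, the ninth (attributes) column usually gives it away: GFF3 writes key=value pairs, while GTF writes a key followed by a quoted value. A minimal sketch of that check follows. The function name guess_annotation_format is hypothetical and not part of BEDOPS, and it only looks at the first feature line, so treat the result as a hint rather than a validation.

```shell
# Hypothetical helper (not part of BEDOPS): guess the annotation format
# from the attributes column of the first feature line.
guess_annotation_format() {
    awk -F'\t' '
        /^#/ || NF < 9 { next }    # skip comments, blank and short lines
        {
            if ($9 ~ /^[^ =;]+=/)          print "gff3"     # ID=gene1;Name=foo
            else if ($9 ~ /^[^ =;]+ +"/)   print "gtf"      # gene_id "G1";
            else                           print "unknown"
            found = 1
            exit                   # one feature line is enough for a hint
        }
        END { if (!found) print "unknown" }
    ' "$1"
}
```

For example, guess_annotation_format genes.gtf should print gtf for a typical GTF file, which tells you to reach for a GTF converter rather than gff2bed.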
Are you running vcf2bed.py < input.vcf -> vcf.bed? That hyphen is probably a typo; I'm not sure what it would do. You would want to run:

vcf2bed.py < input.vcf | sort-bed - > vcf.bed

to ensure the BED output is sorted.
If you really want to skip sorting (let's say that your VCF elements are guaranteed to be lexicographically sorted), then use:

vcf2bed.py < input.vcf > vcf.bed
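On the stray hyphen in the command quoted above: a POSIX shell splits -> into a lone - word followed by the > redirection operator, so the hyphen is handed to the program as an argument. The sketch below shows this with a hypothetical stand-in function, show_args, in place of the real converter. It says nothing about how vcf2bed.py or gff2bed react to that extra argument.

```shell
# Hypothetical stand-in for a converter script: it only reports its arguments.
show_args() { printf 'argc=%s argv=%s\n' "$#" "$*"; }

out=$(mktemp)          # scratch file standing in for vcf.bed
show_args -> "$out"    # parsed exactly like: show_args - > "$out"
cat "$out"             # prints: argc=1 argv=-
```

So with sort-bed the hyphen is meaningful (it stands for standard input), but placed after a converter script it becomes an unexpected argument to that script.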
Note that BEDOPS tools are not guaranteed to give a correct answer on unsorted BED inputs, so it is better to use the BEDOPS sort-bed application, just to be safe and to prevent GIGO (garbage in, garbage out) errors. Alternatively, add the --ec option when using bedmap or other BEDOPS applications, which enables error checking (though it is usually much faster to just use sort-bed to pre-sort the input; you only have to sort once, as BEDOPS apps read in and write out sorted data).
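If you want a quick check of whether an existing BED file already looks sorted before deciding to re-sort it, something like the following may help. It is only a sketch: the function name is_bed_sorted is hypothetical, and it assumes that the sort-bed order can be approximated as chromosome in bytewise lexicographic order, then start and end in numeric order. It is not a substitute for sort-bed or for --ec.

```shell
# Hypothetical helper (not part of BEDOPS): succeed if a tab-delimited BED
# file is ordered by chromosome (bytewise), then numerically by start and end.
is_bed_sorted() {
    tab=$(printf '\t')
    # -c checks the order without writing output; -s (GNU and BSD sort)
    # compares only the listed keys and ignores the rest of each line.
    LC_ALL=C sort -c -s -t "$tab" -k1,1 -k2,2n -k3,3n "$1" 2>/dev/null
}
```

A typical use would be: is_bed_sorted genes.bed || sort-bed genes.bed > genes.sorted.bed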
If you still get errors, feel free to post a transcript of what you're doing on the BEDOPS forum, along with links to your GTF and/or VCF inputs.
Also consider running your data through validation scripts — we've uncovered problems in the past with labs releasing data that do not follow specification, which requires extra pre-processing steps to clean up input before it is consumed by our scripts and applications. We've also uncovered problems with our assumptions about specifications and how that translates to a script or application. Biology is messy, but bioinformatics is messier.