Question: BEDOPS cannot read .bed file, please help!
1
5.1 years ago by
xiaoyonf10
Baylor College of Medicine, Houston, Texas, USA
xiaoyonf10 wrote:

Hi, I want to use bedops to analyze some .bed files from ChIP-seq, but one of the files fails at every step of the analysis; I can't even run sort-bed on it. It keeps saying the BED row length exceeds capacity at line 1. Deleting the first row of my .bed file doesn't help.

"unknownb8f6b1106ced:chip-seq fuxiaoyong$ sort-bed era_inpegf_mcf7_n3_hg18_f20_nr.bed
BED row length exceeds capacity at line 1 in era_inpegf_mcf7_n3_hg18_f20_nr.bed.
Check that you have unix newlines (cat -A) or increase TOKENS_MAX_LENGTH in BEDOPS.Constants.hpp and recompile BEDOPS."

I am very new to using tools to analyze NGS data, and even to Mac OS X itself. So please help me; a detailed explanation would be greatly appreciated!!

chip-seq • 2.0k views
ADD COMMENTlink modified 5.1 years ago by Alex Reynolds28k • written 5.1 years ago by xiaoyonf10
1

Hi, could you maybe give us the output of the following commands:

1. wc -l era_inpegf_mcf7_n3_hg18_f20_nr.bed

2. head -n 1 era_inpegf_mcf7_n3_hg18_f20_nr.bed | cat -A

Thank you!

ADD REPLYlink written 5.1 years ago by RamRS24k

Hi Ram RS, thanks for your reply! Please see the following results from the commands you asked for. I am looking forward to solving this mystery!

unknownb8f6b1106ced:chip-seq fuxiaoyong$ wc -l era_inpegf_mcf7_n3_hg18_f20_nr.bed
       0 era_inpegf_mcf7_n3_hg18_f20_nr.bed
unknownb8f6b1106ced:chip-seq fuxiaoyong$ head -n 1 era_inpegf_mcf7_n3_hg18_f20_nr.bed | cat -A
cat: illegal option -- A
usage: cat [-benstuv] [file ...]

ADD REPLYlink written 5.1 years ago by xiaoyonf10
1

It kinda looks like it's either an empty file or the new line characters are acting weird. Please check out these two commands for their output:

ls -lh era_inpegf_mcf7_n3_hg18_f20_nr.bed
head -n 2 era_inpegf_mcf7_n3_hg18_f20_nr.bed | cat -te
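
Since wc -l counts only LF characters, a count of 0 on a non-empty file would suggest the lines end in something else, such as a bare CR. Here is a minimal sketch for checking this by counting the two byte types directly. The demo file and its contents are made up for illustration:

```shell
# Build a small demo file with CR-only line endings (hypothetical data).
check_dir=$(mktemp -d "${TMPDIR:-/tmp}/bed_check.XXXXXX")
demo="$check_dir/demo_cr.bed"
printf 'chr1\t100\t200\rchr1\t300\t400\r' > "$demo"

# Keep only LF (or only CR) bytes and count them.
# tr -d ' ' strips the padding that BSD/macOS wc adds around the number.
lf_count=$(tr -cd '\n' < "$demo" | wc -c | tr -d ' ')
cr_count=$(tr -cd '\r' < "$demo" | wc -c | tr -d ' ')
echo "LF: $lf_count  CR: $cr_count"
```

On a real file, point the two tr commands at the BED file instead of "$demo". Many CRs with no LFs points to CR-only line endings; equal counts of both points to Windows-style CRLF.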
ADD REPLYlink modified 7 weeks ago • written 5.1 years ago by RamRS24k

Now I kind of know the reason, but I don't know how to fix it. I had previously opened the .bed file in Excel, deleted several blank rows I found, and re-saved it. I noticed that after doing this, the file's kind changed from "Unix Executable File" to "simple text format". When I run sort-bed or other bedops commands on this file, the problem I posted appears.

So I tried the command: cat xxx.txt > xxx.bed and it still doesn't seem to work. My questions are:

  1. How can I edit the .bed file when something is wrong in it (e.g. several blank rows) without having to open it in Excel?
  2. How can I convert the file edited by Excel back to a .bed Unix executable file?

Here is also the output I got when I ran sort-bed on my original .bed file, before any editing in Excel. The message hinting at a blank line is what made me open the file in Excel, where I did find several blank rows.

unknownb8f6b1106ced:chip-seq fuxiaoyong$ sort-bed era_inpegf_mcf7_n3_hg18_f20_nr.bed
No tabs/spaces found at line 4026 in era_inpegf_mcf7_n3_hg18_f20_nr.bed.

Thanks for all your help!

ADD REPLYlink modified 7 weeks ago by RamRS24k • written 5.1 years ago by xiaoyonf10
1

Hi, the usual advice first: while Excel is really tempting, it is a bad tool for bioinformatics. Most files in bioinformatics are plain text, meaning any plain text editor can read them. If they're a manageable size, I'd suggest TextWrangler or BBEdit for Mac, gedit or kedit for Linux and Notepad++ for PC. If you're comfortable with the command line, emacs, vim or nano can be used.

A BED file is a tab-delimited file. This means fields in each line of a BED file are separated by tab characters. Excel's usual behavior is to import a tab delimited file such that each field is in its own cell. While this might help peeking into the content, modifying via Excel is best avoided.

In summary, use TextWrangler. It's lightweight, does not botch stuff up and is way more friendly with plain text files than Excel. If you wanna do statistical analysis from these files, I'd suggest using Python (IPython notebook) or R.

Pro-tip: To remove all blank lines from a file (and write to a new file), run this:

sed -e '/^$/d' input.bed >output.bed
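
The pattern above matches only completely empty lines. If a line that looks blank actually holds spaces, tabs or a stray carriage return, a slightly broader filter may be needed. Here is a sketch with grep, using made-up demo data and hypothetical file names:

```shell
# Demo input: two real records plus an empty line, a line of
# whitespace, and a line holding only a carriage return.
blank_dir=$(mktemp -d "${TMPDIR:-/tmp}/bed_blank.XXXXXX")
printf 'chr1\t100\t200\n\n \t\n\r\nchr2\t300\t400\n' > "$blank_dir/input.bed"

# Drop every line that consists solely of whitespace characters
# (the POSIX [[:space:]] class includes space, tab and CR).
grep -v '^[[:space:]]*$' "$blank_dir/input.bed" > "$blank_dir/output.bed"

kept=$(wc -l < "$blank_dir/output.bed" | tr -d ' ')
echo "records kept: $kept"
```

This only helps when the file already uses LF line endings; a file with CR-only endings looks like a single line to both sed and grep.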

Also, let's say you wish to sort a bed file so the first column is in ascending order. You can use UNIX's builtins to do this. Multiple options here: How To Sort Bed Format File
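
One common way to do this with the standard sort utility is sketched below. This is not necessarily the approach the linked thread recommends, and the demo records and file names are made up:

```shell
# Demo input: three unsorted records (hypothetical data).
sort_dir=$(mktemp -d "${TMPDIR:-/tmp}/bed_sort.XXXXXX")
printf 'chr2\t50\t60\nchr1\t300\t400\nchr1\t100\t200\n' > "$sort_dir/unsorted.bed"

# Sort by chromosome name as text (field 1), then by start
# coordinate as a number (field 2). LC_ALL=C makes the text
# comparison byte-wise, so results do not depend on the locale.
LC_ALL=C sort -k1,1 -k2,2n "$sort_dir/unsorted.bed" > "$sort_dir/sorted.bed"

cat "$sort_dir/sorted.bed"
```

Note that sorting chromosome names as text puts chr10 before chr2, so the order may differ from what a genome browser shows.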

Hope this helps!

ADD REPLYlink modified 7 weeks ago • written 5.1 years ago by RamRS24k

Thank you so much, Ram RS!! I solved my problem! I did exactly what you said: downloaded TextWrangler > edited the .bed file in TextWrangler (i.e. deleted the blank rows) > ran sort-bed on the .bed file (successfully!) > ran bedops commands, and I am now playing with my data (answering my biological questions!)

This is a great bioinformatics forum/community! I love it!! And will come here often...

Thanks and have a great weekend!!!

ADD REPLYlink modified 7 weeks ago by RamRS24k • written 5.1 years ago by xiaoyonf10
You're welcome! Have fun with the data!
ADD REPLYlink written 5.1 years ago by RamRS24k

Hello @xiaoyonf, I'd really appreciate if you could maybe mark my answer below as an answer to this question. Thank you!

ADD REPLYlink written 5.1 years ago by RamRS24k
1

Perhaps this is a line ending problem. Post what Ram RS asked for and I expect you'll see that the first line is actually the whole file (this is an easy fix).

ADD REPLYlink written 5.1 years ago by Devon Ryan92k
3
5.1 years ago by
RamRS24k
Houston, TX
RamRS24k wrote:

Hello,

As discussed, the problem was owing to Excel messing up your BED file. To delete blank rows from your BED file, use the following command:

sed '/^$/d' <era_inpegf_mcf7_n3_hg18_f20_nr.bed >era_inpegf_mcf7_n3_hg18_f20_nr_NoBlanks.bed

Or, you could also manually remove blank lines (a tedious process) by opening the file in TextWrangler.
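
Before deleting anything, it can help to list which lines are malformed. Here is a minimal sketch with awk that reports the line numbers of records with fewer than three whitespace-separated fields (a BED record needs at least chromosome, start and end). The demo data and file name are made up:

```shell
# Demo input: line 2 is blank and line 4 has a single field.
find_dir=$(mktemp -d "${TMPDIR:-/tmp}/bed_find.XXXXXX")
printf 'chr1\t100\t200\n\nchr1\t300\t400\nchr2\n' > "$find_dir/demo.bed"

# NF is the number of fields awk found on the line;
# NR is the current line number.
bad_lines=$(awk 'NF < 3 { print NR }' "$find_dir/demo.bed")
echo "$bad_lines"
```

The reported line numbers can then be inspected in a text editor before the lines are removed.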

ADD COMMENTlink modified 11 months ago • written 5.1 years ago by RamRS24k
3
5.1 years ago by
Alex Reynolds28k
Seattle, WA USA
Alex Reynolds28k wrote:

The issue is probably not blank lines, which the current version of sort-bed removes, but a lack of correct newline characters, since Excel (even the Mac version) adds Windows-specific line delimiters to the text files it exports.

If the file doesn't end with a UNIX newline character, then you will get an error message. The usual way to fix this is to use dos2unix on the output from Excel, or to avoid handling data in Excel where possible.
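
If dos2unix is not installed, tr can do a similar job. Here is a minimal sketch covering the two cases, with made-up demo data and hypothetical file names. Which case applies to a given Excel export is an assumption to verify first; a wc -l count of 0 on a non-empty file suggests CR-only endings.

```shell
# Demo inputs: the same two records with CR-only and with CRLF endings.
eol_dir=$(mktemp -d "${TMPDIR:-/tmp}/bed_eol.XXXXXX")
printf 'chr1\t100\t200\rchr2\t300\t400\r' > "$eol_dir/cr_only.bed"
printf 'chr1\t100\t200\r\nchr2\t300\t400\r\n' > "$eol_dir/crlf.bed"

# CR-only file: turn every CR into an LF.
tr '\r' '\n' < "$eol_dir/cr_only.bed" > "$eol_dir/fixed_cr.bed"

# CRLF file: delete the CR and keep the LF that follows it.
tr -d '\r' < "$eol_dir/crlf.bed" > "$eol_dir/fixed_crlf.bed"

# Both results should now be byte-identical, LF-terminated files.
cmp "$eol_dir/fixed_cr.bed" "$eol_dir/fixed_crlf.bed" && echo "line endings normalized"
```

Always write to a new file rather than redirecting back onto the input, since the shell truncates the output file before tr reads it.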

If clearing out empty lines fixes sorting for you, then look at upgrading your version of BEDOPS and sort-bed to the current release (presently v2.4.2), as you might be running an older binary.

ADD COMMENTlink modified 5.1 years ago • written 5.1 years ago by Alex Reynolds28k

I agree, but if you're using Excel to work with a BED file, with all due respect, you're not doing it right.

But yes, the point about Mac's CR vs Unix's LF is totally an important aspect that I forgot at the moment.

ADD REPLYlink modified 5.1 years ago • written 5.1 years ago by RamRS24k

Yep, I'm not trying to badmouth Excel. It has its place, but it is really not appropriate for any work that involves UNIX or the command line.

ADD REPLYlink written 5.1 years ago by Alex Reynolds28k

I agree. That's exactly my point. Excel is a powerful tool once you realize where not to use it.

ADD REPLYlink written 5.1 years ago by RamRS24k