Question: What Distinguishes Header From Content In BED Files?
4
4.4 years ago by
Stefano Berri4.0k
Cambridge, UK
Stefano Berri4.0k wrote:

Hi.

I am considering saving some genomic data to a BED file. However, I am a bit concerned about the format and what, exactly, distinguishes the track lines from the content.

Take these examples:

The header can have a variable number of lines, and there isn't, as far as I know, a limited set of starting keywords ("track" and "browser" are two, but I have seen others and, furthermore, they can span several lines). I don't want to search for "chr*", as sometimes people save chromosomes with only their number and, anyway, there could be contigs or scaffolds or whatever. The only difference I can see is that the content has tabs and the header has spaces, but if I use this to decide where the content starts, an accidental tab in the header would mess things up. And, apparently, the content COULD be space-delimited.

thanks

bed format parsing • 3.7k views
modified 3.5 years ago by Biostar ♦♦ 20 • written 4.4 years ago by Stefano Berri4.0k
2

Hello Stefano, it is really difficult to distinguish. I would check whether the 2nd AND 3rd fields (both required) are coordinates, regardless of whether the file is tab- or space-delimited.

AndreiR

written 4.4 years ago by AndreiR240
1

Very good question. Usually I treat all lines that do not start with "browser" or "track" as content lines.
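A minimal sketch of that rule in Python, assuming header lines are exactly those whose first whitespace-delimited token is "browser" or "track". The name is_header_line and the keyword set are illustrative assumptions, not part of any official BED tooling:

```python
# Assumption: only these two keywords introduce header lines.
HEADER_KEYWORDS = {"browser", "track"}


def is_header_line(line: str) -> bool:
    """Return True if the first whitespace-delimited token is a header keyword."""
    tokens = line.split(None, 1)
    # Comparing the whole first token (not a prefix) keeps a sequence
    # named e.g. "track1" from being mistaken for a header line.
    return bool(tokens) and tokens[0] in HEADER_KEYWORDS
```

Blank lines return False here, so a caller that wants to ignore them has to skip them separately.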

written 4.4 years ago by Giovanni M Dall'Olio26k

Except that, it seems, they can span several lines. In the example, one line starts with "itemRgb="On"".

written 4.4 years ago by Stefano Berri4.0k
1

No, in that example, itemRgb is on the same line as "track name". See the raw file of the example here: http://genome.ucsc.edu/goldenPath/help/ItemRGBDemo.txt

modified 4.4 years ago • written 4.4 years ago by Giovanni M Dall'Olio26k
1

Thanks, if that is the case, then it's not too hard. I haven't found a document that "officially" states something like that, though...

written 4.4 years ago by Stefano Berri4.0k

Yes, that might work actually... columns 2 and 3 are always numbers in the content, but never in browser or track lines...
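A rough Python sketch of this check, assuming a data line has at least three whitespace-delimited fields and that fields 2 and 3 consist only of ASCII digits. The helper names looks_like_bed_record and first_data_line are made up for illustration:

```python
from typing import Iterable, Optional


def looks_like_bed_record(line: str) -> bool:
    """Return True if fields 2 and 3 look like non-negative integer coordinates."""
    fields = line.split()  # splits on any run of tabs or spaces
    if len(fields) < 3:
        return False
    return all(f.isascii() and f.isdigit() for f in fields[1:3])


def first_data_line(lines: Iterable[str]) -> Optional[int]:
    """Return the index of the first line that looks like a record, or None."""
    for index, line in enumerate(lines):
        if looks_like_bed_record(line):
            return index
    return None
```

Because it splits on any whitespace, the check does not depend on the chromosome naming scheme or on whether the file uses tabs or spaces. A header line whose second and third tokens happened to be bare integers would still be misclassified.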

written 4.4 years ago by Stefano Berri4.0k
1
4.4 years ago by
Sukhdeep Singh9.3k
Netherlands
Sukhdeep Singh9.3k wrote:

As I understand it, BED format generally means a tab-delimited file whose lines start with chromosome, start and end; the fourth column could be strand, peak height, size, width, confidence, etc., if we are talking about peak files. The files in your examples, which have track lines, are generally meant for visualization in the browser and can be further classified as bigBed, wig or bigWig. I wouldn't confuse the two. Most BED files (such as those publicly available in GEO) won't have track lines, whereas wig or bigBed files will.

If you want it for visualization, make a custom header that stays constant for you, as a single or double line containing name, description, color, type, etc. Alternatively, make a track file that gathers and controls all of your tracks (normal BED files), so you don't have to annotate each BED file separately.

More info : http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks

http://genome.ucsc.edu/goldenPath/help/bigWig.html (Point #7)

Cheers

written 4.4 years ago by Sukhdeep Singh9.3k
0
4.4 years ago by
Alex Reynolds24k
Seattle, WA USA
Alex Reynolds24k wrote:

I don't want to search for "chrN" as sometimes people save chromosomes only with their number

If they are doing that, they aren't following convention. For whatever it's worth, people and research consortia choosing not to follow the spec is not a problem restricted to the BED format.

modified 4.4 years ago • written 4.4 years ago by Alex Reynolds24k

I know, but, in general, the reference genome might not have only chrN sequences. If I use a reference genome with contigs or scaffolds, for example, or the 1000 Genomes reference with decoy sequences, the names, I think, won't look like chrN.

written 4.4 years ago by Stefano Berri4.0k
0
4.1 years ago by
Whoknows640
Tehran,Iran
Whoknows640 wrote:

Hi friends

I had this problem and solved it with BEDOPS; its sam2bed tool is very good at this. After converting to a BED file, you can work out the BED file header from the sam2bed documentation page:

http://bedops.readthedocs.org/en/latest/content/reference/file-management/conversion/sam2bed.html

Enjoy.

modified 4.1 years ago • written 4.1 years ago by Whoknows640
0
4.1 years ago by
Jorge Amigo10k
Santiago de Compostela, Spain
Jorge Amigo10k wrote:

You could always look at chromosome positions rather than chromosome labels. A pattern such as ^\S+\t\d+\t\d+ will help you distinguish headers from data lines.
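As an illustration, here is one way that pattern could be applied in Python. This is only a sketch: the function name split_header is hypothetical, and the pattern as written assumes tab-delimited records.

```python
import re

# The pattern quoted above; it expects tab-delimited records.
BED_RECORD = re.compile(r"^\S+\t\d+\t\d+")


def split_header(lines):
    """Partition lines into (header, records) using the pattern."""
    header, records = [], []
    for line in lines:
        if BED_RECORD.match(line):
            records.append(line)
        else:
            header.append(line)
    return header, records
```

For space-delimited content, the two \t separators could be replaced by \s+, at the cost of a looser match.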

written 4.1 years ago by Jorge Amigo10k