What Distinguishes Header From Content In BED Files?
7.9 years ago

Hi.

I am considering saving some genomic data to a bed file. However, I am a bit concerned about the format and what, exactly, distinguishes the track lines from the content.

Take these examples:

The header can have a variable number of lines, and there isn't, as far as I know, a limited set of starting keywords (track and browser are two, but I have seen others and, furthermore, they can span several lines). I don't want to search for "chr*", as sometimes people save chromosomes with only their number and, anyway, there could be contigs or scaffolds or whatever. The only difference I can see is that the content has tabs and the header has spaces, but if I use this to decide where the content starts, an accidental tab in the header would mess things up. And, apparently, the content COULD be space delimited.

thanks

bed parsing format • 6.7k views

Hello Stefano, it is really difficult to distinguish. I would check whether the 2nd AND 3rd fields (both required) look like coordinates, which works for tab-delimited and space-delimited files alike.

AndreiR


Very good question. Usually I consider that all the lines not starting with "browser" or "track" are the content lines.
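A minimal sketch of that rule in Python, assuming every header line has "browser" or "track" as its first token. The function name `is_header_line` and the sample lines are made up for illustration; they are not from any library or real file.

```python
# Keywords taken from this thread; other keywords may exist in the wild.
HEADER_KEYWORDS = ("browser", "track")


def is_header_line(line, keywords=HEADER_KEYWORDS):
    """Return True if the first whitespace-separated token is a header keyword."""
    tokens = line.split(None, 1)  # split on any whitespace, at most once
    # Blank lines have no tokens and are therefore not treated as header lines.
    return bool(tokens) and tokens[0] in keywords


# Hypothetical input lines, for illustration only.
lines = [
    "browser position chr1:1-1000",
    'track name="demo" description="made-up example"',
    "chr1\t100\t200\tfeature1",
]

# Keep everything that is not a header line.
content = [ln for ln in lines if not is_header_line(ln)]
```

Matching the whole first token, rather than using a plain prefix test, avoids misreading a sequence whose name merely begins with "track" (for example a hypothetical scaffold called "track7") as a header.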


Except that, it seems, they can span several lines. In the example, one line starts with itemRgb="On".


no, in that example, itemRgb is on the same line as "track name". See the raw file of the example here: http://genome.ucsc.edu/goldenPath/help/ItemRGBDemo.txt


Thanks, if that is the case, then it's not too hard. I haven't found a document that "officially" states something like that, though...


Yes, that might work actually... columns 2 and 3 are always numbers in the content, but never in browser or track lines...
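A rough sketch of the column check, assuming a content line has at least three whitespace-separated fields and that fields 2 and 3 are plain non-negative integers. The name `looks_like_bed_record` is hypothetical. Splitting on any whitespace makes the check work for both tab-delimited and space-delimited files.

```python
def _is_coordinate(field):
    """True for a non-empty string made only of ASCII digits."""
    return field.isascii() and field.isdigit()


def looks_like_bed_record(line):
    """Guess whether a line is BED content by inspecting fields 2 and 3."""
    fields = line.split()  # any run of tabs or spaces separates fields
    if len(fields) < 3:
        return False
    return _is_coordinate(fields[1]) and _is_coordinate(fields[2])
```

A header line would only be misclassified if its second and third tokens both happened to be bare integers, which the browser and track lines discussed in this thread never are.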

7.9 years ago

In my view, the bed format generally means a tab-delimited file whose lines start with chromosome, start and end, followed by further columns that could hold strand, peak height, size, width, confidence, etc., if we are talking about peak files. Files like the ones in your examples, which have track lines, are generally meant for visualization in the browser and can be further classified as bigBed, wig or bigWig. I wouldn't confuse the two. Most bed files (like those publicly available in GEO) won't have track lines, whereas wig or bigBed will.

If you want the file for visualization, make a custom header that stays constant for you, a single or double line containing name, description, color, type, etc. Alternatively, make a track file that gathers and controls all of your tracks (normal bed files), so you don't have to annotate each bed file separately.

Cheers

7.9 years ago

I don't want to search for "chrN" as sometimes people save chromosomes only with their number

If they are doing that, they aren't following convention. For whatever it's worth, people and research consortiums choosing not to follow spec is not a problem restricted to the BED format.


I know, but, in general, the reference genome might not have only chrN sequences. If I use a reference genome with contigs or scaffolds, for example, or the 1000 Genomes reference with decoy sequences, the names, I think, won't look like chrN.

7.6 years ago
Whoknows ▴ 880

Hi friends

I had this problem and solved it with BEDOPS; its sam2bed tool is very good at this. After converting to a BED file, you can determine the BED file header from the sam2bed information page:

Enjoy.

7.6 years ago

You could always look at chromosome positions rather than chromosome labels. A pattern such as ^\S+\t\d+\t\d+ will help you distinguish headers from data lines.
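One way to apply that pattern, sketched in Python under the assumption that the file is tab-delimited and that all header lines come before the first data line. The helper name `split_header` is hypothetical.

```python
import re

# The pattern from the post: a label, a tab, digits, a tab, digits.
BED_RECORD = re.compile(r"^\S+\t\d+\t\d+")


def split_header(lines):
    """Split a list of lines into (header, records) at the first data line."""
    for i, line in enumerate(lines):
        if BED_RECORD.match(line):
            return lines[:i], lines[i:]
    # No line matched the pattern: treat everything as header.
    return lines, []
```

Because the pattern requires literal tabs, a space-delimited file would come back as all header; replacing each `\t` with `\s+` is one possible relaxation, at the cost of being slightly more permissive.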