Question: Wig, BigWig and BedGraph: any in-depth description?
7.2 years ago, Radhouane Aniba wrote:

Hello everyone,

Everyone working on NGS data analysis is familiar with these three file types, and we all refer to the UCSC documentation to understand their content and meaning. That is fine, but some black boxes remain that are not necessarily clear, especially these questions:

  • How come these files, generated for the same experiment by different programs, contain different values and different distributions?
  • Regarding the programs themselves: how are these files created from read alignments? What algorithm is behind them?
  • The documentation of the formats is clear enough, but the significance and meaning of the scores inside each file are not explained. What do these values refer to?
  • Consider this sentence from UCSC: "The BedGraph format allows display of continuous-valued data in track format. This display type is useful for probability scores and transcriptome data". A probability score? Of what? How?

I hope I am not merging too many questions into a single post. They are all related, and I think it is worth raising them together so that we can discuss them at the same time.

Thanks for all.

Rad

wiggle • 7.8k views
Modified 5.4 years ago by Biostar • written 7.2 years ago by Radhouane Aniba
7.2 years ago, Madelaine Gogol (Kansas City) wrote:

http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#CustomTracks

http://genome.ucsc.edu/goldenPath/help/bedgraph.html

http://genome.ucsc.edu/goldenPath/help/wiggle.html

http://genome.ucsc.edu/goldenPath/help/bigWig.html

"How come these files, generated for the same experiment by different programs, contain different values and different distributions?"

Welcome to bioinformatics.

"How are these files created from read alignments? What algorithm is behind them?"

samtools pileup and genomeCoverageBed from bedtools are two ways...
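
As a rough illustration of what such coverage tools compute, here is a minimal Python sketch that turns read intervals into BedGraph-style rows of per-base depth. It is not the samtools or bedtools implementation. The function names `coverage_runs` and `to_bedgraph`, the 0-based half-open read intervals, and the choice to skip zero-depth regions are all assumptions made for this example.

```python
from collections import defaultdict


def coverage_runs(chrom, reads):
    """Collapse read intervals into runs of constant depth.

    reads: iterable of (start, end) pairs, 0-based and half-open (an assumption).
    Returns a list of (chrom, start, end, depth) tuples, skipping depth 0.
    """
    # +1 where a read starts, -1 where it ends.
    delta = defaultdict(int)
    for start, end in reads:
        delta[start] += 1
        delta[end] -= 1

    runs = []
    depth = 0
    prev = None
    for pos in sorted(delta):
        change = delta[pos]
        if change == 0:
            # A read ends exactly where another starts: depth is unchanged.
            continue
        if prev is not None and depth > 0:
            runs.append((chrom, prev, pos, depth))
        depth += change
        prev = pos
    return runs


def to_bedgraph(runs):
    """Format runs as tab-separated BedGraph data lines."""
    return ["\t".join(str(field) for field in run) for run in runs]


if __name__ == "__main__":
    # Three hypothetical reads on chr1; two of them overlap.
    example = coverage_runs("chr1", [(0, 10), (5, 15), (20, 25)])
    print("\n".join(to_bedgraph(example)))
```

The values here are raw depths. A different tool might count fragments instead of reads, extend reads to an estimated fragment length, or smooth the signal, which is one reason two programs can give different numbers for the same experiment.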

"The significance and meaning of the scores inside each file are not explained. What do these values refer to?"

That depends on who made them, what data they are based on, and what was done to it, but most likely a read count, log2(IP/input), or something similar.
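
To make the log2(IP/input) case concrete, here is a small Python sketch of one way such a score could be computed for a single bin. The function name `log2_ratio`, the pseudocount of 1, and the scaling to reads per million are assumptions for illustration. Real pipelines differ in all three, so this should not be read as how any particular track was made.

```python
import math


def log2_ratio(ip_count, input_count, ip_total, input_total, pseudocount=1.0):
    """log2 enrichment of IP over input for one genomic bin.

    ip_count, input_count: reads falling in the bin for each sample.
    ip_total, input_total: total mapped reads per sample (library size).
    pseudocount: added to both counts so empty bins do not divide by zero.
    """
    # Scale each sample to reads per million so library size cancels out.
    ip_rpm = (ip_count + pseudocount) * 1e6 / ip_total
    input_rpm = (input_count + pseudocount) * 1e6 / input_total
    return math.log2(ip_rpm / input_rpm)


if __name__ == "__main__":
    # Hypothetical bin: 3 IP reads and 1 input read, equal library sizes.
    print(log2_ratio(3, 1, 1_000_000, 1_000_000))
```

A score built this way is relative to the input sample, so a value of 1 means roughly twofold enrichment. A raw read count has no such built-in reference.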

"A probability score? Of what? How?"

The probability that each nucleotide belongs to a conserved element? The conservation tracks created by UCSC are probability-based.

7.2 years ago, Sean Davis (National Institutes of Health, Bethesda, MD) wrote:

These are formats that are meant to be general. For example, the BedGraph format is meant to store continuous-valued data. Documentation of what those continuous values represent is not a required part of the format. Sometimes, UCSC tracks contain descriptions within the files themselves, but often the interpretation of the values is left to external documentation.

If you are asking about specific tracks at UCSC, you can either write to the UCSC mailing list for details about the tracks of interest or read the track description, available by clicking on the bar to the left of the track in the browser.
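
To show how little the format itself says about meaning, here is a minimal Python sketch that reads BedGraph text. The function name `parse_bedgraph` and the sample track line are made up for the example, and the parser assumes well-formed input. Note that the only place an interpretation can live in the file is the optional `description` on the track line. The data lines are just chromosome, start, end, and a number.

```python
import shlex


def parse_bedgraph(text):
    """Split BedGraph text into track-line attributes and data rows.

    Returns (meta, rows): meta is a dict of key=value pairs from the
    track line (empty if there is none); rows is a list of
    (chrom, start, end, value) tuples.
    """
    meta = {}
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("browser"):
            continue
        if line.startswith("track"):
            # shlex keeps quoted values such as description="..." together.
            for token in shlex.split(line)[1:]:
                key, _, value = token.partition("=")
                meta[key] = value
            continue
        chrom, start, end, value = line.split()[:4]
        rows.append((chrom, int(start), int(end), float(value)))
    return meta, rows


if __name__ == "__main__":
    sample = (
        'track type=bedGraph name="demo" description="read depth"\n'
        "chr1\t0\t5\t1\n"
        "chr1\t5\t10\t2.5\n"
    )
    meta, rows = parse_bedgraph(sample)
    print(meta.get("description", "no description given"))
    print(rows)
```

If the track line is missing or carries no description, nothing in the file tells you whether 2.5 is a depth, a ratio, or a probability.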


The values contained in the files will vary. For example, a wig file will sometimes contain p-values, sometimes log10(p-values), sometimes base coverage, sometimes a value between 0 and 1. All of these, and any other values, are allowed and possible. The algorithm for making the files is also not defined. Sometimes it will be a statistical model, sometimes read counts, sometimes a hidden Markov model, sometimes something else. So you MUST name the specific track or file to get an answer, as there is NO answer to your more general question. Hope that helps.

Reply written 7.2 years ago by Sean Davis

Thanks Sean. Actually, my question is not about the format itself but about what these files contain and how they are created. What do the values in wig files, for example, represent for the reads aligned to a genome? We talk about density, but density of what? How are these densities calculated, and why are they not normalized like probabilities, so that one can tell, for example, whether a value of 3e4 is meaningful or not, and compared to what background?

Reply written 7.2 years ago by Radhouane Aniba

In which tracks are you interested? Perhaps you can edit your original question to include a link to your UCSC session. You could also consider writing to UCSC (or the original source of the files) for details. There is not a standard way of creating wig, bigwig, or bedgraph from NGS data.

Reply written 7.2 years ago by Sean Davis

Thanks Sean, there is no specific track in mind; my question concerns these files in general. To put it simply: what information do these files contain? What do the values in them refer to? How are they created?

Reply written 7.2 years ago by Radhouane Aniba

It helps in a sense to know that there is no standardization for these data, which could lead to problems when comparing different experiments. Thanks for your answers.

Reply written 7.2 years ago by Radhouane Aniba

The tradeoff for lack of a standard is total flexibility. I agree that this can lead to confusion.

Reply written 7.2 years ago by Sean Davis