Question: Input data format for MethylSeekR
3.5 years ago, tiplud wrote:

Hi Everyone,

I have downloaded a bisulfite-seq dataset from ENCODE that is only available in wig format. The first few lines are as follows:

track type=wiggle_0 name="UCSF-UBC.Penis_Foreskin_Keratinocyte_Primary_Cells.Bisulfite-Seq.skin03:methRatio" visibility=full color=20,150,20 altColor=150,20,20 windowingFunction=mean

variableStep chrom=chr1

10469   0.347826086956522

10470   0.347826086956522

10471   0.608695652173913

10472   0.608695652173913

10484   0.88

The description page in GEO says that the second column contains methylation proportions.

I would like to read this data into MethylSeekR in order to identify LMRs, FMRs and UMRs. After searching online and in the package manual, I have not found a way to do this. I tried the readMethylome function, but its documentation says (copied verbatim):

If format is set to "text" (default), the argument FileName should refer to a tab-delimited text file in the format: chromosome position T M, where each line stands for a CpG, the position refers to the C of the CpG (on the plus strand), T is the total number of reads (total counts) covering the CpG and M the total number of reads without C to T conversion at the C of the CpG (methylation counts). If format="GRanges", the file is assumed to be a GRanges object, containing T and M as first and second data-value entries, saved in rds format. 

Is there a way to get T and M from the wig file? Or is there any other way to read in the data so that I can use the package?


Thank you,



Please provide more information, such as a downloadable example data file and your R code. I think we can build such a pipeline.

written 3.5 years ago by Shicheng Guo
3.5 years ago, Devon Ryan (Freiburg, Germany) wrote:

A wig file doesn't hold the information that MethylSeekR needs. You could perhaps approximate the T and M counts by multiplying every proportion by a constant and rounding, but the result will only be approximate, and I don't recall enough of the details of MethylSeekR to know whether this will cause problems. I fear that you'll need to remap the FASTQ files if you can't get a more useful format (wiggle files are nice for visualization but aren't very useful for statistics).
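To make the "multiply by a constant and round" idea concrete, here is a minimal Python sketch that turns variableStep wig lines into the tab-delimited `chromosome position T M` layout quoted in the question. The function names and the `assumed_coverage` value of 10 are illustrative assumptions, not anything MethylSeekR prescribes, and every CpG gets the same made-up T. The sample data also list adjacent positions with identical values, which may be the two strands of one CpG; the sketch does not collapse them.

```python
import io


def wig_to_rows(lines, assumed_coverage=10):
    """Yield (chrom, position, T, M) from variableStep wig lines.

    assumed_coverage is a hypothetical constant standing in for the
    unknown per-site read depth, so T is identical for every site.
    """
    chrom = None
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments and track/browser header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith("variableStep"):
            # e.g. "variableStep chrom=chr1" (an optional span= is ignored)
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            continue
        if line.startswith("fixedStep"):
            raise ValueError("fixedStep sections are not handled in this sketch")
        pos, ratio = line.split()
        total = assumed_coverage
        meth = round(float(ratio) * total)  # approximate methylated count
        yield chrom, int(pos), total, meth


def write_rows(rows, handle):
    """Write rows as tab-delimited text: chromosome, position, T, M."""
    for chrom, pos, total, meth in rows:
        handle.write(f"{chrom}\t{pos}\t{total}\t{meth}\n")


sample = """track type=wiggle_0 name="example"
variableStep chrom=chr1
10469   0.347826086956522
10470   0.347826086956522
10471   0.608695652173913
10472   0.608695652173913
10484   0.88
"""

out = io.StringIO()
write_rows(wig_to_rows(sample.splitlines()), out)
print(out.getvalue(), end="")
```

The resulting file could then be passed to readMethylome with the default text format, keeping in mind that the counts are synthetic and any coverage-based filtering downstream will see the same depth everywhere.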


Thank you! Since I cannot find the raw data, I will try multiplying by a constant and rounding. However, as you said, I fear it will be very approximate. I also have no information about the total read count at each position, so I am not sure how accurate the analysis will be.
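On the missing read counts: if the proportions were computed as M/T from integer counts (an assumption, the GEO description quoted above does not say so), the smallest counts consistent with each value can be recovered by rational approximation. For instance, 0.347826086956522 matches 8/23 and 0.88 matches 22/25. A minimal Python sketch follows; `smallest_counts` and `max_coverage` are hypothetical names chosen for illustration.

```python
from fractions import Fraction


def smallest_counts(ratio_text, max_coverage=200):
    """Return the smallest (T, M) whose ratio M/T matches the proportion.

    max_coverage is an assumed upper bound on read depth. The true depth
    may be any multiple of the returned T (8/23 and 16/46 are the same
    ratio), so T is only a lower bound on coverage.
    """
    frac = Fraction(ratio_text).limit_denominator(max_coverage)
    return frac.denominator, frac.numerator


for text in ["0.347826086956522", "0.608695652173913", "0.88"]:
    total, meth = smallest_counts(text)
    print(text, "->", f"M={meth}, T={total}")
```

Proportions of exactly 0 or 1 reduce to T = 1 and carry no depth information, so this only helps at partially methylated sites.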

written 3.5 years ago by tiplud

Good luck! Please do post back to report how well/poorly that works. I doubt you'll be the last person to run into this issue.

written 3.5 years ago by Devon Ryan