Input data format for MethylSeekR
8.6 years ago
tiplud • 0

Hi Everyone,

I have downloaded a Bisulfite-Seq dataset from ENCODE that is available only in wig format. The first few lines are as follows:

track type=wiggle_0 name="UCSF-UBC.Penis_Foreskin_Keratinocyte_Primary_Cells.Bisulfite-Seq.skin03:methRatio" visibility=full color=20,150,20 altColor=150,20,20 windowingFunction=mean
variableStep chrom=chr1
10469   0.347826086956522
10470   0.347826086956522
10471   0.608695652173913
10472   0.608695652173913
10484   0.88
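
For reference, a variableStep section like the one above can be read with a few lines of Python. This is only a minimal sketch: the function name `parse_variable_step_wig` is made up for illustration, and it assumes the file contains only variableStep sections and that any `span` parameter can be ignored.

```python
def parse_variable_step_wig(lines):
    """Yield (chrom, position, value) tuples from variableStep wiggle lines."""
    chrom = None
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and header/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith("variableStep"):
            # Declaration line, e.g. "variableStep chrom=chr1"
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            continue
        if line.startswith("fixedStep"):
            raise ValueError("fixedStep sections are not handled in this sketch")
        if chrom is None:
            raise ValueError("data line found before any variableStep declaration")
        position, value = line.split()
        yield chrom, int(position), float(value)


example = [
    'track type=wiggle_0 name="example" visibility=full',
    "variableStep chrom=chr1",
    "10469   0.347826086956522",
    "10470   0.347826086956522",
    "10471   0.608695652173913",
    "10472   0.608695652173913",
    "10484   0.88",
]
records = list(parse_variable_step_wig(example))
for chrom, position, ratio in records:
    print(chrom, position, ratio)
```

For a real file, the list `example` would be replaced by an open file handle.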

The description page on GEO says that the second column contains methylation proportions.

I would like to read this data into MethylSeekR so that I can identify LMRs, FMRs and UMRs. After searching the internet and the package manual, I have not found a way to do this. I tried the readMethylome function, but its documentation says (copied verbatim):

If format is set to "text" (default), the argument FileName should refer to a tab-delimited text file in the format: chromosome position T M, where each line stands for a CpG, the position refers to the C of the CpG (on the plus strand), T is the total number of reads (total counts) covering the CpG and M the total number of reads without C to T conversion at the C of the CpG (methylation counts). If format="GRanges", the file is assumed to be a GRanges object, containing T and M as first and second data-value entries, saved in rds format.

Is there a way to get T and M from the wig file? Or is there any other way to read in the data so that I can use the package?

Thank you,
Tiplu

MethylSeekR BiSulfite-seq • 2.7k views

Please provide more information, such as a downloadable example data file and your R code. I think we can build such a pipeline.

8.6 years ago

A wig file won't hold the information that MethylSeekR needs. Perhaps you could get T and M counts by assuming the same total count T at every position, multiplying each methylation proportion by that constant, and rounding to get M. This will be approximate, though, and I don't recall enough of the details of MethylSeekR to know whether it will cause problems. I fear that you'll need to remap the fastq files if you can't get a more useful format (wiggle files are nice for visualization but aren't very useful for statistics).
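
As a rough illustration of the multiply-and-round idea, here is a minimal Python sketch. The coverage value `ASSUMED_COVERAGE = 30` and the helper names are hypothetical choices made for this example, not anything taken from MethylSeekR or from the dataset. The output lines follow the tab-delimited "chromosome position T M" layout quoted from the readMethylome documentation in the question.

```python
ASSUMED_COVERAGE = 30  # hypothetical constant; the real per-site coverage is unknown


def ratio_to_pseudo_counts(ratio, assumed_total=ASSUMED_COVERAGE):
    """Turn a methylation proportion into approximate (T, M) counts."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(f"methylation ratio out of range: {ratio}")
    methylated = int(round(ratio * assumed_total))
    return assumed_total, methylated


def to_text_line(chrom, position, ratio, assumed_total=ASSUMED_COVERAGE):
    """Format one site as a tab-delimited 'chromosome position T M' line."""
    total, methylated = ratio_to_pseudo_counts(ratio, assumed_total)
    return f"{chrom}\t{position}\t{total}\t{methylated}"


# Two sites taken from the wig excerpt in the question.
rows = [("chr1", 10469, 0.347826086956522), ("chr1", 10484, 0.88)]
lines = [to_text_line(*row) for row in rows]
print("\n".join(lines))
```

Every site ends up with the same T, so any signal that depends on real coverage differences between sites is lost. Whether MethylSeekR tolerates that is the open question here.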


Thank you! Since I cannot find the raw data, I will try multiplying by a constant and rounding. However, as you said, I fear it will be very approximate. I also have no information about the total read counts at each position, so I am not sure how accurate the analysis will be.
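
To get a feel for how approximate the rounding is, the error it introduces can be checked directly. Rounding M to the nearest integer shifts the recovered proportion M/T by at most 0.5/T, so a larger assumed T distorts the proportions less, though it also claims more confidence than the data may support. A small Python sketch, where the candidate totals are arbitrary values picked for illustration:

```python
def rounding_error(ratio, assumed_total):
    """Absolute difference between the original ratio and round(ratio*T)/T."""
    methylated = round(ratio * assumed_total)
    return abs(methylated / assumed_total - ratio)


# Proportions taken from the wig excerpt in the question.
ratios = [0.347826086956522, 0.608695652173913, 0.88]

for assumed_total in (5, 10, 30, 100):  # arbitrary candidate constants
    worst = max(rounding_error(r, assumed_total) for r in ratios)
    bound = 0.5 / assumed_total
    print(f"T={assumed_total}: worst observed error {worst:.4f}, bound {bound:.4f}")
```

This only measures the error from rounding. It says nothing about the error from not knowing the true coverage at each site.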


Good luck! Please do post back to report how well/poorly that works. I doubt you'll be the last person to run into this issue.
