Gene expression estimates in the Epigenome Roadmap
7.0 years ago

Hi, I am using Epigenome Roadmap data, and I can only find *.bed or *.wig files for the mRNA-seq and ChIP-seq data (e.g. breast stem cell), which are meant for display on the genome browser. How can I get processed RPKM/FPKM expression level estimates and ChIP-seq peak calls?

Of course, one could analyze the *.bed or *.wig files and estimate expression levels or peaks oneself, but I was wondering whether processed data is already available.

Thanks, Subho

RNA-Seq ChIP-Seq • 2.6k views

WIG is probably expression level, and BED is probably peaks. Be sure to investigate what the EpigenomeRoadmap people have to say about those files.

Edit: this BED looks like raw read names and their alignments. You could construct RPKM by carefully counting the reads found at exon sites, or look for ChIP-seq peaks at promoter regions. The WIG looks like read depth, so you can ignore it.


I am not sure that is the case. Looking at the first few lines of each file, I think the BED files describe individual mapped reads, while the WIG file gives depth of coverage in 20-bp bins (step=20, span=20).

$ less ~/Downloads/GSM543029_UCSF-UBC.Breast_Luminal_Epithelial_Cells.mRNA-Seq.RM035.bed | head -5
chr1    16159    16205    SOLEXA9_42:2:1:1529:310.L    -
chr1    118928    118978    SOLEXA9_42:2:3:447:886.R    -
chr1    122839    122889    SOLEXA9_42:2:2:1292:1782.L    +
chr1    134971    135018    SOLEXA9_42:2:2:474:588.R    -
chr1    135005    135055    SOLEXA9_42:2:1:303:1134.R    -

$ less ~/Downloads/GSM543029_UCSF-UBC.Breast_Luminal_Epithelial_Cells.mRNA-Seq.RM035.wig | head -10

track type=wiggle_0 visibility=full color=20,150,20
fixedStep chrom=chr1 start=11001 step=20 span=20
0
0
1
1
1
0
0
0

Be sure to investigate what the EpigenomeRoadmap people have to say about those files.

It looks like your BED file reports reads from a Solexa machine. Reads reported as BED carry coordinates but no sequence information, so you will have to either look their sequences up in the reference or annotate them against feature regions.

Check bedtools for intersectBed -c (bedtools intersect -c in current releases): for each entry in your gene BED file, it reports the number of overlapping reads.
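To make the counting step concrete, here is a minimal pure-Python sketch of what an overlap count followed by an RPKM calculation could look like. The five reads are the ones quoted from the BED file earlier in the thread; the gene interval toy_gene and its coordinates are hypothetical, made up purely for illustration, and the function names are not from bedtools or any other library. It also assumes the gene length is simply end minus start, whereas a real RPKM would use the summed exon length, and that the library size is the number of reads in the file.

```python
def parse_bed_line(line):
    """Return (chrom, start, end) from a whitespace-separated BED line."""
    fields = line.split()
    return fields[0], int(fields[1]), int(fields[2])


def count_reads_per_gene(read_lines, genes):
    """Count reads overlapping each gene.

    genes: list of (name, chrom, start, end) in BED-style half-open coordinates.
    Returns (counts_by_gene, total_reads). Naive O(reads * genes) scan,
    fine for a toy example but not for a full data set.
    """
    counts = {name: 0 for name, _chrom, _start, _end in genes}
    total = 0
    for line in read_lines:
        if not line.strip():
            continue
        chrom, start, end = parse_bed_line(line)
        total += 1
        for name, g_chrom, g_start, g_end in genes:
            # Two half-open intervals overlap if each starts before the other ends.
            if chrom == g_chrom and start < g_end and end > g_start:
                counts[name] += 1
    return counts, total


def rpkm(count, length_bp, total_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return count * 1e9 / (length_bp * total_reads)


# Reads copied from the BED excerpt in this thread.
reads = [
    "chr1    16159    16205    SOLEXA9_42:2:1:1529:310.L    -",
    "chr1    118928    118978    SOLEXA9_42:2:3:447:886.R    -",
    "chr1    122839    122889    SOLEXA9_42:2:2:1292:1782.L    +",
    "chr1    134971    135018    SOLEXA9_42:2:2:474:588.R    -",
    "chr1    135005    135055    SOLEXA9_42:2:1:303:1134.R    -",
]

# Hypothetical gene interval, NOT a real annotation.
genes = [("toy_gene", "chr1", 134000, 136000)]

counts, total = count_reads_per_gene(reads, genes)
value = rpkm(counts["toy_gene"], 136000 - 134000, total)
print(counts, total, value)
```

With only five reads the RPKM value is meaningless in itself; the point is just the arithmetic. For a real file, an interval-aware tool such as bedtools is the practical choice.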

The WIG just says there's some read at positions 11041 through 11100 on chr1 (the three consecutive 20-bp bins with value 1).
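As a check on that reading of the WIG excerpt, the following sketch expands a fixedStep block into explicit intervals and merges adjacent covered bins. It assumes the standard wiggle convention that fixedStep positions are 1-based, handles only fixedStep data (not variableStep), and uses helper names invented for this example.

```python
def fixedstep_intervals(lines):
    """Yield (chrom, first, last, value) for each data line of a fixedStep WIG.

    Coordinates are 1-based and inclusive, following the wiggle convention.
    """
    chrom = start = step = span = None
    index = 0
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith("variableStep"):
            raise NotImplementedError("this sketch only handles fixedStep")
        if line.startswith("fixedStep"):
            params = dict(field.split("=", 1) for field in line.split()[1:])
            chrom = params["chrom"]
            start = int(params["start"])
            step = int(params["step"])
            span = int(params.get("span", 1))
            index = 0
            continue
        if chrom is None:
            raise ValueError("data line found before any fixedStep declaration")
        first = start + index * step
        yield chrom, first, first + span - 1, float(line)
        index += 1


def merge_covered(intervals):
    """Merge directly adjacent intervals whose value is greater than zero."""
    merged = []
    for chrom, first, last, value in intervals:
        if value <= 0:
            continue
        if merged and merged[-1][0] == chrom and merged[-1][2] + 1 == first:
            merged[-1] = (chrom, merged[-1][1], last)
        else:
            merged.append((chrom, first, last))
    return merged


# The WIG excerpt quoted earlier in this thread.
wig_lines = [
    "track type=wiggle_0 visibility=full color=20,150,20",
    "fixedStep chrom=chr1 start=11001 step=20 span=20",
    "0", "0", "1", "1", "1", "0", "0", "0",
]

covered = merge_covered(fixedstep_intervals(wig_lines))
print(covered)  # [('chr1', 11041, 11100)]
```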

You'll have to trust their mapping strategy, so find out which reference sequence they used or, better still, get the raw data.


I have now heard back from members of the Epigenetic Roadmap team about data availability.

  • Yes, right now you should only be able to access the .bed and .wig files which were submitted to the NCBI.
  • The peak calls and RPKM values for the Roadmap data are part of the analysis section of the consortium manuscript, which has been submitted and is currently under revision.
  • This data will be available after the manuscript has been published. The manuscript will contain web resource links to the dataset.

That answers my question. Karl, yes, I could analyze the data myself the way you suggested, but given the response from the Epigenetic Roadmap team, I will just wait and work with their processed data.
