Question: Download all chip-seq data of a cell type in ENCODE
18 months ago by
izzy.yichao.cai120 wrote:

Hi all!

Does anyone know how to download all ChIP-seq data (in BED format) for a certain cell type (let's say GM12878) from ENCODE?

I tried searching via the matrix view, but I can't select a specific cell type for download. It only lets me select a group of cell types (for example, all immortalized cell lines, which includes GM12878). I only want to download the data related to GM12878.

Any hints?

Thanks a lot for your time and help! : )

18 months ago by
EagleEye4.9k (Sweden) wrote:

ENCODE project:

https://www.encodeproject.org/search/?type=experiment&replicates.library.biosample.uuid=d8ca0867-13cd-40df-9de0-29f9da53d935&status!=deleted&status!=revoked&status!=replaced&limit=all

Click on each individual link; you will find the corresponding compressed BED files under the 'Processed data' section.

OR

UCSC track search:

https://genome.ucsc.edu/cgi-bin/hgTracks?hgsid=503465321_YmGKKqUV5mBilvCDa2ckhAWFSS9e&hgt_=1468864018&db=hg19&tsCurTab=advancedTab&hgt_tsDelRow=&hgt_tsAddRow=&hgt_tsPage=&tsSimple=&tsName=&tsDescr=&tsGroup=Any&tsType=bed&hgt_mdbVar1=cell&hgt_mdbVal1=GM12878&hgt_mdbVal1=GM12878-XiMat&hgt_mdbVar2=antibody&hgt_mdbVal2=Any&hgt_tSearch=search


I know I can download several files the way you mentioned, but it will be inconvenient if I want to download a large number of datasets.

written 18 months ago by izzy.yichao.cai120

1) Click the table icon in the search results and use the download option; the page gives instructions for batch (bulk) download.

https://www.encodeproject.org/search/?type=experiment&replicates.library.biosample.uuid=d8ca0867-13cd-40df-9de0-29f9da53d935&status!=deleted&status!=revoked&status!=replaced&limit=all
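If the batch download hands you a plain-text list of URLs, a few lines of Python can narrow it down to the BED files before fetching anything. This is only a sketch: the file name `files.txt`, the one-URL-per-line layout, and the possibility of quoted lines are assumptions about what the portal delivers, so check them against the instructions shown on the download page. `select_bed_urls` is a hypothetical helper name.

```python
def select_bed_urls(lines):
    """Return the URLs from a batch-download list that point at gzipped BED files.

    Assumes one URL per line; surrounding whitespace and double quotes are dropped.
    """
    urls = []
    for line in lines:
        url = line.strip().strip('"')
        if url.endswith(".bed.gz"):
            urls.append(url)
    return urls


# Usage (assumes the list was saved as files.txt in the current directory):
#
#     with open("files.txt") as handle:
#         bed_urls = select_bed_urls(handle)
```

The resulting list can then be passed to whatever downloader you prefer.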


2) I hope the option above works. If not, your next option is to download each file directly:

https://www.encodeproject.org/files/ENCFF002COO/@@download/ENCFF002COO.bed.gz

https://www.encodeproject.org/files/ENCFF002CPY/@@download/ENCFF002CPY.bed.gz

Since each BED file for this cell type (one per antibody) is stored under a different ID, the files are difficult to retrieve with Unix globbing. The only option is to put the file IDs (ENCFF002COO and ENCFF002CPY in the URLs above) into an array and pass them to wget with a simple bash loop. For that, you have to collect all 52 pairs of IDs from the search results manually (or fewer if you filter for ChIP-seq only).
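The same loop can be sketched in Python instead of bash and wget. It builds each URL from the pattern of the two links above and assumes every collected ID follows that pattern with a `.bed.gz` suffix; `bed_download_url` and `download_all` are hypothetical helper names, not part of any ENCODE tool.

```python
import urllib.request

ENCODE_BASE = "https://www.encodeproject.org"


def bed_download_url(accession):
    """Build the download URL for one file ID, following the two links above."""
    return f"{ENCODE_BASE}/files/{accession}/@@download/{accession}.bed.gz"


def download_all(accessions, fetch=urllib.request.urlretrieve):
    """Download each ID into the current directory and return the local names.

    `fetch` is replaceable so the loop can be tried without network access.
    """
    saved = []
    for accession in accessions:
        local_name = f"{accession}.bed.gz"
        fetch(bed_download_url(accession), local_name)
        saved.append(local_name)
    return saved


# Usage: start with the two IDs quoted above, then extend the list with the
# IDs collected from the search results.
#
#     download_all(["ENCFF002COO", "ENCFF002CPY"])
```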

3) Check the FTP site and, if your cell type is available there, use wildcard globbing in a bash script:

ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/
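As a rough Python equivalent of the wildcard approach, the sketch below lists one directory over anonymous FTP and keeps only the names that match a pattern. The subdirectory used in the usage comment (`wgEncodeBroadHistone`) and the idea that file names contain the cell type are assumptions for illustration; browse the FTP site first to see which directories and naming scheme actually apply. `matching_files` and `list_cell_type_files` are hypothetical helper names.

```python
import fnmatch
from ftplib import FTP


def matching_files(names, pattern):
    """Return the names that match a shell-style wildcard, ignoring case."""
    pattern = pattern.lower()
    return [name for name in names if fnmatch.fnmatchcase(name.lower(), pattern)]


def list_cell_type_files(host, directory, pattern):
    """List one FTP directory anonymously and keep the names matching pattern."""
    with FTP(host) as ftp:
        ftp.login()          # anonymous login
        ftp.cwd(directory)
        return matching_files(ftp.nlst(), pattern)


# Usage (the subdirectory name is an illustrative assumption):
#
#     list_cell_type_files(
#         "hgdownload.cse.ucsc.edu",
#         "/goldenPath/hg19/encodeDCC/wgEncodeBroadHistone",
#         "*gm12878*",
#     )
```

Matching is done case-insensitively because the cell type may be capitalized differently inside file names than in the search interface.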

modified 18 months ago • written 18 months ago by EagleEye4.9k

Thanks a lot!!! That really helped me out. : )

written 18 months ago by izzy.yichao.cai120

Please upvote if the solution worked.

written 18 months ago by EagleEye4.9k