Question: Download all ChIP-seq data for a cell type from ENCODE
2.4 years ago by
izzy.yichao.cai wrote:

Hi all!

Does anyone know how to download all ChIP-seq data (in BED format) for a specific cell type (say, GM12878) from ENCODE?

I tried searching via the matrix view, but I can't select a specific cell type for download. It only lets me select a whole category of cell types (for example, all immortalized cell lines, which includes GM12878). I only want to download the data for GM12878.

Any hints?

Thanks a lot for your time and help! : )

2.4 years ago by
EagleEye (Sweden) wrote:

ENCODE project:

https://www.encodeproject.org/search/?type=experiment&replicates.library.biosample.uuid=d8ca0867-13cd-40df-9de0-29f9da53d935&status!=deleted&status!=revoked&status!=replaced&limit=all

Click on each individual link; you will find the corresponding compressed BED files under the 'Processed data' section.

OR

UCSC track search:

https://genome.ucsc.edu/cgi-bin/hgTracks?hgsid=503465321_YmGKKqUV5mBilvCDa2ckhAWFSS9e&hgt_=1468864018&db=hg19&tsCurTab=advancedTab&hgt_tsDelRow=&hgt_tsAddRow=&hgt_tsPage=&tsSimple=&tsName=&tsDescr=&tsGroup=Any&tsType=bed&hgt_mdbVar1=cell&hgt_mdbVal1=GM12878&hgt_mdbVal1=GM12878-XiMat&hgt_mdbVar2=antibody&hgt_mdbVal2=Any&hgt_tSearch=search


I know I could download a few files the way you described, but it would be inconvenient when I want to download a large number of files.

written 2.4 years ago by izzy.yichao.cai

1) Use the table icon and the Download option on the search results page; they come with instructions for batch/bulk download.

https://www.encodeproject.org/search/?type=experiment&replicates.library.biosample.uuid=d8ca0867-13cd-40df-9de0-29f9da53d935&status!=deleted&status!=revoked&status!=replaced&limit=all
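
As a rough sketch of what you might do with the batch download: assuming the Download option hands you a plain-text list with one URL per line (that format, the file name `files.txt`, and the helper `select_bed_urls` are assumptions for illustration, not something confirmed in this thread), you could keep only the compressed BED files before fetching anything:

```python
def select_bed_urls(lines):
    """Return only the URLs that point to compressed BED files (*.bed.gz)."""
    urls = []
    for line in lines:
        url = line.strip()
        # Skip blank lines and anything that is not a .bed.gz download.
        if url and url.endswith(".bed.gz"):
            urls.append(url)
    return urls


# Hypothetical usage, assuming the URL list was saved as "files.txt":
# with open("files.txt") as handle:
#     for url in select_bed_urls(handle):
#         print(url)
```

The printed URLs can then be passed to wget or any other downloader.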


2) I hope the option above works. If not, this is your next option: download each file directly, for example:

https://www.encodeproject.org/files/ENCFF002COO/@@download/ENCFF002COO.bed.gz

https://www.encodeproject.org/files/ENCFF002CPY/@@download/ENCFF002CPY.bed.gz

Since each BED file (one per antibody) for this cell type is stored under a different ID, it will be difficult to retrieve them even with Unix globbing. The only option is to put the IDs (the ENCFF... part of the URLs above) into an array and pass them to wget with a simple bash loop. For that, you have to collect all 52 pairs of IDs (or fewer if you filter for ChIP-seq only) from the search results manually.
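
A minimal sketch of that loop, written in Python rather than bash/wget. It assumes every file follows the same URL pattern as the two example links above; the helper names `build_download_url` and `download_all` are made up for illustration, and the ID list holds only the two IDs shown in this thread, so you would extend it with the ones you collect:

```python
import urllib.request

# URL pattern copied from the two example links above (assumed to hold
# for the other files as well).
URL_TEMPLATE = "https://www.encodeproject.org/files/{acc}/@@download/{acc}.bed.gz"


def build_download_url(accession):
    """Build the direct download URL for one file ID."""
    return URL_TEMPLATE.format(acc=accession)


def download_all(accessions):
    """Download each file into the current directory (needs network access)."""
    for acc in accessions:
        url = build_download_url(acc)
        urllib.request.urlretrieve(url, acc + ".bed.gz")


# IDs collected by hand from the search results; only the two from this
# thread are listed here.
accessions = ["ENCFF002COO", "ENCFF002CPY"]
# download_all(accessions)  # uncomment to actually fetch the files
```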

3) Check the FTP site and use wildcard globbing in a bash script, if your cell type is available there:

ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/
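
To illustrate the wildcard idea from option 3, here is a small Python sketch that filters a list of file names by cell type using shell-style patterns. The helper `match_cell_type` is hypothetical, and the spelling "Gm12878" inside file names is an assumption; check the actual directory listing before relying on it:

```python
import fnmatch


def match_cell_type(filenames, cell_type="Gm12878"):
    """Keep file names that contain the cell type, using a shell-style wildcard."""
    pattern = "*" + cell_type + "*"
    # fnmatchcase is case-sensitive on every platform.
    return [name for name in filenames if fnmatch.fnmatchcase(name, pattern)]


# Made-up file names, only to show how the filter behaves.
example_names = [
    "wgEncodeExampleGm12878Ctcf.narrowPeak.gz",
    "wgEncodeExampleK562Ctcf.narrowPeak.gz",
]
selected = match_cell_type(example_names)
```

With the made-up names above, only the first one is kept.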

modified 2.4 years ago • written 2.4 years ago by EagleEye

Thanks a lot!!! That really helped me out. : )

written 2.4 years ago by izzy.yichao.cai

Please upvote if the solution worked.

written 2.4 years ago by EagleEye