Generating Mappability Files
3
3
Entering edit mode
10.2 years ago
Ying W ★ 4.1k

I have some chip-seq data and I want to do some peak calling. To do peak calling with ZINBA, peakseq, MOSAiCS I need to have mappability data.

This data is given for hg19 ( http://www.bios.unc.edu/~nrashid/map50_hg19.tgz from this page http://code.google.com/p/zinba/wiki/UsingZINBA) however, the chip-seq data I use thus far have been aligned to 1000 genome's version of hg19 ( ftp://ftp.sanger.ac.uk/pub/1000genomes/tk2/main_project_reference/) and I do not wish to realign all the data I have (previously I have been using MACS, I want to see results from other peak callers)

I have tried to run ZINBA using hg19 mappability files but I get a segfault pretty early on. Thus, I am trying to generate these mappability files: I get the feeling that these files are unique to whatever you used as reference (so hg19 with chr1-22+x,y mappability files would not be interchangeable with chr1-22+x,y,M)

I was wondering if someone could shed some insight into what these mappability files are and if they are interchangeable (because if they are then I would only have to generate M+supercontigs for 1000 genome's version of hg19 and use chr1-22 from the available hg19).

Additionally, I was wondering if this was the right program to generate these files: http://archive.gersteinlab.org/proj/PeakSeq/Mappability_Map/Code/

Thanks.

chip-seq • 5.8k views
3
Entering edit mode

I would contact the ZINBA developers, the project seems recent so it you will be likely able to reach those that can help you.

3
Entering edit mode

The ZINBA webpage indicates that "these files were generated using code from Peakseq , developed by the Gerstein Lab", so the answer to your last question is yes, you should be able to use this code and generate your own mappability files. Alternatively, as suggested by Istvan, ask directly to the developers...

0
Entering edit mode

@nico - i could not find any links from the peakseq website that directly links to the program code I pasted above so I was unsure. thoes files are stored in a different directory than the main peakseq files

0
Entering edit mode

Can anyone please answer this " if someone could shed some insight into what these mappability files are and if they are interchangeable" and what are the advantages of using these files.

0
Entering edit mode

Ah! This question is little bit old, should I post a new one?

0
Entering edit mode

to get started, its probably best to read the peakseq paper and website: https://sites.google.com/a/brown.edu/bioinformatics-in-biomed/peakseq-for-chip-seq

3
Entering edit mode
10.2 years ago
Aaron H ▴ 170

I've generated mappability files for ZINBA with the code you linked to. You just need to follow the directions in the README. Split the human reference fasta into separate chromosomes with BioPython:

from Bio import SeqIO
seqs = SeqIO.parse(open(fasta_file), format='fasta')
for seq in seqs:
name = seq.id+'.fasta'
SeqIO.write(seq, open(name, 'w'), format='fasta')


Then make the PeakSeq code and run it for your needed K:

$python compile.py 50  ADD COMMENT 0 Entering edit mode where is the README? ADD REPLY 0 Entering edit mode Do you mean the README in the Map code? I think it should be name = seq.id+'.fa' ADD REPLY 0 Entering edit mode yes and yes. It takes quite a while to generate the mappability files. ADD REPLY 0 Entering edit mode Is ZINBA still maintained? It looks like the latest update was done two years ago... ADD REPLY 2 Entering edit mode 4.9 years ago This is how I have done it: #download code and make mkdir Mappability_Map curl http://archive.gersteinlab.org/proj/PeakSeq/Mappability_Map/Code/Mappability_Map.tar.gz tar -zxvf Mappability_Map.tar.gz -C Mappability_Map make #download reference genome curl http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/analysisSet/hg38.analysisSet.chroms.tar.gz mkdir hg38.analysisSet.chroms tar -zxvf hg38.analysisSet.chroms.tar.gz -C hg38.analysisSet.chroms #Move chromosomes 1-22,X,M to the same directory of the python scripts cd hg38.analysisSet.chroms chr=($(seq 1 1 22) X Y M)
for i in ${chr[*]}; do mv "chr"$i".fa" ../Mappability_Map ; done

#chmod everything to make sure it runs
cd ../Mappability_Map
chmod 777 compile.py
chmod 777 *.c
export PATH=/<mydirectory>/Mappability_Map/:\$PATH

#generete mappability files for 36mer length (replace by your desired number)
python compile.py 36

0
Entering edit mode
5.6 years ago
boczniak767 ▴ 830

What is "K", after the python compile.py?

0
Entering edit mode

This is a 4+ year old thread. It is possible that you are not going to get an answer for this question.

As for ZINBA, it is probably no longer maintained (students graduate and move on etc). You may want to switch to MACS2, which is what most people probably use for peak calling.

0
Entering edit mode

Thanks, as of question, "K" is probably the length of the sequencing read.