How To Prepare Input Files For The GeneTrack Peak Caller
10.2 years ago
kandoigaurav

I would like to use GeneTrack to call nucleosome positions and was wondering how I can prepare its input files, starting from SRA sequence reads.

sra sam bam
10.2 years ago

The following steps are necessary:

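  1. Align the reads to a reference genome and produce BAM alignment files (reads stored in SRA format first need to be extracted to FASTQ, for example with fastq-dump from the SRA Toolkit).
  2. Convert the BAM file to BED format, for example with bedtools bamtobed or by other methods.
  3. Sort the BED file by coordinate: sort -k1,1 -k2,2g -o out.bed in.bed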
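To make steps 2 and 3 concrete, here is a minimal Python sketch of the per-alignment arithmetic that a BAM-to-BED converter such as bedtools bamtobed performs, followed by a coordinate sort. It is not bedtools' code. It assumes plain-text SAM lines as input (a real pipeline would read BAM, e.g. through samtools or pysam), and the helper names sam_to_bed, sort_bed and format_bed are hypothetical. The key details are these: SAM POS is 1-based while BED starts are 0-based with exclusive ends; only the M, D, N, = and X CIGAR operations advance along the reference; and FLAG bit 0x10 marks reverse-strand reads, while 0x4 marks unmapped ones.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification)
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def sam_to_bed(sam_line):
    """Convert one SAM alignment line to a BED6 tuple, or None if unmapped."""
    fields = sam_line.rstrip("\n").split("\t")
    qname, flag, rname, pos, mapq, cigar = fields[:6]
    flag = int(flag)
    if flag & 0x4 or cigar == "*":  # unmapped read: nothing to report
        return None
    start = int(pos) - 1  # SAM POS is 1-based, BED start is 0-based
    ref_len = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    end = start + ref_len  # BED end is exclusive
    strand = "-" if flag & 0x10 else "+"
    # Score column holds MAPQ in this sketch
    return (rname, start, end, qname, int(mapq), strand)


def sort_bed(records):
    """Sort by chromosome name, then numerically by start position."""
    return sorted(records, key=lambda r: (r[0], r[1]))


def format_bed(record):
    """Render a BED6 tuple as one tab-separated line."""
    return "\t".join(str(x) for x in record)


if __name__ == "__main__":
    # Hypothetical SAM records: forward, reverse with a deletion, unmapped, soft-clipped
    sam = [
        "r1\t0\tchr2\t101\t60\t50M\t*\t0\t0\t*\t*",
        "r2\t16\tchr1\t2001\t42\t20M5D30M\t*\t0\t0\t*\t*",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
        "r4\t0\tchr1\t11\t60\t5S45M\t*\t0\t0\t*\t*",
    ]
    beds = [b for b in (sam_to_bed(line) for line in sam) if b is not None]
    for rec in sort_bed(beds):
        print(format_bed(rec))
```

Sorting on (chromosome, start) with Python's string comparison roughly mirrors sort -k1,1 -k2,2g run under LC_ALL=C. In other locales, sort may order some chromosome names differently, which matters if a downstream tool checks the sort order.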

You can then load the resulting BED file into the GeneTrack command-line tool.
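The following is a generic, simplified Python sketch of what a peak caller does with the sorted reads, for a single chromosome and strand. Read counts per position are smoothed with a Gaussian kernel. Peaks are then picked greedily from the tallest down, and any candidate within an exclusion distance of an already chosen peak is skipped. It is an assumption here that GeneTrack works along these smoothing-plus-exclusion-zone lines; this is not GeneTrack's code. The names gaussian_smooth, call_peaks, sigma and exclusion are illustrative. Real callers also handle both strands, shift or pair reads, and filter weak peaks, all of which are omitted.

```python
import math


def gaussian_smooth(counts, sigma):
    """Smooth a {position: read count} map with a Gaussian kernel cut off at 3 * sigma."""
    width = int(3 * sigma)
    offsets = range(-width, width + 1)
    kernel = [math.exp(-0.5 * (d / sigma) ** 2) for d in offsets]
    smoothed = {}
    for pos, n in counts.items():
        # Spread each position's count over its neighbors with Gaussian weights
        for d, w in zip(offsets, kernel):
            smoothed[pos + d] = smoothed.get(pos + d, 0.0) + n * w
    return smoothed


def call_peaks(smoothed, exclusion):
    """Greedy peak picking with an exclusion zone around each chosen peak."""
    # Local maxima: higher than the left neighbor, at least as high as the right one
    maxima = [p for p, v in smoothed.items()
              if v > smoothed.get(p - 1, 0.0) and v >= smoothed.get(p + 1, 0.0)]
    # Visit candidates from the tallest down
    maxima.sort(key=lambda p: smoothed[p], reverse=True)
    peaks = []
    for p in maxima:
        # Keep a candidate only if it is farther than `exclusion` from every kept peak
        if all(abs(p - q) > exclusion for q in peaks):
            peaks.append(p)
    return sorted(peaks)


if __name__ == "__main__":
    # Hypothetical read 5' end counts on one strand of one chromosome
    counts = {99: 2, 100: 5, 101: 3, 300: 4, 310: 1}
    print(call_peaks(gaussian_smooth(counts, sigma=5), exclusion=50))
```

Larger sigma values merge nearby reads into broader peaks. The exclusion distance sets the minimum spacing between reported peaks, which for nucleosomes would be chosen with the expected nucleosome size in mind.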

Thank you, Dr. Albert! This should be of immense help.

Corrected the sort command as shown here: http://cassjohnston.wordpress.com/2011/05/10/unix-sort-bed-file/

I was hoping to use an approach similar to the one described in the paper "A compiled and systematic reference map of nucleosome positions across the Saccharomyces cerevisiae genome" to construct a compiled consensus map. To this end, I have generated nucleosome maps for a few Drosophila datasets using GeneTrack.

However, I do not understand the methodology used to generate a reference map from these predicted maps. I see that GeneTrack is used to define a new consensus position, but I cannot work out how I should format my GeneTrack input file for that purpose.

There are two different, unrelated steps in this process:

  1. The first is to define positions based on the signal. This produces a number of intervals over the genome; for this, one uses a peak caller.
  2. The second is to refine the peaks obtained in step 1: keep some of them based on certain conditions, label them by their position relative to other features (1st, 2nd, 3rd, etc.), account for the presence or absence of other, potentially overlapping features, and so on. This second step is a data analysis problem and has little to do with a peak caller. It is essentially an interval intersection problem with many facets.
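Part of step 2 can be sketched as interval arithmetic in Python. The hypothetical example below first discards peaks that fall inside excluded intervals (for example, a blacklist of problematic regions). It then labels the 1st, 2nd, 3rd, etc. peaks downstream of each transcription start site (TSS), taking gene strand into account. The input layout is an assumption made for illustration: peak centers are (chrom, position) pairs, and TSSs are (gene, chrom, position, strand) tuples. The default max_dist and max_rank values and all function names are likewise assumptions. A real analysis would choose its own conditions, which is the subjective part of step 2.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def index_peaks(peaks):
    """Group peak centers by chromosome and sort them for binary search."""
    by_chrom = defaultdict(list)
    for chrom, center in peaks:
        by_chrom[chrom].append(center)
    for centers in by_chrom.values():
        centers.sort()
    return by_chrom


def overlaps_any(chrom, pos, intervals):
    """True if pos falls in any half-open [start, end) interval on chrom (linear scan)."""
    return any(c == chrom and s <= pos < e for c, s, e in intervals)


def label_downstream(peaks, tss_list, max_dist=1000, max_rank=3, exclude=()):
    """Label up to max_rank peaks downstream of each TSS as 1, 2, 3, ...

    tss_list holds (gene, chrom, position, strand) tuples; exclude holds
    (chrom, start, end) intervals whose peaks are discarded first.
    """
    kept = [(c, p) for c, p in peaks if not overlaps_any(c, p, exclude)]
    by_chrom = index_peaks(kept)
    labels = []
    for gene, chrom, tss, strand in tss_list:
        centers = by_chrom.get(chrom, [])
        if strand == "+":
            # Downstream means increasing coordinates; ascending order = nearest first
            lo, hi = bisect_left(centers, tss), bisect_right(centers, tss + max_dist)
            window = centers[lo:hi]
        else:
            # Downstream means decreasing coordinates; reverse so the nearest comes first
            lo, hi = bisect_left(centers, tss - max_dist), bisect_right(centers, tss)
            window = centers[lo:hi][::-1]
        for rank, center in enumerate(window[:max_rank], start=1):
            labels.append((gene, rank, chrom, center))
    return labels


if __name__ == "__main__":
    # Hypothetical peak centers and TSSs
    peaks = [("chr2L", 1100), ("chr2L", 1260), ("chr2L", 1420),
             ("chr2L", 1580), ("chr2L", 900), ("chr2L", 740)]
    tss = [("geneA", "chr2L", 1050, "+"), ("geneB", "chr2L", 1000, "-")]
    for row in label_downstream(peaks, tss, exclude=[("chr2L", 1250, 1300)]):
        print(row)
```

The linear scan in overlaps_any is only for readability. For genome-wide interval sets, a dedicated tool such as bedtools intersect is the practical choice for the overlap part.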

There are very few tools that automate the second step, so one needs to implement one's own methodology. The reason is that calling a peak is a reasonably objective task, whereas filtering and naming these peaks by various conditions is far more subjective, and it is difficult to write code that is robust, flexible, and correct at the same time.
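One example of such a self-implemented methodology pools calls from several nucleosome maps and clusters them by distance. This could serve as a starting point for the consensus map asked about above. It is a hypothetical sketch, not the method of the Saccharomyces cerevisiae reference-map paper. On each chromosome, calls that lie within max_gap bp of the previous call are chained into one group. The group's mean position becomes the consensus center, and the group is kept only if at least min_support different maps contribute to it. The parameter names and default values are assumptions.

```python
from collections import defaultdict


def _emit(chrom, group, min_support):
    """Turn one group of (center, map_id) calls into a consensus call, or nothing."""
    support = len({map_id for _, map_id in group})  # number of distinct maps
    if support < min_support:
        return []
    mean = sum(c for c, _ in group) / len(group)
    return [(chrom, round(mean), support)]


def consensus_positions(maps, max_gap=20, min_support=2):
    """Merge nucleosome calls from several maps into consensus positions.

    maps: list of lists of (chrom, center) calls, one list per dataset.
    Returns (chrom, consensus_center, support) tuples sorted by position.
    """
    pooled = defaultdict(list)
    for map_id, calls in enumerate(maps):
        for chrom, center in calls:
            pooled[chrom].append((center, map_id))
    result = []
    for chrom in sorted(pooled):
        calls = sorted(pooled[chrom])
        group = [calls[0]]
        for center, map_id in calls[1:]:
            if center - group[-1][0] <= max_gap:
                group.append((center, map_id))  # close to the previous call: same group
            else:
                result.extend(_emit(chrom, group, min_support))
                group = [(center, map_id)]
        result.extend(_emit(chrom, group, min_support))
    return result


if __name__ == "__main__":
    # Hypothetical calls from three datasets
    maps = [[("chr3R", 1000), ("chr3R", 1500)],
            [("chr3R", 1008), ("chr3R", 1700)],
            [("chr3R", 995)]]
    print(consensus_positions(maps))
```

Because grouping chains neighboring calls, a dense run of calls can merge into one wide group. Capping group width, or re-splitting large groups, would be a reasonable refinement.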

Adding to the problem, it is probably impossible to publish a tool that only does this latter step, although I would agree that it is more important than step 1. Alas, the way science works is sometimes counterintuitive.
