1. Step A) Extract read mapping from BAM/SAM files. I ran a test with a single BAM file and chrX only. When I pass -genome mm9, the output is a 17 MB root file. When I don't specify any genome name, the output is a 30 MB root file. Which one is correct?
2. We have 4 BAM files per sample. For each of steps A, B, C, D and E, can I give the paths to all 4 BAM files on the command line in one go? Like this: [whatever command step] /mybamfile/file1.bam /mybamfile/file2.bam /mybamfile/file3.bam /mybamfile/file4.bam
3. Step B) GENERATING HISTOGRAM. The README says: "Files with chromosome sequences are required and should reside in running directory or directory specified by -d option. Files should be named as: chr1.fa, chr2.fa, etc."
3.1. Are those files from the mm9 reference genome? (That's all we have.)
3.2. Is file.root the out.root I produced in Step A?
3.3. If I generate the histogram, what does it tell me? Am I supposed to get a number out of it to use later?
3.4. After -his we have to give the bin_size. I have no idea where I am supposed to find that number or what it represents.
4. Step C) CALCULATING STATISTICS. Is file.root the same file as the out.root from Step A, so that we reuse the same out.root file throughout? I tried it with out.root and got this message about 20 times:
Zero value of GC average. Bin 1083251 with center 1.08325e+08 is not corrected.
Then it says:
Making statistics for chrX after GC correction ...
Warning in <Fit>: Fit data is empty
Warning in <Fit>: Fit data is empty
Average RD per bin (1-22) is 0 +- 0 (after GC correction)
Average RD per bin (X,Y) is 3.42284 +- 3.19117 (after GC correction)
What do all those numbers mean?
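One thing I could check from the "Zero value of GC average" message, as a rough sketch only: assuming bins are equal-width windows counted from the start of the chromosome (my assumption, not something the README states), dividing the reported center by the bin number should give the bin size that was used. The variable names here are just for illustration.

```python
# Values copied from the "Zero value of GC average" log line.
bin_index = 1083251
bin_center = 1.08325e+08  # the log prints this rounded to 6 significant digits

# Assumption: bins are fixed-width windows numbered along the chromosome,
# so the center of a bin is roughly bin_index * bin_size.
implied_bin_size = bin_center / bin_index
rounded_bin_size = round(implied_bin_size)

print(f"implied bin size: {implied_bin_size:.3f} bp")
print(f"rounded bin size: {rounded_bin_size} bp")
```

If that assumption holds, the run used a bin size of roughly 100 bp, and the message is about one specific 100 bp window on chrX.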