Dear All,
I have a question that has been bothering me. I have been given .txt files containing the following columns: chromosome number, start, end, average coverage, minimum coverage, maximum coverage, % of bases below threshold, and number of bases below threshold.
I have 5 files that will serve as our reference (population) and 4 files from unhealthy samples, all in the same format.
The files were generated by Agilent's SureCall software (paired-end sequencing, custom NGS panel).
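As a starting point, here is a minimal Python sketch for reading one of these files. It assumes (not confirmed by SureCall documentation) that each file is tab-delimited, that the eight columns appear in the order listed above, and that any header line can be skipped because its start/end fields are not integers. The `Locus` field names and the `parse_coverage_lines`/`load_coverage` helpers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Locus:
    # Hypothetical field names mirroring the eight columns described above.
    chrom: str
    start: int
    end: int
    avg_cov: float
    min_cov: float
    max_cov: float
    pct_below: float
    n_below: int


def parse_coverage_lines(lines: Iterable[str]) -> List[Locus]:
    """Parse tab-delimited coverage rows, skipping blank, short or header lines."""
    loci = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not line.strip() or len(fields) < 8:
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            continue  # header or malformed row: start/end are not integers
        loci.append(Locus(
            chrom=fields[0],
            start=start,
            end=end,
            avg_cov=float(fields[3]),
            min_cov=float(fields[4]),
            max_cov=float(fields[5]),
            pct_below=float(fields[6].rstrip("%")),  # tolerate a trailing '%'
            n_below=int(float(fields[7])),
        ))
    return loci


def load_coverage(path: str) -> List[Locus]:
    """Read one coverage file from disk (assumed tab-delimited text)."""
    with open(path, encoding="utf-8") as handle:
        return parse_coverage_lines(handle)
```

Loading all 9 files this way gives per-sample lists of loci that can be aligned by (chrom, start, end), provided every file covers the same panel targets.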
My task is to provide, for each gene, a list of the locations that have undergone copy-number losses or gains. I am not sure how best to normalise my data. I have been told to take the mean of the average coverage across all loci and compare each locus's average coverage to it, but I feel this is not appropriate because it does not account for GC-content bias or locus length.
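One common alternative to comparing every locus against a single global mean is to compare each locus against the *same* locus in the reference samples. A minimal sketch of that idea, under these assumptions: each sample is first scaled by its own median average coverage (to remove differences in overall sequencing depth), then each locus in a test sample is expressed as a log2 ratio against the mean of the reference samples at that locus. Because the reference and test samples share the same panel targets, locus-specific effects such as GC content and capture efficiency largely cancel in the ratio, and since "avg coverage" is a per-base mean depth, locus length is already mostly factored out. This is not an explicit GC correction, and the ±0.4 calling thresholds are purely illustrative; the function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def median_normalise(avg_cov: Sequence[float]) -> List[float]:
    """Scale one sample's per-locus average coverage by that sample's median."""
    med = statistics.median(avg_cov)
    if med <= 0:
        raise ValueError("sample median coverage must be positive")
    return [c / med for c in avg_cov]


def log2_ratios(test: Sequence[float],
                references: Sequence[Sequence[float]],
                pseudo: float = 1e-6) -> List[float]:
    """Per-locus log2(test / mean of references), after median normalisation.

    All samples must list the same loci in the same order.
    `pseudo` avoids log2(0) at loci with zero coverage.
    """
    if any(len(r) != len(test) for r in references):
        raise ValueError("all samples must cover the same loci")
    test_n = median_normalise(test)
    refs_n = [median_normalise(r) for r in references]
    ratios = []
    for i, t in enumerate(test_n):
        ref_mean = statistics.mean(r[i] for r in refs_n)
        ratios.append(math.log2((t + pseudo) / (ref_mean + pseudo)))
    return ratios


def call_states(ratios: Sequence[float],
                gain: float = 0.4, loss: float = -0.4) -> List[str]:
    """Label each locus using illustrative (hypothetical) log2 thresholds."""
    return ["gain" if r >= gain else "loss" if r <= loss else "neutral"
            for r in ratios]
```

For example, with two flat references and a test sample `[100, 100, 200, 50]`, the third locus gets a log2 ratio of about 1 (gain) and the fourth about -1 (loss). Grouping these per-locus calls by gene would additionally require a locus-to-gene annotation, which the coverage files described above do not contain.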
I can use R, and there are multiple packages available for it, but I do NOT have the raw FASTQ or BAM files.
Thank you for your help!!
Most copy-number alteration callers accept coverage files as input, provided they are formatted appropriately. I won't recommend my own tool since it is designed for matched data, but this one http://bioconductor.org/packages/release/bioc/vignettes/PureCN/inst/doc/PureCN.pdf is exactly what you're looking for. Convert your data to a suitable input format and run the analysis!