I have a question that has been bothering me. I have been given a .txt file containing the following columns: chromosome number, start, end, avg coverage, min coverage, max coverage, % bases below threshold, bases below threshold.
I have 5 files that will be used as our reference (population) and 4 unhealthy files, all in the same file format.
These files were generated by Agilent's SureCall software (paired-end sequencing, custom NGS panel).
My task is to provide, for each gene, a list of the locations that have undergone copy number losses or gains. I am not sure how best to normalise my data. I have been told to take the mean of the avg coverage across all loci and compare each individual locus's avg coverage to that mean, but I feel this is not appropriate because it does not take GC content bias or locus length into account.
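To make the suggested approach concrete, this is roughly how I understand it, as a minimal sketch. It is written in Python only to spell out the arithmetic; the same few lines translate directly to R. The function names, the made-up coverage values, and the step of comparing against the mean of the reference files are my own assumptions, not something prescribed by SureCall or by whoever gave me the task. Note that it does nothing about GC content or locus length, which is exactly my concern.

```python
from math import log2
from statistics import mean


def normalise_by_mean(avg_coverage):
    """Divide each locus's avg coverage by the mean avg coverage of the sample."""
    sample_mean = mean(avg_coverage)
    return [c / sample_mean for c in avg_coverage]


def log2_ratio_vs_reference(sample, references):
    """Per-locus log2 ratio of a normalised sample to the mean of the
    normalised reference samples.

    Assumption: every list covers the same loci in the same order, and no
    locus has zero coverage (log2 of zero would raise an error).
    """
    norm_sample = normalise_by_mean(sample)
    norm_refs = [normalise_by_mean(r) for r in references]
    ratios = []
    for i, value in enumerate(norm_sample):
        ref_value = mean(r[i] for r in norm_refs)
        ratios.append(log2(value / ref_value))
    return ratios


# Made-up avg coverage values for three loci (illustration only).
references = [[100, 200, 300], [110, 190, 300]]
unhealthy = [100, 100, 300]

for locus, ratio in enumerate(log2_ratio_vs_reference(unhealthy, references)):
    # Negative values hint at a loss, positive values at a gain.
    print(locus, round(ratio, 2))
```

Any cut-off for calling a gain or a loss from these ratios would still have to be chosen, and I do not know what a sensible one would be for this kind of data.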
I am able to use R, and there are multiple packages available for it, but I do NOT have the raw FASTQ or BAM files.
Thank you for your help!!