I would like to identify selective sweeps using MAF. Can anyone help me with an R script that picks out 7 contiguous SNPs with a MAF of 0.01 or less? (Basically, a sliding window to identify strings of SNPs with MAF ≤ 0.01.) My data look like this
I am not sure what finding 7 (or any other number of) contiguous low-MAF SNPs would indicate genetically. Can you explain why this observation would be relevant, and why 7?
Thanks for your response. The number 7 is determined by the proportion of monomorphic loci in the genome. For example, if 15% of SNPs are monomorphic within a breed, the probability that N contiguous SNPs are monomorphic is 0.15^N, assuming independence, and in testing 52,942 SNPs on 29 autosomes we would expect to find 0.15^N x (52,942 - 29 x (N - 1)) regions in which N contiguous SNPs had fixed alleles. For N = 5 this corresponds to 4.0 false positives per breed, but only 0.6 false positives per breed when N = 6 (Ramey et al 2013). I chose 7 in my case because about 20 percent of SNPs are monomorphic in my breed.
So 7 contiguous loci spanning at least 200 kb, each with a minor allele frequency ≤ 0.01, will be required to declare a selective sweep region in each breed.
thanks
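The arithmetic in the reply above can be checked with a few lines of R. This is only a sketch of the formula as quoted (it assumes independence between SNPs, as the reply does); the function name `expected_runs` and its argument names are made up for illustration and do not come from Ramey et al.

```r
# Expected number of chance regions with n_contig contiguous monomorphic SNPs,
# given a monomorphic proportion p_mono and assuming independence between SNPs.
# n_snp and n_chrom default to the values quoted in the thread.
expected_runs <- function(p_mono, n_contig, n_snp = 52942, n_chrom = 29) {
  p_mono^n_contig * (n_snp - n_chrom * (n_contig - 1))
}

# 15% monomorphic, N = 5 and N = 6 (the figures quoted in the thread)
print(round(expected_runs(0.15, 5:6), 1))  # 4.0 0.6

# 20% monomorphic, N = 5 to 8
print(round(expected_runs(0.20, 5:8), 2))
```

With 20% monomorphic SNPs, N = 7 is the first run length for which the expected count falls below one per breed (about 0.68, versus about 3.4 for N = 6), which is consistent with the choice of 7 above.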
Thank you for the detailed explanation. In this case the run-length code below should work; it just needs a final filtering step to check whether each run covers a large enough region (i.e. pos(run-end) - pos(run-start) >= 200 kb).
Thanks for your response. However, I have loaded the IRanges package, but when I run the runLenght function I get an error message that says: unable to find an inherited method for function 'runLenght' for signature rle. Would you please assist me in running the runLenght function?
Sure. Could you please provide an example of your data with chromosome information? Is there simply a chrom column added to the table? Please use the 'edit' function to insert the example.
library(IRanges) # provides Rle(), runLength(), runValue(), start(), end()

## toy data with two runs of 7 mafs <= 0.01:
df = data.frame(snp.id=paste0("SNP",1:20),
  bta="buff", pos=1:20*100, maf=c(rep(0.009,7),
  rep(0.1,3), rep(0.0001,7), rep(0.02,3)))
df = df[order(df$pos),] # order data frame by position
maf0.01 = Rle(df$maf <= 0.01) # use run length encoding (Rle) to detect runs
rind = runLength(maf0.01) >= 7 & runValue(maf0.01) # select appropriate runs (len >= 7 and TRUE)
imat = cbind(start=start(maf0.01)[rind], end=end(maf0.01)[rind]) # index matrix for apply
apply(imat, 1, function(x) df[x[1]:x[2],]) # extract the selected runs from input data
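The final filtering step mentioned in the thread (run end position minus run start position >= 200 kb) could be sketched as follows. This is a rough illustration, not the answerer's code: it swaps the IRanges `Rle` for base R `rle()` so that it runs without Bioconductor, the helper name `find_low_maf_runs` is hypothetical, and it assumes the data frame has columns `bta` (chromosome), `pos` (position in bp) and `maf`. Runs are detected separately per chromosome so that a run cannot cross a chromosome boundary.

```r
# Hypothetical helper: report runs of at least min_snps contiguous SNPs with
# maf <= max_maf that span at least min_span bp, one chromosome at a time.
find_low_maf_runs <- function(df, min_snps = 7, max_maf = 0.01, min_span = 200000) {
  per_chrom <- lapply(split(df, df$bta, drop = TRUE), function(d) {
    d <- d[order(d$pos), ]                 # order SNPs by position
    r <- rle(d$maf <= max_maf)             # base R run-length encoding
    run_end <- cumsum(r$lengths)           # index of the last SNP in each run
    run_start <- run_end - r$lengths + 1   # index of the first SNP in each run
    keep <- r$values %in% TRUE & r$lengths >= min_snps
    out <- data.frame(bta = rep(d$bta[1], sum(keep)),
                      start = d$pos[run_start[keep]],
                      end = d$pos[run_end[keep]],
                      n.snp = r$lengths[keep])
    out[out$end - out$start >= min_span, ]  # keep runs spanning >= min_span bp
  })
  res <- do.call(rbind, per_chrom)
  rownames(res) <- NULL
  res
}

## toy data: same maf pattern as above, but SNPs spaced 40 kb apart
toy <- data.frame(snp.id = paste0("SNP", 1:20), bta = "chr1",
                  pos = 1:20 * 40000,
                  maf = c(rep(0.009, 7), rep(0.1, 3), rep(0.0001, 7), rep(0.02, 3)))
res <- find_low_maf_runs(toy)
print(res)
```

On this toy data each run of 7 SNPs spans 240 kb, so both runs pass the filter. With the original toy positions (100 bp apart) neither run would reach 200 kb and the result would be empty.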
Hi, it is 'runLength', not 'runLenght'. Could you copy and paste the source code I posted? This code was tested with R 3.0.1.