I need to read point mutations from the 1000 Genomes data set (84,801,880 rows in total) and check for overlaps with another set of chromosome ranges using the GenomicRanges package in R.
To do so I usually run the following code:
> x0 = read.csv("NameOfFile1.csv")  # read data from the first file (1000 Genomes in this case)
> x1 = read.csv("NameOfFile2.csv")  # read data from the second file
> library(GenomicRanges)
> gr0 = with(x0, GRanges(chr, IRanges(start=start, end=stop)))
> gr1 = with(x1, GRanges(chr, IRanges(start=start, end=stop)))
> hits = findOverlaps(gr0, gr1)
> hits
However, the first file is too big to convert to CSV. I tried converting it to a txt file instead, but I still cannot read it into R. I used the following command:
> x0 = read.table("NameofFile1.txt", header=FALSE, sep=",", stringsAsFactors=FALSE, quote="")
Error: cannot allocate vector of size 1000.0 Mb
Is there any other way to check for overlaps between the two files? Your help is highly appreciated!
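One direction that might be worth trying is to avoid loading the whole first file at once: read it in fixed-size chunks, run findOverlaps on each chunk against the second (smaller) set of ranges, and keep only the hits. The following is a minimal sketch, not a tested solution for the real data. It assumes the txt file is comma-separated, has no header row, and contains exactly three columns in the order chr, start, stop; the function name overlaps_in_chunks and the chunk_size default are hypothetical and would need adjusting to the actual file layout and available memory.

```r
library(GenomicRanges)

# Hypothetical helper: stream a large comma-separated file (assumed to have no
# header and exactly three columns: chr, start, stop) and collect overlaps
# against the ranges in gr1, one chunk at a time.
overlaps_in_chunks <- function(path, gr1, chunk_size = 1e6) {
  con <- file(path, open = "r")
  on.exit(close(con))
  offset <- 0        # rows consumed so far, used to report file-wide row numbers
  results <- list()
  repeat {
    # readLines returns character(0) once the end of the file is reached
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break
    chunk <- read.table(text = lines, header = FALSE, sep = ",", quote = "",
                        stringsAsFactors = FALSE,
                        col.names = c("chr", "start", "stop"),
                        colClasses = c("character", "integer", "integer"))
    gr0 <- with(chunk, GRanges(chr, IRanges(start = start, end = stop)))
    hits <- findOverlaps(gr0, gr1)
    # Keep only the index pairs; row0 is the row number in the big file,
    # row1 is the row number in the second set of ranges
    results[[length(results) + 1]] <- data.frame(
      row0 = queryHits(hits) + offset,
      row1 = subjectHits(hits)
    )
    offset <- offset + nrow(chunk)
  }
  if (length(results) == 0) {
    return(data.frame(row0 = numeric(0), row1 = integer(0)))
  }
  do.call(rbind, results)
}
```

With gr1 built from the second file as in the code above, a call such as overlaps_in_chunks("NameofFile1.txt", gr1) would return a data frame of matching row pairs. Only one chunk of the large file is held in memory at a time, so a smaller chunk_size trades speed for a lower memory footprint.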