I have two mpileup files generated by samtools, each larger than 10 GB. I would like to select the genomic sites that are present in both mpileup files and generate a matrix file from them. The matrix should have 201 columns, holding the mpileup information for each overlapping site and the 100 bp on either side of it (+/-100 bp). The problem is that the files are too big to read into memory at once. If I read N lines at a time, two problems arise: 1) a site in the first mpileup file may only appear after line N in the second file, so it is missed when the two chunks are compared; 2) I cannot build windows for the sites in the last 100 lines of a chunk (lines N-100 to N), because their downstream 100 bp are in the next chunk. In addition, the final matrix file will be very big. Could you please give me some hints on how to solve these problems? Thanks for your time!
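One way to avoid fixed blocks of N lines is to stream both files at once as a merge-join, advancing whichever file is behind, and to keep only a small read-ahead buffer (about 201 records) for the windows. The sketch below assumes that both files are sorted in the same reference order (samtools mpileup produces this from coordinate-sorted BAMs aligned to the same reference), that a `.fai` index of that reference provides the chromosome order, and that the value wanted in each of the 201 columns is the read depth (column 4) from the first file, with 0 for positions that mpileup omitted. The function names `records`, `shared_sites` and `windows` are hypothetical, chosen only for this example.

```python
import sys
from collections import deque

FLANK = 100  # +/-100 bp around each site -> 2 * FLANK + 1 = 201 columns


def records(handle):
    """Yield (chrom, pos, depth) from mpileup lines (chrom, pos, ref, depth, ...)."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        yield fields[0], int(fields[1]), int(fields[3])


def shared_sites(a, b, rank):
    """Merge-join two sorted record streams; yield (chrom, pos) present in both.

    Each stream advances independently, so a site is never missed just
    because it sits at a different line number in the other file.
    """
    ra, rb = next(a, None), next(b, None)
    while ra is not None and rb is not None:
        ka, kb = (rank[ra[0]], ra[1]), (rank[rb[0]], rb[1])
        if ka == kb:
            yield ra[0], ra[1]
            ra, rb = next(a, None), next(b, None)
        elif ka < kb:
            ra = next(a, None)
        else:
            rb = next(b, None)


def windows(sites, depth_stream, rank):
    """For each sorted shared site, yield (chrom, pos, 201 depth values).

    Positions absent from depth_stream (e.g. zero coverage) are reported as 0.
    Only records inside the current window are buffered.
    """
    buf = deque()
    nxt = next(depth_stream, None)
    for chrom, pos in sites:
        lo_key = (rank[chrom], pos - FLANK)
        hi_key = (rank[chrom], pos + FLANK)
        # Read ahead until past the right edge of this window (fixes the
        # "last 100 lines of a chunk" problem).
        while nxt is not None and (rank[nxt[0]], nxt[1]) <= hi_key:
            buf.append(nxt)
            nxt = next(depth_stream, None)
        # Sites arrive in sorted order, so records left of this window
        # are never needed again.
        while buf and (rank[buf[0][0]], buf[0][1]) < lo_key:
            buf.popleft()
        depth = {p: d for c, p, d in buf if c == chrom}
        row = [depth.get(p, 0) for p in range(pos - FLANK, pos + FLANK + 1)]
        yield chrom, pos, row


def main(path_a, path_b, fai_path, out):
    # Chromosome order is taken from the first column of the .fai index.
    with open(fai_path) as fai:
        rank = {line.split("\t")[0]: i for i, line in enumerate(fai)}
    # path_a is opened twice: once for the merge-join, once for the windows.
    with open(path_a) as fa, open(path_b) as fb, open(path_a) as fa2:
        sites = shared_sites(records(fa), records(fb), rank)
        for chrom, pos, row in windows(sites, records(fa2), rank):
            # Rows are written one at a time, never held in memory.
            out.write(f"{chrom}\t{pos}\t" + "\t".join(map(str, row)) + "\n")


if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv[1], sys.argv[2], sys.argv[3], sys.stdout)
```

Memory use stays roughly constant regardless of file size. Each output row carries two label columns (chromosome and position) before the 201 values. Because rows are written as they are produced, the large matrix can be compressed on the fly, e.g. `python shared_windows.py a.mpileup b.mpileup ref.fa.fai | gzip > matrix.tsv.gz` (the script name is hypothetical).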
Question: How to find the overlapping genomic sites in two large mpileup files and make a window file around these sites
10 months ago by jing.mengrabbit • 20