Question: How to find the overlapping genomic sites in two large mpileup files and make a window file around these sites
7 months ago, jing.mengrabbit10 wrote:

I have two mpileup files generated by samtools, each larger than 10 GB. I would like to select the genomic sites that are present in both files and generate a matrix file with 201 columns: for each overlapping site, the columns hold the mpileup information for the positions from 100 bp upstream to 100 bp downstream of that site (+/-100 bp). The problem is that both files are too big to read into memory. If I read N lines at a time, I run into two problems:

1. Some sites in a chunk of the first file only appear beyond line N in the second file, so they are missed when one chunk is compared against the other.
2. For sites in lines N-100 to N of a chunk, part of the window lies in the next chunk, so their matrix rows cannot be completed.

In addition, the full matrix file will be very large. Could you please give me some hints for solving these problems? Thanks for your time!
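
One common way around both chunking problems, offered here as a sketch under stated assumptions rather than a definitive answer: because samtools mpileup writes sites in coordinate order, the two files can be streamed in lockstep (a merge-join) instead of being compared chunk by chunk, and the +/-100 bp windows can be filled from a small buffer that holds only the lines near the current site. The Python below assumes both files are sorted in the same contig order, taken from the reference's `.fai` index, and that the matrix values are the read depth (4th column) of the first file. The names `contig_rank`, `read_mpileup`, `shared_sites` and `windows`, and the `missing` fill value for positions without an mpileup line, are hypothetical choices for illustration.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, int, str]  # (chrom, 1-based position, value from one column)


def contig_rank(fai_lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: contig name -> sort rank, in .fai (reference) order."""
    return {line.split("\t")[0]: i for i, line in enumerate(fai_lines)}


def read_mpileup(lines: Iterable[str], column: int = 3) -> Iterator[Record]:
    """Lazily parse mpileup lines; column index 3 (4th column) is read depth."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield fields[0], int(fields[1]), fields[column]


def shared_sites(a: Iterable[Record], b: Iterable[Record],
                 rank: Dict[str, int]) -> Iterator[Tuple[str, int]]:
    """Merge-join two coordinate-sorted streams; yield sites present in both."""
    it_a, it_b = iter(a), iter(b)
    x, y = next(it_a, None), next(it_b, None)
    while x is not None and y is not None:
        kx, ky = (rank[x[0]], x[1]), (rank[y[0]], y[1])
        if kx == ky:
            yield x[0], x[1]
            x, y = next(it_a, None), next(it_b, None)
        elif kx < ky:
            x = next(it_a, None)  # site only in A: advance A
        else:
            y = next(it_b, None)  # site only in B: advance B


def windows(records: Iterable[Record], centres: Iterable[Tuple[str, int]],
            rank: Dict[str, int], flank: int = 100,
            missing: str = "0") -> Iterator[Tuple[str, int, List[str]]]:
    """For each centre (sorted), yield the 2*flank+1 values around it."""

    def key(chrom: str, pos: int) -> Tuple[int, int]:
        return rank[chrom], pos

    rec_iter = iter(records)
    nxt: Optional[Record] = next(rec_iter, None)
    buf: Deque[Record] = deque()  # records inside the current window only
    for chrom, pos in centres:
        lo, hi = key(chrom, pos - flank), key(chrom, pos + flank)
        # Read ahead up to the right edge, keeping only records inside the window.
        while nxt is not None and key(nxt[0], nxt[1]) <= hi:
            if key(nxt[0], nxt[1]) >= lo:
                buf.append(nxt)
            nxt = next(rec_iter, None)
        # Drop buffered records that now fall left of this window.
        while buf and key(buf[0][0], buf[0][1]) < lo:
            buf.popleft()
        # Everything left in buf is on `chrom`, between pos-flank and pos+flank.
        values = {p: v for _, p, v in buf}
        row = [values.get(p, missing) for p in range(pos - flank, pos + flank + 1)]
        yield chrom, pos, row
```

Memory stays bounded: the merge-join holds one line per file, and the window buffer holds at most 2*flank+1 records. Since the window pass reads the first file a second time, open it with a separate file handle rather than sharing one iterator. For the output size, writing each row as it is produced (for example through `gzip.open(path, "wt")`) avoids holding the matrix in memory and compresses the repetitive text well. By default samtools mpileup skips zero-coverage positions, so the `missing` fill matters unless the files were generated with `-a`.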

Tags: alignment, sequence, genome