Question: how to find the overlapping genomic sites for two big mpileup files and make a window file around these overlapping sites
gravatar for jing.mengrabbit
10 months ago by
jing.mengrabbit20 wrote:

I have two mpileup files generated by samtools, their sizes are larger than 10GB. I would like to choose the genomic sites that are present in both mpileup files, and generate a matrix file. The matrix file has 201 columns, and the columns contain mpileup information for the genomic sites of overlapping genomic site +/-100 bp. The problem is that these two files are too big and it is impossible to read them all into memory. If every time reads N lines, there are two possible questions: 1, some genomic sites in the first mpileup file are present in the >N lines in the second mpileup. 2, It is impossible to make a matrix file for the genomic sites in the (N-100, N) lines. Also, the total matrix file will be very big. Could you please give me some hints to resolve the above problems? Thanks for your time!

alignment sequence genome • 237 views
ADD COMMENTlink written 10 months ago by jing.mengrabbit20
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1508 users visited in the last hour