My current research focuses on constructing gene regulatory networks from ATAC-seq data. The data come from an online repository, where they are provided in WIG format. To work with them, I convert each file to BED using the "wig2bed" command-line tool. The resulting BED file has six columns, and the fifth column is a score that reflects the read intensity for each peak.
While processing the data, I noticed that the majority of the scores are 0, and I have already removed those entries. That still leaves a considerable number of peaks, with scores ranging from about 0.1 to 2700. I now need to choose a cutoff, keep the peaks above it, and build the gene regulatory network from those. Some articles suggest that the top 5 to 10 percent of ranked peaks are informative, but I would like to include more peaks than that so the analysis is more comprehensive. My question is: does anybody have ideas or criteria for choosing a suitable cutoff on the intensity scores of bulk ATAC-seq data in a BED file, when the retained peaks will be used to construct a gene regulatory network (GRN)? Thanks in advance for any help you can provide.
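For concreteness, the rank-based filtering described above could be sketched roughly as follows. This is only an illustration under assumptions: the BED file is tab-separated with the score in the fifth column, and the function names (`read_bed_scores`, `keep_top_fraction`) and the example fraction are hypothetical, not a recommended threshold.

```python
import math


def read_bed_scores(lines):
    """Parse BED lines into (fields, score) pairs.

    Assumes tab-separated columns with the score in column 5.
    """
    peaks = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and common header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        peaks.append((fields, float(fields[4])))
    return peaks


def keep_top_fraction(peaks, top_fraction):
    """Drop zero-score entries, then keep the top fraction by score.

    Returns (kept_peaks, cutoff), where cutoff is the lowest score kept.
    """
    nonzero = [p for p in peaks if p[1] > 0]
    if not nonzero:
        return [], None
    ranked = sorted(nonzero, key=lambda p: p[1], reverse=True)
    n_keep = max(1, math.ceil(len(ranked) * top_fraction))
    kept = ranked[:n_keep]
    return kept, kept[-1][1]


if __name__ == "__main__":
    # Made-up rows for illustration only; not real data.
    example = [
        "chr1\t100\t200\tid-1\t0\t+",
        "chr1\t300\t400\tid-2\t0.1\t+",
        "chr1\t500\t600\tid-3\t5\t+",
        "chr2\t100\t200\tid-4\t2700\t+",
        "chr2\t300\t400\tid-5\t40\t+",
    ]
    kept, cutoff = keep_top_fraction(read_bed_scores(example), 0.5)
    print(len(kept), cutoff)
```

Changing `top_fraction` (for example 0.05, 0.10, 0.25) and comparing how many peaks survive at each value is one way to see how sensitive the retained set is to the cutoff.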