Question: Split reference genome into callable regions by splitting on NNNN stretches
3 months ago, William wrote:

I am looking for a tool or piece of code that takes a reference genome FASTA file and outputs a BED file with X (equally sized) callable regions. The output can also be any format that carries the same information as such a BED file.

These callable regions should be bordered upstream and downstream by any of these:

  • The start of the chromosome
  • The end of the chromosome
  • A stretch of 250 bp of unknown (N) nucleotides (or e.g. 1000 bp of N)

The goal is to get regions that can be variant called in parallel, without the risk of sequencing reads or variants spanning the borders of the callable regions.

3 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
  • Use Picard ScatterIntervalsByNs to extract the regions that are not stretches of N.
  • Convert the resulting interval list to BED with awk.
  • Run bedtools makewindows on that BED file to generate your regions.
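
For readers who want to see the logic of these three steps in one place, here is a minimal Python sketch. It is not how Picard or bedtools work internally, only a stand-in for what the steps compute. The function names (`read_fasta`, `callable_regions`, `make_windows`, `fasta_to_bed`) and the defaults (`min_n=250`, `window=1_000_000`) are assumptions made for illustration. The sketch loads each chromosome into memory as one string. It also cuts fixed-size windows, so the last window of each region may be shorter and you get regions of roughly equal size rather than exactly X of them.

```python
import re
from typing import Iterable, Iterator, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open, as in BED


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def callable_regions(seq: str, min_n: int = 250) -> List[Interval]:
    """Return the intervals between runs of at least min_n N bases.

    Shorter N runs stay inside a region; chromosome start and end
    act as borders as well.
    """
    regions, start = [], 0
    for match in re.finditer(r"[Nn]{%d,}" % min_n, seq):
        if match.start() > start:
            regions.append((start, match.start()))
        start = match.end()
    if start < len(seq):
        regions.append((start, len(seq)))
    return regions


def make_windows(regions: Iterable[Interval], window: int) -> Iterator[Interval]:
    """Cut each region into consecutive windows of at most `window` bases."""
    for start, end in regions:
        for pos in range(start, end, window):
            yield pos, min(pos + window, end)


def fasta_to_bed(lines: Iterable[str], min_n: int = 250,
                 window: int = 1_000_000) -> List[str]:
    """Return BED lines (chrom, start, end) for all callable windows."""
    bed = []
    for name, seq in read_fasta(lines):
        for start, end in make_windows(callable_regions(seq, min_n), window):
            bed.append(f"{name}\t{start}\t{end}")
    return bed
```

To use it on a real genome, pass an open file handle, for example `print("\n".join(fasta_to_bed(open("ref.fa"))))`, where `ref.fa` is a placeholder path. No window ever crosses a long N stretch, so each BED line can be handed to a separate variant-calling job.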