Split reference genome into callable regions by splitting on NNNN stretches
10 months ago
William ★ 4.9k

I am looking for a tool or piece of code that takes a reference genome FASTA file and outputs a BED file with X (equally sized) callable regions. The output can also be anything with content similar to such a BED file.

These callable regions should be bordered upstream and downstream by any of these:

  • The start of the chromosome
  • The end of the chromosome
  • A stretch of 250 bp of unknown (N) nucleotides (or, e.g., 1000 bp of N)

The goal is to get regions that can be variant called in parallel, without the risk of sequencing reads and variants spanning the borders of the callable regions.
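To make the border rule concrete, here is a minimal Python sketch, assuming the chromosome sequence is already loaded as a string. The function name `callable_regions` and the parameter `min_n_run` are made up for illustration; N runs shorter than the threshold are kept inside a region rather than treated as a border.

```python
import re


def callable_regions(seq, min_n_run=250):
    """Return 0-based, half-open (start, end) intervals of seq.

    Regions are bordered by the sequence start, the sequence end,
    or a run of at least min_n_run N bases (hypothetical threshold).
    """
    gap = re.compile(r"[Nn]{%d,}" % min_n_run)
    regions = []
    pos = 0  # start of the current candidate region
    for match in gap.finditer(seq):
        if match.start() > pos:
            regions.append((pos, match.start()))
        pos = match.end()  # next region starts after the N run
    if pos < len(seq):
        regions.append((pos, len(seq)))
    return regions
```

The intervals use BED coordinates (0-based start, exclusive end), so they can be written out as `chrom<TAB>start<TAB>end` lines directly.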

fasta bed • 327 views
10 months ago
  • Use Picard ScatterIntervalsByNs to extract the non-poly-N regions.
  • Convert the resulting interval list to BED with awk.
  • Run bedtools makewindows to generate your regions.
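The last two steps above can be sketched in plain Python instead of awk and bedtools. This is an illustration, not the exact commands: it assumes the interval list already contains only the non-N intervals, with `@` header lines followed by tab-separated `chrom, start, end, strand, name` rows in 1-based inclusive coordinates. The function names `interval_list_to_bed` and `make_windows` are hypothetical.

```python
def interval_list_to_bed(lines):
    """Convert interval-list rows to BED-style (chrom, start, end) tuples.

    Interval lists are 1-based and inclusive; BED is 0-based and
    half-open, so only the start coordinate shifts by one.
    """
    bed = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and the SAM-style header
        chrom, start, end = line.split("\t")[:3]
        bed.append((chrom, int(start) - 1, int(end)))
    return bed


def make_windows(bed, window_size):
    """Cut each BED interval into consecutive windows of window_size bp.

    The last window of an interval may be shorter than window_size.
    """
    windows = []
    for chrom, start, end in bed:
        for w_start in range(start, end, window_size):
            windows.append((chrom, w_start, min(w_start + window_size, end)))
    return windows
```

Note that any window cut inside a region puts a border on known sequence rather than on an N stretch, so the window size should be chosen with the border requirement from the question in mind.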
