Split reference genome into callable regions by splitting on NNNN stretches
3.4 years ago
William ★ 5.3k

I am looking for a tool or piece of code that takes a reference genome FASTA file and outputs a BED file with X (equally sized) callable regions. Any output with content similar to such a BED file would also work.

Each callable region should be bordered upstream and downstream by one of the following:

  • The start of the chromosome
  • The end of the chromosome
  • A stretch of 250 bp (or, e.g., 1000 bp) of unknown (N) nucleotides
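The splitting rule above can be sketched in a few lines of Python. This is a minimal illustration, not an existing tool: `read_fasta`, `callable_regions`, `genome_to_bed` and the `min_n` parameter (the minimum N-run length that separates two regions) are hypothetical names. It assumes each chromosome fits in memory and that every FASTA header has a name; N runs shorter than `min_n` stay inside a region. Coordinates are BED-style (0-based, half-open).

```python
import re
from typing import Iterable, Iterator, List, Tuple

def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; the name is the first word of the header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def callable_regions(seq: str, min_n: int = 250) -> List[Tuple[int, int]]:
    """Return 0-based half-open (start, end) regions separated by runs of
    at least min_n N/n bases; shorter N runs are kept inside a region."""
    regions, start = [], 0
    for m in re.finditer(r"[Nn]{%d,}" % min_n, seq):
        if m.start() > start:  # skip empty regions (e.g. N run at chromosome start)
            regions.append((start, m.start()))
        start = m.end()
    if start < len(seq):  # region running to the chromosome end
        regions.append((start, len(seq)))
    return regions

def genome_to_bed(fasta_lines: Iterable[str], min_n: int = 250) -> List[Tuple[str, int, int]]:
    """Apply callable_regions to every chromosome and return BED tuples."""
    return [
        (chrom, s, e)
        for chrom, seq in read_fasta(fasta_lines)
        for s, e in callable_regions(seq, min_n)
    ]
```

For a real genome, an open file handle (e.g. a hypothetical `open("reference.fa")`) can be passed as `fasta_lines`, and each tuple written as one tab-separated BED line.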

The goal is to get regions that can be variant-called in parallel, without the risk of sequencing reads or variants crossing the borders of the callable regions.
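If "equally sized" can be relaxed to "roughly equal total length per parallel job", one way to keep every border on an N stretch is to never cut a region and instead distribute whole regions over X bins. The sketch below uses a greedy largest-first heuristic; this is a swapped-in technique, not something proposed in this thread, and `balance_regions` and `n_groups` are hypothetical names.

```python
import heapq
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), BED-style 0-based half-open

def balance_regions(regions: List[Region], n_groups: int) -> List[List[Region]]:
    """Assign whole regions to n_groups bins so the total bp per bin is
    roughly equal: largest region first, always into the lightest bin.
    Regions are never cut, so every region still ends at an N stretch
    or a chromosome end."""
    if n_groups < 1:
        raise ValueError("n_groups must be >= 1")
    heap = [(0, i) for i in range(n_groups)]  # (total_bp, bin_index); sorted list is a valid heap
    bins: List[List[Region]] = [[] for _ in range(n_groups)]
    for region in sorted(regions, key=lambda r: r[2] - r[1], reverse=True):
        total, i = heapq.heappop(heap)
        bins[i].append(region)
        heapq.heappush(heap, (total + region[2] - region[1], i))
    return bins
```

Each bin could then be written to its own BED file and called as one job. The bins can only be as balanced as the largest single region allows, which is the trade-off for never cutting inside ACGT sequence.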

3.4 years ago
  • Use Picard ScatterIntervalsByNs to extract the regions outside the poly-N stretches as an interval list.
  • Convert this interval list to BED with awk.
  • Run bedtools makewindows to generate your regions.
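The last two steps can be approximated in Python as a sketch. It assumes the Picard interval_list layout: `@`-prefixed SAM-style header lines, then tab-separated contig, 1-based start, 1-based inclusive end, strand and name. Converting to BED therefore means dropping the header and subtracting 1 from the start, which is what the awk step has to do. `split_into_windows` is a hypothetical stand-in for `bedtools makewindows -b regions.bed -n N`; its rounding may not match bedtools exactly.

```python
from typing import Iterable, List, Tuple

BedInterval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open

def interval_list_to_bed(lines: Iterable[str]) -> List[BedInterval]:
    """Convert Picard interval_list records (1-based, closed) to BED tuples."""
    bed: List[BedInterval] = []
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and the SAM-style header
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        bed.append((chrom, start - 1, end))  # 1-based start -> 0-based start
    return bed

def split_into_windows(region: BedInterval, n: int) -> List[BedInterval]:
    """Split one interval into at most n contiguous windows whose lengths
    differ by at most 1 bp (fewer windows if the interval is shorter than n)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    chrom, start, end = region
    length = end - start
    if length <= 0:
        return []
    n = min(n, length)  # avoid empty windows
    bounds = [start + (length * i) // n for i in range(n + 1)]
    return [(chrom, bounds[i], bounds[i + 1]) for i in range(n)]
```

Note that window edges produced this way fall inside ACGT sequence rather than on N stretches, so reads can still overlap neighbouring windows; if strict borders are required, the intervals from the first step can be used as-is.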