Split a YAML file into multiple files by keys
15 months ago
kani ▴ 10

Hi, I would like to split a YAML file into multiple files based on a key (using 'yq' or another method). Each output file should be named after its key. Note that the entries under readunits are not in the same order as the samples. I'd appreciate any help. Thanks!

Here's my input file, test.yaml:

samples:
  WHH525:
  - 3c657174
  - 5e3d9b37
  WHH527:
  - 33b5d3ed
  - 81f663d4
  WHH528:
  - 93f3bda0
  - befe7e0e
readunits:
  3c657174:
    flowcell_id: null
    fq1: WHH525/WHH525_HS006-PE-R00098_L002_R1.fastq.gz
    fq2: WHH525/WHH525_HS006-PE-R00098_L002_R2.fastq.gz
    lane_id: '2'
    library_id: WHH525
    rg_id: HS006-PE-R00098.2
    run_id: HS006-PE-R00098
  81f663d4:
    flowcell_id: null
    fq1: WHH527/WHH527_HS007-PE-R00094_L002_R1.fastq.gz
    fq2: WHH527/WHH527_HS007-PE-R00094_L002_R2.fastq.gz
    lane_id: '2'
    library_id: WHH527
    rg_id: HS007-PE-R00094.2
    run_id: HS007-PE-R00094
  93f3bda0:
    flowcell_id: null
    fq1: WHH528/WHH528_HS006-PE-R00100_L008_R1.fastq.gz
    fq2: WHH528/WHH528_HS006-PE-R00100_L008_R2.fastq.gz
    lane_id: '8'
    library_id: WHH528
    rg_id: HS006-PE-R00100.8
    run_id: HS006-PE-R00100
  5e3d9b37:
    flowcell_id: null
    fq1: WHH525/WHH525_HS006-PE-R00021_L001_R1.fastq.gz
    fq2: WHH525/WHH525_HS006-PE-R00021_L001_R2.fastq.gz
    lane_id: '1'
    library_id: WHH525
    rg_id: HS006-PE-R00021.1
    run_id: HS006-PE-R00021
  33b5d3ed:
    flowcell_id: null
    fq1: WHH527/WHH527_HS006-PE-R00097_L004_R1.fastq.gz
    fq2: WHH527/WHH527_HS006-PE-R00097_L004_R2.fastq.gz
    lane_id: '4'
    library_id: WHH527
    rg_id: HS006-PE-R00097.4
    run_id: HS006-PE-R00097
  befe7e0e:
    flowcell_id: null
    fq1: WHH528/WHH528_HS006-PE-R00098_L002_R1.fastq.gz
    fq2: WHH528/WHH528_HS006-PE-R00098_L002_R2.fastq.gz
    lane_id: '2'
    library_id: WHH528
    rg_id: HS006-PE-R00098.2
    run_id: HS006-PE-R00098

The expected output files are WHH525.yaml, WHH527.yaml and WHH528.yaml, with the following contents:

WHH525.yaml

samples:
  WHH525:
  - 3c657174
  - 5e3d9b37
readunits:
  3c657174:
    flowcell_id: null
    fq1: WHH525/WHH525_HS006-PE-R00098_L002_R1.fastq.gz
    fq2: WHH525/WHH525_HS006-PE-R00098_L002_R2.fastq.gz
    lane_id: '2'
    library_id: WHH525
    rg_id: HS006-PE-R00098.2
    run_id: HS006-PE-R00098
  5e3d9b37:
    flowcell_id: null
    fq1: WHH525/WHH525_HS006-PE-R00021_L001_R1.fastq.gz
    fq2: WHH525/WHH525_HS006-PE-R00021_L001_R2.fastq.gz
    lane_id: '1'
    library_id: WHH525
    rg_id: HS006-PE-R00021.1
    run_id: HS006-PE-R00021

WHH527.yaml

samples:
  WHH527:
  - 33b5d3ed
  - 81f663d4
readunits:
  33b5d3ed:
    flowcell_id: null
    fq1: WHH527/WHH527_HS006-PE-R00097_L004_R1.fastq.gz
    fq2: WHH527/WHH527_HS006-PE-R00097_L004_R2.fastq.gz
    lane_id: '4'
    library_id: WHH527
    rg_id: HS006-PE-R00097.4
    run_id: HS006-PE-R00097
  81f663d4:
    flowcell_id: null
    fq1: WHH527/WHH527_HS007-PE-R00094_L002_R1.fastq.gz
    fq2: WHH527/WHH527_HS007-PE-R00094_L002_R2.fastq.gz
    lane_id: '2'
    library_id: WHH527
    rg_id: HS007-PE-R00094.2
    run_id: HS007-PE-R00094

WHH528.yaml

samples:
  WHH528:
  - 93f3bda0
  - befe7e0e
readunits:
  93f3bda0:
    flowcell_id: null
    fq1: WHH528/WHH528_HS006-PE-R00100_L008_R1.fastq.gz
    fq2: WHH528/WHH528_HS006-PE-R00100_L008_R2.fastq.gz
    lane_id: '8'
    library_id: WHH528
    rg_id: HS006-PE-R00100.8
    run_id: HS006-PE-R00100
  befe7e0e:
    flowcell_id: null
    fq1: WHH528/WHH528_HS006-PE-R00098_L002_R1.fastq.gz
    fq2: WHH528/WHH528_HS006-PE-R00098_L002_R2.fastq.gz
    lane_id: '2'
    library_id: WHH528
    rg_id: HS006-PE-R00098.2
    run_id: HS006-PE-R00098
yq yml yaml • 1.6k views

Appreciate any help!

Use a YAML API, for example PyYAML: https://pyyaml.org/wiki/PyYAMLDocumentation
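
A minimal sketch of that approach, assuming PyYAML is installed and the input has the two top-level keys shown in the question (samples and readunits). The function names split_by_sample and split_yaml_file are made up for illustration. Read units are looked up by ID, so their order in the input does not matter.

```python
from pathlib import Path


def split_by_sample(config):
    """Return {sample_name: sub-config} holding only that sample's read units."""
    parts = {}
    for sample, unit_ids in config["samples"].items():
        parts[sample] = {
            "samples": {sample: list(unit_ids)},
            # Look each unit up by ID, so input order is irrelevant.
            "readunits": {uid: config["readunits"][uid] for uid in unit_ids},
        }
    return parts


def split_yaml_file(path, outdir="."):
    """Write one <sample>.yaml per key under 'samples'."""
    # Imported here so split_by_sample stays usable without PyYAML.
    import yaml  # third-party: pip install pyyaml

    with open(path) as handle:
        config = yaml.safe_load(handle)
    for sample, part in split_by_sample(config).items():
        with open(Path(outdir) / f"{sample}.yaml", "w") as out:
            # sort_keys=False (PyYAML >= 5.1) keeps 'samples' before 'readunits'.
            yaml.safe_dump(part, out, sort_keys=False, default_flow_style=False)
```

Calling split_yaml_file("test.yaml") should then write WHH525.yaml, WHH527.yaml and WHH528.yaml into the current directory. A read unit ID listed under samples but missing from readunits raises a KeyError rather than being skipped silently.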


Thanks, Pierre Lindenbaum.

15 months ago
kani ▴ 10

The yq expression from the discussion linked here works well for me.

https://github.com/mikefarah/yq/discussions/1562

yq  '. as $d | 
  (.samples | keys | .[]) as $i ireduce([]; . + 
   [$d | {"samples": .samples | pick([$i]), "readunits": .readunits | pick($d | .samples[$i])} ]) 
   | .[]' -s '(.samples | keys | .[0]) + ".yaml"' test.yaml