Merging CRAM files
10 months ago
Matteo Ungaro ▴ 100

Hi there, I'm facing the task of merging the CRAM files for 25 human samples.

Each one is divided into 12-13 CRAM files (322 individual CRAMs in total), which I have named code_number, where the code is the sample identifier and the number is the CRAM partition.

I'm aware samtools can do this; however, I have limited experience with CRAM files, let alone with merging a large number of them. So my question is: what is the exact command I should use? For example:

samtools merge --input-fmt-option CRAM -o <merged_output> -@ 64 <file1> <file2> <file3> etc.

Do I need the .crai index for such an operation, and what is the best format to output to, say BAM rather than a merged CRAM? Either way, I will then need to use the reference to get back to FASTQ.

Thanks in advance!

10 months ago

Do I need the .crai index for such operation

No.

what is the best format to output to — say BAM over a merged CRAM

Use CRAM (smaller) if this is the very last step of your workflow; use BAM (faster) if another step follows (recalibration, duplicate marking, ...).
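For the per-sample merge itself, something along these lines might help. It is only a dry-run sketch: it assumes the files are named <code>_<number>.cram (the .cram suffix and the .merged.cram output name are my assumptions, not from the question), that paths contain no whitespace, and that no sample code is a prefix of another one followed by an underscore. The function name print_merge_cmds is made up for illustration. It prints one samtools merge command per sample, using the positional output form shown in the usage text, without running anything.

```shell
# Dry-run sketch: print one `samtools merge` command per sample.
# Assumes inputs are named <code>_<number>.cram inside directory $1.
print_merge_cmds() {
    local dir=$1 threads=${2:-4}
    local f base code
    for f in "$dir"/*_*.cram; do
        [ -e "$f" ] || continue        # glob matched nothing
        base=${f##*/}                  # strip the directory part
        printf '%s\n' "${base%_*}"     # strip the trailing _<number>.cram
    done | sort -u | while read -r code; do
        # The unquoted glob expands to every partition of this sample
        echo samtools merge -@ "$threads" "$dir/$code.merged.cram" "$dir/$code"_*.cram
    done
}
```

Check the printed commands first, e.g. `print_merge_cmds /path/to/crams 8`, and only then pipe them to `sh` to actually run the merges.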

Hi @Pierre Lindenbaum,

Thanks for the feedback. However, for some reason, running with --input-fmt-option CRAM causes the following error:

[E::hts_opt_add] Unknown option 'CRAM'
Usage: samtools merge [-nurlf] [-h inh.sam] [-b <bamlist.fofn>] <out.bam> <in1.bam> [<in2.bam> ... <inN.bam>]

Options:
  -n         Input files are sorted by read name
  -t TAG     Input files are sorted by TAG value
  -r         Attach RG tag (inferred from file names)
  -u         Uncompressed BAM output
  -f         Overwrite the output BAM if exist
  -1         Compress level 1
  -l INT     Compression level, from 0 to 9 [-1]
  -R STR     Merge file in the specified region STR [all]
  -h FILE    Copy the header in FILE to <out.bam> [in1.bam]
  -c         Combine @RG headers with colliding IDs [alter IDs to be distinct]
  -p         Combine @PG headers with colliding IDs [alter IDs to be distinct]
  -s VALUE   Override random seed
  -b FILE    List of input BAM filenames, one per line [null]
  -X         Use customized index files
  -L FILE    Specify a BED file for multiple region filtering [null]
  --no-PG    do not add a PG line
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
  -O, --output-fmt FORMAT[,OPT[=VAL]]...
               Specify output format (SAM, BAM, CRAM)
      --output-fmt-option OPT[=VAL]
               Specify a single output file format option in the form
               of OPTION or OPTION=VALUE
      --reference FILE
               Reference sequence FASTA FILE [null]
  -@, --threads INT
               Number of additional threads to use [0]
      --write-index
               Automatically index the output files [off]
      --verbosity INT
               Set level of verbosity

Any idea why?

The format (CRAM/BAM/SAM) is detected automatically, so you don't need --input-fmt-option. In any case, it doesn't work like --input-fmt-option CRAM: it takes a key=value syntax. See http://www.htslib.org/doc/samtools.html

10 months ago
jkbonfield ★ 1.2k

--input-fmt-option is for options, not the format. It was added as a way of specifying the reference sequence for commands reading CRAMs that didn't have a way to specify a reference, e.g. with --input-fmt-option reference=ref.fa.

You don't need to specify the input file type as htslib will auto-detect it.

It'll also detect the output file type based on the filename, but if outputting to stdout or to a non-standard name, you can use --output-fmt cram, or -O cram for short. You can also add format options here, e.g. -O cram,embed_ref,use_bzip2.
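As a rough illustration of the above, here is a small dry-run helper that prints a merge command with the output format given explicitly through -O. The function name merge_cmd and all file names are placeholders of mine, and it assumes you have the reference as a local FASTA that you pass with --reference (an option listed in the usage text earlier in the thread); whether you need it depends on how your CRAMs locate their reference.

```shell
# Dry-run sketch: print a merge command with an explicit output format.
# Arguments: <bam|cram> <reference.fa> <output> <input>...
merge_cmd() {
    local fmt=$1 ref=$2 out=$3
    shift 3
    case $fmt in
        bam|cram) ;;                                  # formats covered here
        *) echo "unsupported format: $fmt" >&2; return 1 ;;
    esac
    # No --input-fmt-option: the input format is auto-detected
    echo samtools merge -O "$fmt" --reference "$ref" "$out" "$@"
}
```

For instance, `merge_cmd cram ref.fa merged.cram a_1.cram a_2.cram` prints the command so it can be reviewed before being run.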
