Tutorial: What is new in samtools release 1.5 [Solstice Release] (21st June 2017)
3.6 years ago by
Istvan Albert ♦♦ 86k
University Park, USA

Official announcement:


Samtools Release 1.5 [Solstice Release] (21st June 2017)

  • Samtools fastq now has a -i option to create a fastq file from an index tag, and a -T option (similar to -t) to add user specified aux tags to the fastq header line.

  • Samtools fastq can now create compressed fastq files, by giving the output filenames an extention of .gq, .bgz, or .bgzf

  • Samtools sort has a -t TAG option, that allows records to be sorted by the value of the specified aux tag, then by position or name. Merge gets a similar option, allowing files sorted this way to be merged.


Let's go over each item and see how it works in practice.

Samtools fastq now has a -i option to create a fastq file from an index tag, and a -T option (similar to -t) to add user specified aux tags to the fastq header line.

Help flags for samtools fastq:

...
    -i          add Illumina Casava 1.8 format entry to header (eg 1:N:0:ATCACG)
    -T TAGLIST  copy arbitrary tags to the FASTQ header line
...

Let's give it a go. Get the test file.

curl https://raw.githubusercontent.com/samtools/samtools/develop/test/dat/bam2fq.005.sam > test.sam

Now run:

samtools fastq test.sam | head -4

It prints:

@ref1_grp1_p001/1
CGAGCTCGGT
+
!!!!!!!!!!

whereas:

samtools fastq -T MD,BC,za test.sam | head -4

prints:

@ref1_grp1_p001/1   MD:Z:10 BC:Z:AC-GT  za:Z:Hello world!
CGAGCTCGGT
+
!!!!!!!!!!
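
Once the tags are on the header line, they are easy to pull back out with standard tools. Below is a minimal sketch, not part of samtools: the helper name fastq_tag is made up for illustration, and it assumes that the header fields are tab-separated TAG:TYPE:VALUE entries and that every record spans exactly four lines.

```shell
# Print the value of one aux tag for every record of a FASTQ stream on stdin.
# Assumption: header fields are tab-separated and look like TAG:TYPE:VALUE.
fastq_tag() {
    awk -F '\t' -v tag="$1" '
        NR % 4 == 1 {                      # header line of each 4-line record
            for (i = 2; i <= NF; i++)
                if (index($i, tag ":") == 1)
                    print substr($i, length(tag) + 4)   # skip "TAG:TYPE:"
        }'
}

# Demo on the first record shown above (tabs written as \t).
printf '@ref1_grp1_p001/1\tMD:Z:10\tBC:Z:AC-GT\tza:Z:Hello world!\nCGAGCTCGGT\n+\n!!!!!!!!!!\n' |
    fastq_tag BC
```

With real data the same helper would sit at the end of the pipeline, for example: samtools fastq -T MD,BC,za test.sam | fastq_tag BC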

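The -i flag is poorly documented; I managed to figure it out only by scouring the test examples in the GitHub repository. It needs two companion parameters: --index-format, which tells samtools how to read the index out of the barcode tag, and --i1, which names the file that the indices are collected into. Judging by the output below, 'i2' takes the first two bases of the BC tag as the index.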

samtools fastq -i --i1 indices.fq --index-format 'i2' -T MD,BC,za test.sam | head -4

will produce:

@ref1_grp1_p001/1   MD:Z:10 BC:Z:AC-GT  za:Z:Hello world! 1:N:0:AC
CGAGCTCGGT
+
!!!!!!!!!!

and a file called indices.fq that contains:

@ref1_grp1_p001/1   MD:Z:10 BC:Z:AC-GT  za:Z:Hello world! 1:N:0:AC
AC
+
""
@ref1_grp1_p002/1   MD:Z:10 BC:Z:AATT+CCGG  za:Z:Another string 1:N:0:AA
AA
+
""
@ref1_grp2_p001/1   MD:Z:8  BC:Z:TG+CA  za:Z:!"$%^&*() 1:N:0:TG
TG
+
ab
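
To see which indices ended up in the headers, the Casava entry can be split on colons. This is only a sketch: the helper name casava_index is hypothetical, the demo headers are shortened versions of the ones above, and it assumes the Casava entry is the last whitespace-separated field of the header with a Y or N in the second position.

```shell
# Print the index sequence from the Casava 1.8 entry of each FASTQ header.
# Assumption: the entry (e.g. 1:N:0:AC) is the last field of the header line.
casava_index() {
    awk 'NR % 4 == 1 {
        n = split($NF, part, ":")
        # expect READ:FILTERED:CONTROL:INDEX, with FILTERED being Y or N
        if (n == 4 && part[2] ~ /^[YN]$/) print part[4]
    }'
}

# Demo on two shortened records; the quoted delimiter keeps the text literal.
casava_index <<'EOF'
@ref1_grp1_p001/1 BC:Z:AC-GT 1:N:0:AC
AC
+
""
@ref1_grp1_p002/1 BC:Z:AATT+CCGG 1:N:0:AA
AA
+
""
EOF
```

Piping the result through sort | uniq -c would give a quick count of reads per index.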

Samtools fastq can now create compressed fastq files, by giving the output filenames an extention of .gq, .bgz, or .bgzf

Example:

samtools fastq -1 read1.fq.gz -2 read2.fq.gz align.bam

The release note is a bit confusing, though. It is not clear what the .gq extension means; perhaps it is a typo for .gz, since that works, as demonstrated above. The difference between .bgz and .bgzf is also unclear.
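
One way to confirm that an output file really is compressed is to look at its first two bytes, which are 1f 8b for anything in gzip format (BGZF is a gzip-compatible variant, so it passes the same check). The sketch below uses a made-up helper name, is_gzip, and a throwaway file in place of real samtools output.

```shell
# Succeed if FILE starts with the gzip magic bytes 1f 8b.
is_gzip() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Demo on a temporary file, so that no BAM input is needed.
tmp=$(mktemp)
printf '@r1\nACGT\n+\n!!!!\n' | gzip -c > "$tmp"
is_gzip "$tmp" && echo "compressed"
gzip -dc "$tmp" | head -1      # gzip-compatible files decompress with standard tools
rm -f "$tmp"
```

After the samtools fastq command above, is_gzip read1.fq.gz should succeed if the extension did trigger compression.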

Samtools sort has a -t TAG option, that allows records to be sorted by the value of the specified aux tag, then by position or name. Merge gets a similar option, allowing files sorted this way to be merged.

samtools sort align.bam | samtools view | cut -f 1,12-25 | head -5

prints:

SRR343051.887   NM:i:0  MD:Z:101    AS:i:101    XS:i:101    RG:Z:foo    XA:Z:NC_020370.1,-55728,101M,0;
SRR343051.542   NM:i:0  MD:Z:101    AS:i:101    XS:i:101    RG:Z:foo    XA:Z:NC_020370.1,-55615,101M,0;
SRR343051.9863  NM:i:0  MD:Z:101    AS:i:101    XS:i:101    RG:Z:foo    XA:Z:NC_020370.1,-55587,101M,0;
SRR343051.887   NM:i:0  MD:Z:101    AS:i:101    XS:i:101    RG:Z:foo    XA:Z:NC_020370.1,+55573,101M,0;
SRR343051.9863  NM:i:0  MD:Z:101    AS:i:101    XS:i:101    RG:Z:foo    XA:Z:NC_020370.1,+55479,101M,0;

whereas:

samtools sort -t AS align.bam | samtools view | cut -f 1,12-25 | head -5

prints:

SRR343051.1909  AS:i:0  XS:i:0  RG:Z:foo
SRR343051.5040  AS:i:0  XS:i:0  RG:Z:foo
SRR343051.22    AS:i:0  XS:i:0  RG:Z:foo
SRR343051.2588  AS:i:0  XS:i:0  RG:Z:foo
SRR343051.3324  AS:i:0  XS:i:0  RG:Z:foo
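
A quick sanity check on the new order is to extract the tag values and let sort -c verify them. This is a sketch under a few assumptions: the helper names sam_tag_values and tag_is_sorted are made up, the tag holds an integer (printed as TAG:i:VALUE in SAM text), values are expected in ascending order as in the output above, and records that lack the tag are simply skipped.

```shell
# Print the integer value of aux tag TAG for each SAM record on stdin.
sam_tag_values() {
    awk -F '\t' -v tag="$1" '
        /^@/ { next }                       # skip SAM header lines
        {
            for (i = 12; i <= NF; i++)      # aux tags start at column 12
                if (index($i, tag ":i:") == 1)
                    print substr($i, length(tag) + 4)
        }'
}

# Succeed when the tag values are in non-decreasing numeric order.
tag_is_sorted() {
    sam_tag_values "$1" | sort -n -c 2>/dev/null
}

# Demo on two hand-written SAM records (tabs written as \t).
printf 'r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\tAS:i:0\nr2\t0\tref1\t10\t60\t4M\t*\t0\t0\tACGT\t!!!!\tNM:i:0\tAS:i:4\n' |
    tag_is_sorted AS && echo "sorted by AS"
```

On real data this would read: samtools sort -t AS align.bam | samtools view | tag_is_sorted AS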

Why is this post labeled "forum" and not "news"?

written 3.6 years ago by GenoMax 96k

Or 'tutorial', maybe?

written 3.6 years ago by WouterDeCoster 45k

I was not sure what it ought to be. News would fit if there was just the initial statement. A tutorial label felt like giving it too much importance. Either way would work.

written 3.6 years ago by Istvan Albert ♦♦ 86k

Posts labeled with "forum" have some component that warrants discussion or generates opposing opinions (in my mind). Since you are demonstrating some of the features with example data, "tutorial" may fit better.

written 3.6 years ago by GenoMax 96k

I'll make it a tutorial then.

written 3.6 years ago by Istvan Albert ♦♦ 86k