Official announcement:
Samtools Release 1.5 [Solstice Release] (21st June 2017)
Samtools fastq now has a -i option to create a fastq file from an index tag, and a -T option (similar to -t) to add user specified aux tags to the fastq header line.
Samtools fastq can now create compressed fastq files, by giving the output filenames an extention of .gq, .bgz, or .bgzf
Samtools sort has a -t TAG option, that allows records to be sorted by the value of the specified aux tag, then by position or name. Merge gets a similar option, allowing files sorted this way to be merged.
Let's go over each item and see how it works in practice.
Samtools fastq now has a -i option to create a fastq file from an index tag, and a -T option (similar to -t) to add user specified aux tags to the fastq header line.
Help flags for samtools fastq:
...
-i add Illumina Casava 1.8 format entry to header (eg 1:N:0:ATCACG)
-T TAGLIST copy arbitrary tags to the FASTQ header line
...
Let's give it a go. First, get the test file:
curl https://raw.githubusercontent.com/samtools/samtools/develop/test/dat/bam2fq.005.sam > test.sam
Now run:
samtools fastq test.sam | head -4
It prints:
@ref1_grp1_p001/1
CGAGCTCGGT
+
!!!!!!!!!!
whereas:
samtools fastq -T MD,BC,za test.sam | head -4
prints:
@ref1_grp1_p001/1 MD:Z:10 BC:Z:AC-GT za:Z:Hello world!
CGAGCTCGGT
+
!!!!!!!!!!
The -i flag is poorly documented and I managed to figure it out only by scouring the test examples on the GitHub repository. It requires setting the --index-format parameter and a file specified via the --i1 parameter to collect the indices into.
samtools fastq -i --i1 indices.fq --index-format 'i2' -T MD,BC,za test.sam | head -4
will produce:
@ref1_grp1_p001/1 MD:Z:10 BC:Z:AC-GT za:Z:Hello world! 1:N:0:AC
CGAGCTCGGT
+
!!!!!!!!!!
and a file called indices.fq that contains:
@ref1_grp1_p001/1 MD:Z:10 BC:Z:AC-GT za:Z:Hello world! 1:N:0:AC
AC
+
""
@ref1_grp1_p002/1 MD:Z:10 BC:Z:AATT+CCGG za:Z:Another string 1:N:0:AA
AA
+
""
@ref1_grp2_p001/1 MD:Z:8 BC:Z:TG+CA za:Z:!"$%^&*() 1:N:0:TG
TG
+
ab
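A quick way to sanity-check the index file is to confirm that the barcode at the end of each header (the part after the last colon in the Casava-style suffix) matches the sequence line below it. The sketch below is only an illustration: it recreates the three records shown above in a scratch file with a made-up name (indices_demo.fq) so that it runs without samtools, and it assumes plain four-line FASTQ records.

```shell
# Recreate the three index records shown above in a scratch file
# (indices_demo.fq is a hypothetical name used only for this example).
cat > indices_demo.fq <<'EOF'
@ref1_grp1_p001/1 MD:Z:10 BC:Z:AC-GT za:Z:Hello world! 1:N:0:AC
AC
+
""
@ref1_grp1_p002/1 MD:Z:10 BC:Z:AATT+CCGG za:Z:Another string 1:N:0:AA
AA
+
""
@ref1_grp2_p001/1 MD:Z:8 BC:Z:TG+CA za:Z:!"$%^&*() 1:N:0:TG
TG
+
ab
EOF

# Line 1 of each record is the header, line 2 is the index sequence.
mismatches=$(awk -F: '
    NR % 4 == 1 { barcode = $NF }            # header: text after the last colon
    NR % 4 == 2 && $0 != barcode { bad++ }   # sequence line should equal it
    END { print bad + 0 }
' indices_demo.fq)

echo "records whose header barcode differs from the index sequence: $mismatches"
```

To run the same check on real output, point the awk command at the file given to --i1 instead of the scratch file.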
Samtools fastq can now create compressed fastq files, by giving the output filenames an extention of .gq, .bgz, or .bgzf
Example:
samtools fastq -1 read1.fq.gz -2 read2.fq.gz align.bam
The release note is a bit confusing, though. It is not clear what the .gq extension above means. Perhaps it is a typo for .gz, since that works, as demonstrated above. Also unclear is the difference between .bgz and .bgzf.
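One way to find out which extensions actually trigger compression is to look at the first two bytes of each output file: gzip streams start with 1f 8b, and BGZF is gzip-compatible, so it starts the same way. The helper below (is_gzip, a name I made up) is a minimal sketch of that check. The demo uses scratch files created with gzip itself rather than samtools output, so it shows only how the check behaves, not what samtools does for any given extension.

```shell
# Return success if the file starts with the gzip magic bytes 1f 8b.
# is_gzip is a hypothetical helper, not part of samtools.
is_gzip() {
    [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# Scratch files (hypothetical names) so the check can be tried without samtools.
printf '@r1\nACGT\n+\n!!!!\n' > plain_demo.fq
gzip -c plain_demo.fq > packed_demo.fq.gz

for f in plain_demo.fq packed_demo.fq.gz; do
    if is_gzip "$f"; then
        echo "$f: gzip-compatible"
    else
        echo "$f: not compressed"
    fi
done
```

After running samtools fastq with output names ending in .gz, .bgz and .bgzf, the same loop over those files would show which ones came out compressed. It cannot tell .bgz and .bgzf apart, since both would carry the same magic bytes.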
Samtools sort has a -t TAG option, that allows records to be sorted by the value of the specified aux tag, then by position or name. Merge gets a similar option, allowing files sorted this way to be merged.
samtools sort align.bam | samtools view | cut -f 1,12-25 | head -5
prints:
SRR343051.887 NM:i:0 MD:Z:101 AS:i:101 XS:i:101 RG:Z:foo XA:Z:NC_020370.1,-55728,101M,0;
SRR343051.542 NM:i:0 MD:Z:101 AS:i:101 XS:i:101 RG:Z:foo XA:Z:NC_020370.1,-55615,101M,0;
SRR343051.9863 NM:i:0 MD:Z:101 AS:i:101 XS:i:101 RG:Z:foo XA:Z:NC_020370.1,-55587,101M,0;
SRR343051.887 NM:i:0 MD:Z:101 AS:i:101 XS:i:101 RG:Z:foo XA:Z:NC_020370.1,+55573,101M,0;
SRR343051.9863 NM:i:0 MD:Z:101 AS:i:101 XS:i:101 RG:Z:foo XA:Z:NC_020370.1,+55479,101M,0;
whereas:
samtools sort -t AS align.bam | samtools view | cut -f 1,12-25 | head -5
prints:
SRR343051.1909 AS:i:0 XS:i:0 RG:Z:foo
SRR343051.5040 AS:i:0 XS:i:0 RG:Z:foo
SRR343051.22 AS:i:0 XS:i:0 RG:Z:foo
SRR343051.2588 AS:i:0 XS:i:0 RG:Z:foo
SRR343051.3324 AS:i:0 XS:i:0 RG:Z:foo
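To check that a file really is ordered by a tag, the tag values can be pulled out of the SAM text and tested with sort -c. The sketch below assumes integer tags (the TAG:i:value form) and ascending numeric order, which is what the output above suggests but which I have not confirmed in the documentation. The helper tag_values and the three demo records are made up for illustration, so the example runs without samtools.

```shell
# Print the integer value of the given aux tag for each SAM record on stdin.
# Records without the tag are skipped. tag_values is a hypothetical helper.
tag_values() {
    awk -v tag="$1" -F'\t' '{
        for (i = 12; i <= NF; i++)
            if (index($i, tag ":i:") == 1) { print substr($i, length(tag) + 4); break }
    }'
}

# Three made-up SAM records (11 mandatory fields plus aux tags), for the demo only.
printf 'r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\tAS:i:0\tXS:i:0\n'      > tag_demo.sam
printf 'r2\t0\tchr\t5\t60\t4M\t*\t0\t0\tACGT\t!!!!\tNM:i:0\tAS:i:4\n'   >> tag_demo.sam
printf 'r3\t0\tchr\t1\t60\t4M\t*\t0\t0\tACGT\t!!!!\tNM:i:0\tAS:i:101\n' >> tag_demo.sam

as_values=$(tag_values AS < tag_demo.sam | tr '\n' ' ')
echo "AS values in file order: $as_values"

# sort -n -c exits non-zero if the input is not already in numeric order.
if tag_values AS < tag_demo.sam | sort -n -c 2>/dev/null; then
    echo "ascending by AS"
else
    echo "not ascending by AS"
fi
```

On real data the function would be fed from samtools, for example: samtools sort -t AS align.bam | samtools view | tag_values AS | sort -n -c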
Why is this post labeled "forum" and not "news"?
Or 'tutorial', maybe?
I was not sure what it ought to be. News would fit if there was just the initial statement. A tutorial label felt like giving it too much importance. Either way would work.
Posts labeled with "forum" have some component that warrants discussion/generates opposing opinions (in my mind). Since you are demonstrating some of the features with example data tutorial may fit better.
I'll make it a tutorial then.