Converting Illumina Data From 'Gerald' To Sam
10.4 years ago

Hi all, I've been given an external hard drive containing a set of files mapped with some Illumina Tools (GERALD ?).

tail -3 s_3_2_export.txt
HWUSI-EAS454    14    3    120    16534    21510    0    2    TGTNNNNNNTNNTAACNNTNNNGGNNCNNTNNNCNNNNNCNNNNNNNCNNNANNGCNTNNTNCANNNCNCNNNTNC    BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB    QC                                            N
HWUSI-EAS454    14    3    120    17941    21505    0    2    ACTNNNNNNCNNGCTANNANNNNCNNNNNTNNGANNNNNTNNNNNNNCNNNNNNATNCNNTNNTNNNCNCNNNCNA    BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB    QC                                            N
HWUSI-EAS454    14    3    120    18005    21512    0    2    CTANNNNNNGNNTTCTNNTNNNNANNNNNCNNTANNNNNCNNNNNNNGNNNNNNCANCNNGNNCNNNCNANNNCNC    BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB    QC


Where can I find a description of those columns?

How can I convert it to SAM/BAM?

thanks.

illumina sam conversion • 2.9k views

Just to add, some people call this format the "export" format, as the filenames tend to end with "_export".

10.4 years ago

In the misc/ directory of the samtools distribution, there is a script called "export2sam.pl" which should do the trick. See this thread.

10.4 years ago
Bio_X2Y ★ 3.9k

aaron's post covers a tool for the conversion. To answer the other part of your question, here are the descriptions from the GA Pipeline 1.4 documentation provided to us by Illumina (I don't think it's a public document). The format might be different for other versions of the Pipeline.

• Machine (as parsed from run folder name)
• Run Number (as parsed from run folder name)
• Lane
• Tile
• X Coordinate of cluster
• Y Coordinate of cluster
• Index String (blank for a non-indexed run)
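• Read Number ("1" or "2" for paired end, blank for a single end)
• Read - the base calls of the read (the sequence column that precedes the quality string in the sample lines above)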
• Quality String - in symbolic ASCII format (ASCII character code = quality value + 64)
• Match Chromosome - name of chromosome match was to OR code indicating why no match was done
• Match Contig (blank if no match found) - gives contig name if there is a match and the match chromosome is split into contigs
• Match Position (always with respect to forward strand, numbering starts at 1)
• Match Strand ("F" for forward or "R" for reverse, blank if no match)
• Match Descriptor - concise description of alignment. A numeral denotes a run of matching bases and a letter denotes substitution of a nucleotide, so e.g. for a 35 base read, "35" denotes an exact match and "32C2" denotes substitution of a "C" at the 33rd position
• Single Read Alignment Score - alignment score of single read match (if a paired read, gives alignment score of read if it were to be treated as a single read)
• Paired Read Alignment Score - alignment score of read pair (alignment score of a paired read and its partner, taken as a pair. Blank for a single read run)
• Partner Chromosome - not blank only if read is paired and its partner aligns to another chromosome, in which case it gives the name of the chromosome
• Partner Contig - not blank only if read is paired and its partner aligns to another chromosome and that partner is split into contigs
• Partner Offset - if a paired read's partner hits to the same chromosome (as it will in the vast majority of cases) and contig (if the chromosome is split into contigs) then this number added to Match Position gives the alignment position of the read's partner
• Partner Strand - which strand did the paired read's partner hit to ("F" for forward or "R" for reverse, blank if no match)
• Filtering. Did the read pass quality filtering? "Y" for yes, "N" for no
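
To make the column layout concrete, here is a minimal Python sketch that splits one export line into named fields, decodes the quality string using the "+ 64" rule, and reads a simple match descriptor. The field names are my own labels, not Illumina's. The sketch assumes 22 tab-separated columns with the read sequence in column 9, as the sample lines in the question suggest, and it only handles descriptors made of numerals and single-base substitutions as described above. Other Pipeline versions may differ.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical field names for the 22 columns of a Pipeline 1.4 export line.
# Assumption: the read sequence is column 9, just before the quality string.
EXPORT_COLUMNS = [
    "machine", "run_number", "lane", "tile", "x", "y",
    "index_string", "read_number", "read", "quality",
    "match_chromosome", "match_contig", "match_position", "match_strand",
    "match_descriptor", "single_read_score", "paired_read_score",
    "partner_chromosome", "partner_contig", "partner_offset",
    "partner_strand", "filtering",
]


def parse_export_line(line: str) -> Dict[str, str]:
    """Split one tab-separated export line into a dict of named fields."""
    # Strip only the line ending, so trailing empty columns are preserved.
    values = line.rstrip("\r\n").split("\t")
    if len(values) != len(EXPORT_COLUMNS):
        raise ValueError(
            f"expected {len(EXPORT_COLUMNS)} columns, got {len(values)}"
        )
    return dict(zip(EXPORT_COLUMNS, values))


def decode_quality(quality: str, offset: int = 64) -> List[int]:
    """Convert the symbolic quality string to integer quality values."""
    return [ord(ch) - offset for ch in quality]


def descriptor_substitutions(descriptor: str) -> List[Tuple[int, str]]:
    """Return (1-based read position, letter) for each substitution.

    Only handles numerals (runs of matches) and single letters
    (substitutions); anything else in the descriptor is ignored.
    """
    position = 0
    substitutions = []
    for token in re.findall(r"\d+|[ACGTN]", descriptor):
        if token.isdigit():
            position += int(token)      # run of matching bases
        else:
            position += 1               # one substituted base
            substitutions.append((position, token))
    return substitutions


if __name__ == "__main__":
    # A shortened, made-up line in the same shape as the sample above:
    # a read that failed QC, so the alignment columns are blank.
    example = "\t".join(
        ["HWUSI-EAS454", "14", "3", "120", "16534", "21510", "0", "2",
         "TGTNNNNN", "BBBBBBBB", "QC"] + [""] * 10 + ["N"]
    )
    record = parse_export_line(example)
    print(record["lane"], record["match_chromosome"], record["filtering"])
    print(decode_quality(record["quality"]))
    print(descriptor_substitutions("32C2"))
```

Splitting on tabs rather than on whitespace matters here, because blank columns (for example, the alignment fields of an unmatched read) would otherwise collapse and shift every later field.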