Is there any post/manual/blog that summarizes the attribute tags (optional fields) of the TopHat output BAM file? How many attribute tags does the TopHat output BAM file have? What is the meaning of each tag?
I ran TopHat on HiSeq 2000 paired-end 2x100 bp strand-specific reads with the option "--library-type fr-firststrand". The output contains the 13 tags listed below. Are there any other tags? Is my understanding correct?
AS:i: alignment score generated by aligner
CC:Z: reference name of the next hit; "=" for the same chromosome
CP:i: leftmost coordinate of the next hit
HI:i: query hit index, indicating the alignment record is the i-th one stored in SAM
MD:Z: string for mismatching positions.
NH:i: number of reported alignments that contain the query in the current record.
NM:i: edit distance to the reference, including ambiguous bases but excluding clipping
XG:i: the number of gap extensions, for both read and reference gaps, in the alignment.
XM:i: the number of mismatches in the alignment
XN:i: the number of ambiguous bases in the reference covering this alignment
XO:i: the number of gap opens, for both read and reference gaps, in the alignment.
XS:Z: strand attribute; if either fr-firststrand or fr-secondstrand is specified, every read alignment will have an XS attribute tag.
YT:Z: UU indicates the read was not part of a pair; CP indicates the read was part of a pair and the pair aligned concordantly; DP indicates the read was part of a pair and the pair aligned discordantly; UP indicates the read was part of a pair but the pair failed to align either concordantly or discordantly.
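One way to check whether any other tags are present is to tally them directly from the alignment records. The snippet below is only a minimal sketch: it assumes the alignments are available as SAM text lines (for example from `samtools view`), the function name `tally_tags` is made up for illustration, and the two records are invented examples, not real TopHat output.

```python
from collections import Counter

def tally_tags(sam_lines):
    """Count how often each optional TAG:TYPE pair occurs in SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
        for field in fields[11:]:
            tag, typ, _value = field.split(":", 2)
            counts[f"{tag}:{typ}"] += 1
    return counts

# Two invented records, used only to show the shape of the input.
mandatory = ["99", "chr1", "100", "50", "4M", "=", "200", "104", "ACGT", "IIII"]
rec1 = "\t".join(["read1"] + mandatory + ["AS:i:0", "NM:i:0", "NH:i:1", "YT:Z:UU"])
rec2 = "\t".join(["read2"] + mandatory + ["AS:i:-5", "NH:i:2", "CC:Z:=", "CP:i:500", "HI:i:0"])

counts = tally_tags(["@HD\tVN:1.0\n", rec1, rec2])
for key in sorted(counts):
    print(key, counts[key], sep="\t")
```

To run this on real data, the lines of `samtools view accepted_hits.bam` could be passed to `tally_tags` instead of the invented records, for example by reading them from `sys.stdin`. Any tag that shows up in the tally but not in the list above would be an additional one to look up.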
The resources I have read:
# the above description of "XS"
# the above description of "AS", "CC", "CP", "HI", "MD", "NH", "NM"
# the above description of "XG", "XM", "XN", "XO", "YT"
# I don't know whether these tags have the same meaning in Bowtie and TopHat output BAM files.