Question: Could Anyone Explain What Information Is Inside A SAM File?
5.4 years ago by
M K460
United States
M K460 wrote:

I mapped my RNA reads to the human genome using TopHat and got the binary file accepted_hits.bam. I then used samtools to convert it to a .sam file, but when I looked at the .sam file I did not fully understand it. Can anyone help me understand what is inside this file and which information in it is important? This is a small portion of the file:

@HD     VN:1.0  SO:coordinate
@SQ     SN:1    LN:249250621
@SQ     SN:10   LN:135534747
@SQ     SN:11   LN:135006516
@SQ     SN:12   LN:133851895
@SQ     SN:13   LN:115169878
@SQ     SN:14   LN:107349540
@SQ     SN:15   LN:102531392
@SQ     SN:16   LN:90354753
@SQ     SN:17   LN:81195210
@SQ     SN:18   LN:78077248
@SQ     SN:19   LN:59128983
@SQ     SN:2    LN:243199373
@SQ     SN:20   LN:63025520
@SQ     SN:21   LN:48129895
@SQ     SN:22   LN:51304566
@SQ     SN:3    LN:198022430
@SQ     SN:4    LN:191154276
@SQ     SN:5    LN:180915260
@SQ     SN:6    LN:171115067
@SQ     SN:7    LN:159138663
@SQ     SN:8    LN:146364022
@SQ     SN:9    LN:141213431
@SQ     SN:MT   LN:16569
@SQ     SN:X    LN:155270560
@SQ     SN:Y    LN:59373566
@PG     ID:TopHat       VN:2.0.8        CL:/disk2/mm/tophat --library-type fr-firststrand -p 14 -G /disk2/ab/RNAseq/GeneModel/Hs_ensembl_37.gtf -o /disk2/ab/RNAseq1/Alignment/Human_brain /disk2/ab/RNAseq/Human_genome/Ensembl/GRCh37/Bowtie2Index/genome /disk2/ab/TrimmedData/Human_brain_trimmed.fastq
HWI-ST330:269:D16WHACXX:1:1208:4303:64472       0       1       10563   3       42M     *       0       0       CGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGG      @@@FFDDDBHHBFGEEEHE8@@GGB@D@DH7@AF@FHBG@CE      AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:42 YT:Z:UU NH:i:2  CC:Z:15 CP:i:102520480  XS:A:-  HI:i:0
HWI-ST330:269:D16WHACXX:1:1104:14490:56799      0       1       10568   3       37M     *       0       0       CTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGG   CCCDFFFFHHHHGJ@FHIIJJJJII?FHIFGGIJIII   AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:37 YT:Z:UU NH:i:2  CC:Z:15 CP:i:102520480  XS:A:-  HI:i:0
HWI-ST330:269:D16WHACXX:1:1207:13391:25296      256     1       10568   3       37M     *       0       0       CTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGG   CC@FFFFFHHHHHJIIJJJIJJJJJFHGIEHIIJJJJ   AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:37 YT:Z:UU NH:i:2  CC:Z:15 CP:i:102520480  XS:A:-  HI:i:0
HWI-ST330:269:D16WHACXX:1:2203:2048:27099       256     1       10568   3       37M     *       0       0       CTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGG   CCCFFFFFHHGHHJHIGIIIJIIJIHIHIIIIIIEIJ   AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:37 YT:Z:UU NH:i:2  CC:Z:15 CP:i:102520480  XS:A:-  HI:i:0
HWI-ST330:269:D16WHACXX:1:2312:5126:79980       256     1       11605   3       100M    *       0       0       CAGCAATGTCTAGGAGTGCCTGTTTCTCCACAAAGTGTTTACTTTTGGATTTTTGCCAGTCTAACAGGTGAAGCCCTGGAGATTCTTATTAGTGATTTGG    BBCFFFFFHHHHHJJ9CGGIIJIJJJJJJIGIJGI?EDHGIJIJJJJIIHIIJJJJJHIHHHIJJGJIFHIJJJGEHHHEFDCEEEECEEEECDEFEEDD    AS:i:-4 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:15A84      YT:Z:UU NH:i:2  CC:Z:15 CP:i:102519466  XS:A:-  HI:i:0
HWI-ST330:269:D16WHACXX:1:1206:7020:40556       272     1       11695   3       100M    *       0       0       AGTGATTTGGGCTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATGATTTTGCTGCATGGCCGGTGTTGAGAATGACTGCGCAAATTTGCCGG    CC:CC???<@?B?>7@BBBCC8C>:>>C>C?@96A?B@EDEE=)3==CGGHA@GGIGHFB8;=?:@?0:18D?@IG>HHFA?;BF8?8BFBFDD<DD@?@    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:100        YT:Z:UU NH:i:2  CC:Z:15 CP:i:102519376  XS:A:+  HI:i:0
HWI-ST330:269:D16WHACXX:1:1208:6372:22787       256     1       11706   1       100M    *       0       0       CTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATGATTTTGCTGCATGGCCGGTGTTGAGAATGACTGCGCAAATTTGCCGGATTTCCTTTGC    @@CFDFFFAFFFHGEGHHAFCHICGIGGJEGHJJJEGGEEHIHIIEHGIIDCFHBHIEH8-9'5;@C?AACCC;;=8?B=BCDDDCDDD@BBCCCDDCCD    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:100        YT:Z:UU NH:i:3  CC:Z:15 CP:i:102519365  XS:A:-  HI:i:0
HWI-ST330:269:D16WHACXX:1:2310:5378:15201       256     1       11706   1       100M    *       0       0       CTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATGATTTTGCTGCATGGCCGGTGTTGAGAATGACTGCGCAAATTTGCCGGATTTCCTTTGC    CCCFFFFFHGHGHJJIGHFHHIJJJJJIJIJJJJJIJJJIIIJJJJIJJGJJ>FGGJIIHBDDBA>CDCA>;@@5;,3=B>CD@:C:<95@BCCACCA3:    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:100        YT:Z:UU NH:i:3  CC:Z:15 CP:i:102519365  XS:A:-  HI:i:0
HWI-ST330:269:D16WHACXX:1:2315:14323:51391      256     1       11706   1       100M    *       0       0       CTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATGATTTTGCTGCATGGCCGGTGTTGAGAATGACTGCG
sam • 7.2k views
modified 5.4 years ago by Prakki Rama2.3k • written 5.4 years ago by M K460

Start with the SAM spec: http://samtools.github.io/hts-specs/SAMv1.pdf

written 5.4 years ago by Pierre Lindenbaum122k
5.4 years ago by
eddie.im130
Brazil
eddie.im130 wrote:

Have you read the documentation for the SAM format? It gives a full description of the information you will find inside a SAM or BAM file: http://samtools.github.io/hts-specs/SAMv1.pdf
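
As a rough illustration of what the spec describes (not part of the original answer): each alignment line has 11 mandatory tab-separated columns, followed by optional fields written as TAG:TYPE:VALUE. The sketch below splits one record from the question into those columns. `parse_sam_line` is a hypothetical helper name, and the code assumes the line is tab-delimited as in a real SAM file rather than space-padded as in the pasted excerpt.

```python
# Names of the 11 mandatory SAM columns, in file order (per the SAM spec).
MANDATORY = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
             "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
# Mandatory columns that hold integers.
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_line(line):
    """Split one alignment line into a dict of mandatory fields plus tags."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(MANDATORY):
        raise ValueError("expected at least 11 tab-separated columns")
    record = {}
    for name, value in zip(MANDATORY, cols):
        record[name] = int(value) if name in INT_FIELDS else value
    # Optional fields look like TAG:TYPE:VALUE, e.g. NH:i:2 or MD:Z:37.
    tags = {}
    for field in cols[len(MANDATORY):]:
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    record["tags"] = tags
    return record


# Second alignment record from the question, rebuilt with real tabs.
example = "\t".join([
    "HWI-ST330:269:D16WHACXX:1:1104:14490:56799", "0", "1", "10568", "3",
    "37M", "*", "0", "0",
    "CTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGG",
    "CCCDFFFFHHHHGJ@FHIIJJJJII?FHIFGGIJIII",
    "AS:i:0", "XN:i:0", "XM:i:0", "XO:i:0", "XG:i:0", "NM:i:0",
    "MD:Z:37", "YT:Z:UU", "NH:i:2", "CC:Z:15", "CP:i:102520480",
    "XS:A:-", "HI:i:0",
])

rec = parse_sam_line(example)
print(rec["RNAME"], rec["POS"], rec["CIGAR"], rec["tags"]["NH"])
```

Read this way, the record says the read aligned to chromosome 1 at position 10568 with all 37 bases matched (37M), and NH:i:2 reports two alignments for that read. For real work a dedicated library such as pysam is usually a better choice than hand-rolled parsing.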

modified 5.4 years ago • written 5.4 years ago by eddie.im130
5.4 years ago by
Prakki Rama2.3k
Singapore
Prakki Rama2.3k wrote:

I find this tutorial from the UC Davis Bioinformatics Core useful for explanations of the SAM format and alignment-related information.
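
One piece of alignment-related information that often confuses people is the FLAG column (the second column; 0, 256, and 272 in the excerpt from the question). It is a bitwise combination of the flags defined in the SAM spec. Here is a minimal sketch for decoding it, not part of the original answer; `decode_flag` is a hypothetical helper name and the descriptions are my own short paraphrases of the spec's wording.

```python
# Bit values from the SAM specification, with short paraphrased descriptions.
FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "secondary alignment"),
    (0x200, "fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag):
    """Return the descriptions of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]


# FLAG values that appear in the excerpt from the question.
for value in (0, 256, 272):
    print(value, decode_flag(value))
```

A FLAG of 0 has no bits set, which for these single-end reads means a primary alignment on the forward strand. 256 marks a secondary alignment, and 272 (256 + 16) marks a secondary alignment on the reverse strand.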

written 5.4 years ago by Prakki Rama2.3k