Could Anyone Explain What Information Is Inside a SAM File?
10.1 years ago
M K ▴ 660

I mapped my RNA reads to the human genome using TopHat and got the binary file accepted_hits.bam. I then used samtools to convert it to a .sam file, but when I looked at the .sam file I did not fully understand it. Can anyone help me understand what is inside this file and which information in it is important? This is a small portion of the file:

@HD     VN:1.0  SO:coordinate
@SQ     SN:1    LN:249250621
@SQ     SN:10   LN:135534747
@SQ     SN:11   LN:135006516
@SQ     SN:12   LN:133851895
@SQ     SN:13   LN:115169878
@SQ     SN:14   LN:107349540
@SQ     SN:15   LN:102531392
@SQ     SN:16   LN:90354753
@SQ     SN:17   LN:81195210
@SQ     SN:18   LN:78077248
@SQ     SN:19   LN:59128983
@SQ     SN:2    LN:243199373
@SQ     SN:20   LN:63025520
@SQ     SN:21   LN:48129895
@SQ     SN:22   LN:51304566
@SQ     SN:3    LN:198022430
@SQ     SN:4    LN:191154276
@SQ     SN:5    LN:180915260
@SQ     SN:6    LN:171115067
@SQ     SN:7    LN:159138663
@SQ     SN:8    LN:146364022
@SQ     SN:9    LN:141213431
@SQ     SN:MT   LN:16569
@SQ     SN:X    LN:155270560
@SQ     SN:Y    LN:59373566
@PG     ID:TopHat       VN:2.0.8        CL:/disk2/mm/tophat --library-type fr-firststrand -p 14 -G /disk2/ab/RNAseq/GeneModel/Hs_ensembl_37.gtf -o /disk2/ab/RNA
seq1/Alignment/Human_brain /disk2/ab/RNAseq/Human_genome/Ensembl/GRCh37/Bowtie2Index/genome /disk2/ab/TrimmedData/Human_brain_trimmed.fastq
HWI-ST330:269:D16WHACXX:1:1208:4303:64472       0       1       10563   3       42M     *       0       0       CGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGG      @@@FFDDDBHHBFGEEEHE8@@GGB@D@D
H7@AF@FHBG@CE      AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:42 YT:Z:UU NH:i:2  CC:Z:15 CP:i:102520480  XS:A:-  HI:i:0
HWI-ST330:269:D16WHACXX:1:1104:14490:56799      0       1       10568   3       37M     *       0       0       CTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGG   CCCDFFFFHHHHGJ@FHIIJJJJII?FHIFGGIJIII
   AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:37 YT:Z:UU NH:i:2  CC:Z:15 CP:i:102520480  XS:A:-  HI:i:0
HWI-ST330:269:D16WHACXX:1:1207:13391:25296      256     1       10568   3       37M     *       0       0       CTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGG   CC@FFFFFHHHHHJIIJJJIJJJJJFHGIEHIIJJJJ
   AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:37 YT:Z:UU NH:i:2  CC:Z:15 CP:i:102520480  XS:A:-  HI:i:0
HWI-ST330:269:D16WHACXX:1:2203:2048:27099       256     1       10568   3       37M     *       0       0       CTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGG   CCCFFFFFHHGHHJHIGIIIJIIJIHIHIIIIIIEIJ
   AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:37 YT:Z:UU NH:i:2  CC:Z:15 CP:i:102520480  XS:A:-  HI:i:0
HWI-ST330:269:D16WHACXX:1:2312:5126:79980       256     1       11605   3       100M    *       0       0       CAGCAATGTCTAGGAGTGCCTGTTTCTCCACAAAGTGTTTACTTTTGGATTTTTGCCAGTCTAACAGGTGAAGCCCT
GGAGATTCTTATTAGTGATTTGG    BBCFFFFFHHHHHJJ9CGGIIJIJJJJJJIGIJGI?EDHGIJIJJJJIIHIIJJJJJHIHHHIJJGJIFHIJJJGEHHHEFDCEEEECEEEECDEFEEDD    AS:i:-4 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:15A84
      YT:Z:UU NH:i:2  CC:Z:15 CP:i:102519466  XS:A:-  HI:i:0
HWI-ST330:269:D16WHACXX:1:1206:7020:40556       272     1       11695   3       100M    *       0       0       AGTGATTTGGGCTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATGATTTTGCTGCATGGCCGGTGTTGA
GAATGACTGCGCAAATTTGCCGG    CC:CC???<@?B?>7@BBBCC8C>:>>C>C?@96A?B@EDEE=)3==CGGHA@GGIGHFB8;=?:@?0:18D?@IG>HHFA?;BF8?8BFBFDD<DD@?@    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:100
        YT:Z:UU NH:i:2  CC:Z:15 CP:i:102519376  XS:A:+  HI:i:0
HWI-ST330:269:D16WHACXX:1:1208:6372:22787       256     1       11706   1       100M    *       0       0       CTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATGATTTTGCTGCATGGCCGGTGTTGAGAATGACTGCG
CAAATTTGCCGGATTTCCTTTGC    @@CFDFFFAFFFHGEGHHAFCHICGIGGJEGHJJJEGGEEHIHIIEHGIIDCFHBHIEH8-9'5;@C?AACCC;;=8?B=BCDDDCDDD@BBCCCDDCCD    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:100
        YT:Z:UU NH:i:3  CC:Z:15 CP:i:102519365  XS:A:-  HI:i:0
HWI-ST330:269:D16WHACXX:1:2310:5378:15201       256     1       11706   1       100M    *       0       0       CTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATGATTTTGCTGCATGGCCGGTGTTGAGAATGACTGCG
CAAATTTGCCGGATTTCCTTTGC    CCCFFFFFHGHGHJJIGHFHHIJJJJJIJIJJJJJIJJJIIIJJJJIJJGJJ>FGGJIIHBDDBA>CDCA>;@@5;,3=B>CD@:C:<95@BCCACCA3:    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:100
        YT:Z:UU NH:i:3  CC:Z:15 CP:i:102519365  XS:A:-  HI:i:0
HWI-ST330:269:D16WHACXX:1:2315:14323:51391      256     1       11706   1       100M    *       0       0       CTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATGATTTTGCTGCATGGCCGGTGTTGAGAATGACTGCG
sam • 11k views
10.1 years ago
eddie.im ▴ 140

Have you read the documentation about SAM files? It gives a full description of the kind of information you will find inside a SAM or BAM file.
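To make that concrete, here is a minimal sketch of how one alignment line from the excerpt in the question breaks down. It assumes the columns are tab-separated, as the SAM format requires (the pasted excerpt shows them as spaces and wraps long lines). The first 11 columns are the mandatory fields and everything after them is an optional TAG:TYPE:VALUE field. The function name `parse_sam_record` and the dictionary layout are only illustrative, not part of samtools or any library.

```python
# The 11 mandatory columns of a SAM alignment line, in order.
MANDATORY_FIELDS = [
    "QNAME",  # read name
    "FLAG",   # bitwise flag (strand, secondary alignment, ...)
    "RNAME",  # reference sequence name (here: chromosome)
    "POS",    # 1-based leftmost mapping position
    "MAPQ",   # mapping quality
    "CIGAR",  # alignment description, e.g. 37M = 37 aligned bases
    "RNEXT",  # reference name of the mate ('*' if unavailable)
    "PNEXT",  # position of the mate (0 if unavailable)
    "TLEN",   # observed template length (0 if unavailable)
    "SEQ",    # read sequence
    "QUAL",   # base qualities, ASCII-encoded
]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_record(line):
    """Split one alignment line into mandatory fields and optional tags."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(MANDATORY_FIELDS):
        raise ValueError("expected at least 11 tab-separated columns")
    record = {}
    for name, value in zip(MANDATORY_FIELDS, cols):
        record[name] = int(value) if name in INT_FIELDS else value
    tags = {}
    for item in cols[len(MANDATORY_FIELDS):]:
        tag, value_type, value = item.split(":", 2)
        # Only integer tags (type 'i') are converted; others stay strings.
        tags[tag] = int(value) if value_type == "i" else value
    record["TAGS"] = tags
    return record


# Second read from the excerpt, rejoined onto one line with tabs.
example = "\t".join([
    "HWI-ST330:269:D16WHACXX:1:1104:14490:56799", "0", "1", "10568", "3",
    "37M", "*", "0", "0",
    "CTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGG",
    "CCCDFFFFHHHHGJ@FHIIJJJJII?FHIFGGIJIII",
    "AS:i:0", "XN:i:0", "XM:i:0", "XO:i:0", "XG:i:0", "NM:i:0",
    "MD:Z:37", "YT:Z:UU", "NH:i:2", "CC:Z:15", "CP:i:102520480",
    "XS:A:-", "HI:i:0",
])

record = parse_sam_record(example)
print(record["RNAME"], record["POS"], record["CIGAR"])
print("reported alignments for this read (NH):", record["TAGS"]["NH"])
```

Lines starting with `@` (such as `@HD`, `@SQ` and `@PG` in the excerpt) are header lines, not alignments, so they would need to be skipped before calling this function.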

10.1 years ago
Prakki Rama ★ 2.7k

I find this tutorial from the UC Davis Bioinformatics Core useful for its explanations of the SAM format and alignment-related information.
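One column that usually needs explaining is the FLAG (second column), which is a sum of bit values. As a rough sketch, the values 0, 256 and 272 that appear in the excerpt in the question can be decoded like this. The bit meanings follow the SAM specification; the function name `decode_flag` and the wording of the labels are my own.

```python
# Bit values of the SAM FLAG field and a short label for each.
FLAG_BITS = [
    (0x1, "read is paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read on reverse strand"),
    (0x20, "mate on reverse strand"),
    (0x40, "first in pair"),
    (0x80, "last in pair"),
    (0x100, "secondary alignment"),
    (0x200, "failed quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [label for bit, label in FLAG_BITS if flag & bit]


for flag in (0, 256, 272):
    print(flag, decode_flag(flag))
```

A FLAG of 0 has no bits set (an unpaired read mapped to the forward strand), 256 marks a secondary alignment, and 272 = 256 + 16 is a secondary alignment on the reverse strand.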
