Question: A constant problem with SAM file header
gravatar for popescuiofelia
2.3 years ago by
London, UK
popescuiofelia10 wrote:

Hi! I have mapped my RNA-seq reads on reference genome by bowtie2 after trimming Illumina adapters by Trimmoatic as below,

/bowtie2$ bowtie2 -p 4 -x Dicty_primary_genomic -U trialoutput.fq -S trialoutput.sam 

6619872 reads; of these:
  6619872 (100.00%) were unpaired; of these:
    303213 (4.58%) aligned 0 times
    2426695 (36.66%) aligned exactly 1 time
    3889964 (58.76%) aligned >1 times
95.42% overall alignment rate

The header of my SAM file (I have sorted SAM) looks so;

/samtools-1.9$ head trialoutput.sorted.sam
@HD VN:1.0  SO:coordinate
@SQ SN:DDB0216524|DDB_G0267364  LN:2441
@SQ SN:DDB0216528|DDB_G0267372  LN:2644
@SQ SN:DDB0191165|DDB_G0267380  LN:3559
@SQ SN:DDB0216498|DDB_G0267304  LN:4580
@SQ SN:DDB0202373|DDB_G0267338  LN:2486
@SQ SN:DDB0216520|DDB_G0267356  LN:6002
@SQ SN:DDB0307434|DDB_G0269812  LN:2651
@SQ SN:DDB0266571|DDB_G0269818  LN:2362
@SQ SN:DDB0302581|DDB_G0267990  LN:2561

As you are considering, pretty different with bowtie2 tutorial on example data as below;

@HD VN:1.0  SO:unsorted
@SQ SN:gi|9626243|ref|NC_001416.1|  LN:48502
@PG ID:bowtie2  PN:bowtie2  VN:2.0.1
r1  0   gi|9626243|ref|NC_001416.1| 18401   42  122M    *   0   0   TGAATGCGAACTCCGGGACGCTCAGTAATGTGCG  +"@6<:27(F&5)9)"B:%B+A-%5A?2$HCB0B+0=D<7E/<.03#!.F77@6B==?C"7>;))%;,3-$.A06+<-1/@@?,26">=?*@'0;$:;??G+:#+(A?9+10!8!?()?7C>  AS:i:-5 XN:i:0  XM:i:3  XO:i:0  XG:i:0  NM:i:3  MD:Z:59G13G21G26    YT:Z:UU
r2  0   gi|9626243|ref|NC_001416.1| 8886    42  275M    *   0   0   NTTNTGATGCGGGCTTGTGGAGTTCAG

I just wondered if this means that my SAM file is empty or any thing going wrong with my mapping?

Thanks a lot for any help

sam bowtie2 software error • 1.4k views
ADD COMMENTlink modified 2.3 years ago by h.mon32k • written 2.3 years ago by popescuiofelia10

I'm not seeing the glaring problem. Your reference has multiple sequences, theirs has one. Your have repetitive elements; most of your reads map equally well to multiple places, but if that's your reference, then that's your reference. But your mapping hardly failed.

ADD REPLYlink written 2.3 years ago by swbarnes29.4k

You should scroll further down in your own file. At some point you should start seeing the reads. head is only showing you top 10 lines.

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by GenoMax95k
gravatar for h.mon
2.3 years ago by
h.mon32k wrote:

I wonder what you consider what you consider "pretty different". The only difference I can see is the bowtie2 tutorial example has a short header section, as the reference consists of only one contig - the reference is the complete genome of Enterobacteria phage lambda. Your reference apparently consists of tens or hundreds of short contigs, so it is quite long and head doesn't show any mapped reads. To see the first few mapped reads:

samtools view trialoutput.sorted.sam | head

This won't show header lines, just the mapping section. See also samtools flagstat and samtools stats to see mapping statistics.

ADD COMMENTlink written 2.3 years ago by h.mon32k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1090 users visited in the last hour