Question: A constant problem with SAM file header
gravatar for popescuiofelia
5 months ago by
London, UK
popescuiofelia10 wrote:

Hi! I have mapped my RNA-seq reads on reference genome by bowtie2 after trimming Illumina adapters by Trimmoatic as below,

/bowtie2$ bowtie2 -p 4 -x Dicty_primary_genomic -U trialoutput.fq -S trialoutput.sam 

6619872 reads; of these:
  6619872 (100.00%) were unpaired; of these:
    303213 (4.58%) aligned 0 times
    2426695 (36.66%) aligned exactly 1 time
    3889964 (58.76%) aligned >1 times
95.42% overall alignment rate

The header of my SAM file (I have sorted SAM) looks so;

/samtools-1.9$ head trialoutput.sorted.sam
@HD VN:1.0  SO:coordinate
@SQ SN:DDB0216524|DDB_G0267364  LN:2441
@SQ SN:DDB0216528|DDB_G0267372  LN:2644
@SQ SN:DDB0191165|DDB_G0267380  LN:3559
@SQ SN:DDB0216498|DDB_G0267304  LN:4580
@SQ SN:DDB0202373|DDB_G0267338  LN:2486
@SQ SN:DDB0216520|DDB_G0267356  LN:6002
@SQ SN:DDB0307434|DDB_G0269812  LN:2651
@SQ SN:DDB0266571|DDB_G0269818  LN:2362
@SQ SN:DDB0302581|DDB_G0267990  LN:2561

As you are considering, pretty different with bowtie2 tutorial on example data as below;

@HD VN:1.0  SO:unsorted
@SQ SN:gi|9626243|ref|NC_001416.1|  LN:48502
@PG ID:bowtie2  PN:bowtie2  VN:2.0.1
r1  0   gi|9626243|ref|NC_001416.1| 18401   42  122M    *   0   0   TGAATGCGAACTCCGGGACGCTCAGTAATGTGCG  +"@6<:27(F&5)9)"B:%B+A-%5A?2$HCB0B+0=D<7E/<.03#!.F77@6B==?C"7>;))%;,3-$.A06+<-1/@@?,26">=?*@'0;$:;??G+:#+(A?9+10!8!?()?7C>  AS:i:-5 XN:i:0  XM:i:3  XO:i:0  XG:i:0  NM:i:3  MD:Z:59G13G21G26    YT:Z:UU
r2  0   gi|9626243|ref|NC_001416.1| 8886    42  275M    *   0   0   NTTNTGATGCGGGCTTGTGGAGTTCAG

I just wondered if this means that my SAM file is empty or any thing going wrong with my mapping?

Thanks a lot for any help

sam bowtie2 software error • 329 views
ADD COMMENTlink modified 5 months ago by h.mon24k • written 5 months ago by popescuiofelia10

I'm not seeing the glaring problem. Your reference has multiple sequences, theirs has one. Your have repetitive elements; most of your reads map equally well to multiple places, but if that's your reference, then that's your reference. But your mapping hardly failed.

ADD REPLYlink written 5 months ago by swbarnes25.0k

You should scroll further down in your own file. At some point you should start seeing the reads. head is only showing you top 10 lines.

ADD REPLYlink modified 5 months ago • written 5 months ago by genomax64k
gravatar for h.mon
5 months ago by
h.mon24k wrote:

I wonder what you consider what you consider "pretty different". The only difference I can see is the bowtie2 tutorial example has a short header section, as the reference consists of only one contig - the reference is the complete genome of Enterobacteria phage lambda. Your reference apparently consists of tens or hundreds of short contigs, so it is quite long and head doesn't show any mapped reads. To see the first few mapped reads:

samtools view trialoutput.sorted.sam | head

This won't show header lines, just the mapping section. See also samtools flagstat and samtools stats to see mapping statistics.

ADD COMMENTlink written 5 months ago by h.mon24k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 939 users visited in the last hour