Question: How to Separate a SAM File Containing Forward and Reverse Reads
0
gravatar for Ric
5 months ago by
Ric220
Australia
Ric220 wrote:

I got 0 plus and minus reads by aligning single reads against a gene in the following way:

> bwa index ref.fasta
> bwa mem ref.fasta sample5-21.fq | gzip -3 > sample5-21.sam.gz
> splitsam.sh sample5-21.sam.gz forward.sam.gz reverse.sam.gz
Total reads:    1792444
Plus reads:     0
Minus reads:    0
Unmapped reads: 1792444
Time:           1.166 seconds.

What did I miss?

rna-seq alignment gene • 201 views
ADD COMMENTlink written 5 months ago by Ric220

Hello Ric ,

it is unusual to gzip sam files. Instead you should convert it to sorted bam files using samtools sort -o output.bam input.sam.

But as splitsam.sh doesn't complain about the format, this is not your problem here.

Please convert to sorted bam and post the output of:

$ samtools flagstat output.bam

fin swimmer

ADD REPLYlink modified 5 months ago • written 5 months ago by finswimmer11k

Did you check the results of bwa mem? According to splitsam.sh, there aren't maped reads in your sam file. Try:

zcat sample5-21.sam.gz | samtools view | head

and:

zcat sample5-21.sam.gz | samtools view | tail

Indeed it is really strange to use gzipped sam files, but this is allowed by splitsam.sh (part of BBTools) - and apparently, splitsam.sh doesn't take bam as input. But I wouldn't gzip sam files, as very few tools will accept gzipped sam.

edit: take note you are also using splitsam.sh incorrectly, as it expects at least four arguments:

Usage:        splitsam <input> <plus output> <minus output> <unmapped output>

Input may be stdin or a sam file, raw or gzipped.
Outputs must be sam files, and may be gzipped.
  
ADD REPLYlink modified 5 months ago • written 5 months ago by h.mon24k
samtools flagstat sample5-21.sorted.bam
1792444 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 mapped (0.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
ADD REPLYlink written 5 months ago by Ric220
> samtools view sample5-21.sorted.bam | head
NS500334:143:H3GTHBGX9:4:11401:23521:1002   4   *   0   0   *   *   0   0   NCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:16217:1008   4   *   0   0   *   *   0   0   NCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:9757:1012    4   *   0   0   *   *   0   0   NCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:16357:1016   4   *   0   0   *   *   0   0   TATATGTTCTCAGGTCGCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:15714:1027   4   *   0   0   *   *   0   0   TTCGGACCAGGCTTCATTCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:2589:1037    4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:8137:1041    4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:21304:1044   4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:17205:1078   4   *   0   0   *   *   0   0   GTGTACGCGTCAGCTGCTGCT   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:19354:1079   4   *   0   0   *   *   0   0   AACAGACCGGTAGACTTGAAC   *   AS:i:0  XS:i:0


> samtools view sample5-21.sorted.bam | tail
NS500334:143:H3GTHBGX9:1:23312:10665:20320  4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:7465:20341   4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:2947:20342   4   *   0   0   *   *   0   0   TCCGGGCTTTACTTGTACAGC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:22490:20344  4   *   0   0   *   *   0   0   ACCGCATCGAGCTGAAGGGCA   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:24199:20360  4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:22936:20364  4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:13141:20379  4   *   0   0   *   *   0   0   GGAGACTCGAACTCACAACCT   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:2345:20381   4   *   0   0   *   *   0   0   CACCTACGGCAAGCTGACCCT   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:17238:20382  4   *   0   0   *   *   0   0   CCCAAAGGAGAAGCTCAACTC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:15492:20404  4   *   0   0   *   *   0   0   CGGAGATTACAATGGACGATT   *   AS:i:0  XS:i:0
ADD REPLYlink written 5 months ago by Ric220

The reference is 5044 bp long and the reads are 21 bp long. Should I use a different aliger and if yes then which one?

ADD REPLYlink written 5 months ago by Ric220

Yes, bwa mem is intended to reads 70 base pairs or longer, you should use another aligner. I would recommend Bowtie (preferred) or Bowtie2 (you may have to tweak the settings).

Could you please report back if this improves alignment? Thanks.

ADD REPLYlink modified 5 months ago • written 5 months ago by h.mon24k

I tried bowtie2 but I ran into this problem bowtie2 can't find index

ADD REPLYlink written 5 months ago by Ric220
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1028 users visited in the last hour