Question: How to Separate a SAM File Containing Forward and Reverse Reads
0
gravatar for Ric
16 months ago by
Ric290
Australia
Ric290 wrote:

I got 0 plus and minus reads by aligning single reads against a gene in the following way:

> bwa index ref.fasta
> bwa mem ref.fasta sample5-21.fq | gzip -3 > sample5-21.sam.gz
> splitsam.sh sample5-21.sam.gz forward.sam.gz reverse.sam.gz
Total reads:    1792444
Plus reads:     0
Minus reads:    0
Unmapped reads: 1792444
Time:           1.166 seconds.

What did I miss?

rna-seq alignment gene • 395 views
ADD COMMENTlink written 16 months ago by Ric290

Hello Ric ,

it is unusual to gzip sam files. Instead you should convert it to sorted bam files using samtools sort -o output.bam input.sam.

But as splitsam.sh doesn't complain about the format, this is not your problem here.

Please convert to sorted bam and post the output of:

$ samtools flagstat output.bam

fin swimmer

ADD REPLYlink modified 16 months ago • written 16 months ago by finswimmer13k

Did you check the results of bwa mem? According to splitsam.sh, there aren't maped reads in your sam file. Try:

zcat sample5-21.sam.gz | samtools view | head

and:

zcat sample5-21.sam.gz | samtools view | tail

Indeed it is really strange to use gzipped sam files, but this is allowed by splitsam.sh (part of BBTools) - and apparently, splitsam.sh doesn't take bam as input. But I wouldn't gzip sam files, as very few tools will accept gzipped sam.

edit: take note you are also using splitsam.sh incorrectly, as it expects at least four arguments:

Usage:        splitsam <input> <plus output> <minus output> <unmapped output>

Input may be stdin or a sam file, raw or gzipped.
Outputs must be sam files, and may be gzipped.
  
ADD REPLYlink modified 16 months ago • written 16 months ago by h.mon29k
samtools flagstat sample5-21.sorted.bam
1792444 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 mapped (0.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
ADD REPLYlink written 16 months ago by Ric290
> samtools view sample5-21.sorted.bam | head
NS500334:143:H3GTHBGX9:4:11401:23521:1002   4   *   0   0   *   *   0   0   NCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:16217:1008   4   *   0   0   *   *   0   0   NCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:9757:1012    4   *   0   0   *   *   0   0   NCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:16357:1016   4   *   0   0   *   *   0   0   TATATGTTCTCAGGTCGCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:15714:1027   4   *   0   0   *   *   0   0   TTCGGACCAGGCTTCATTCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:2589:1037    4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:8137:1041    4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:21304:1044   4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:17205:1078   4   *   0   0   *   *   0   0   GTGTACGCGTCAGCTGCTGCT   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:4:11401:19354:1079   4   *   0   0   *   *   0   0   AACAGACCGGTAGACTTGAAC   *   AS:i:0  XS:i:0


> samtools view sample5-21.sorted.bam | tail
NS500334:143:H3GTHBGX9:1:23312:10665:20320  4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:7465:20341   4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:2947:20342   4   *   0   0   *   *   0   0   TCCGGGCTTTACTTGTACAGC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:22490:20344  4   *   0   0   *   *   0   0   ACCGCATCGAGCTGAAGGGCA   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:24199:20360  4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:22936:20364  4   *   0   0   *   *   0   0   TCGGACCAGGCTTCATTCCCC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:13141:20379  4   *   0   0   *   *   0   0   GGAGACTCGAACTCACAACCT   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:2345:20381   4   *   0   0   *   *   0   0   CACCTACGGCAAGCTGACCCT   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:17238:20382  4   *   0   0   *   *   0   0   CCCAAAGGAGAAGCTCAACTC   *   AS:i:0  XS:i:0
NS500334:143:H3GTHBGX9:1:23312:15492:20404  4   *   0   0   *   *   0   0   CGGAGATTACAATGGACGATT   *   AS:i:0  XS:i:0
ADD REPLYlink written 16 months ago by Ric290

The reference is 5044 bp long and the reads are 21 bp long. Should I use a different aliger and if yes then which one?

ADD REPLYlink written 16 months ago by Ric290

Yes, bwa mem is intended to reads 70 base pairs or longer, you should use another aligner. I would recommend Bowtie (preferred) or Bowtie2 (you may have to tweak the settings).

Could you please report back if this improves alignment? Thanks.

ADD REPLYlink modified 16 months ago • written 16 months ago by h.mon29k

I tried bowtie2 but I ran into this problem bowtie2 can't find index

ADD REPLYlink written 16 months ago by Ric290
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1103 users visited in the last hour