SHRiMP output SAM with unavailable quality
7.4 years ago

Hey there,

I have csfasta and .qual files that I mapped with SHRiMP, but there is something strange with the output SAM: none of the mapped reads has a quality string in the quality field (it is just *).

Here is the command I used:

/storage/bioinf/mappers/SHRiMP_2_2_3/bin/gmapper-cs -N 4 lane4_1_trimmed_F3.csfasta /home/center/ref/ref.fasta > lane4_1_F3.sam


This is a sample of the csfasta file:

>1_14_257_F3
T2222212001232001201322222121200001122.1301.200
>1_14_358_F3
T0010010013231132111311322102332231003.1222.1302..132..1313.0013.3331.0131
>1_14_896_F3
T3321103113132001333022321112320001212.1002.3020..331..3220.2012


This is a sample of the corresponding .qual file:

>1_14_257_F3
31 27 31 31 31 31 31 31 31 31 31 31 31 17 28 31 31 31 23 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 17 31 31 -1 26 31 28 31 -1 31 31 31
>1_14_358_F3
31 31 31 31 31 31 31 31 27 29 31 31 31 26 31 31 31 31 29 28 31 31 31 31 31 31 31 31 31 31 31 31 31 27 31 31 31 -1 31 31 31 31 -1 30 21 31 31 -1 -1 31 31 31 -1 -1 31 31 31 31 -1 27 31 31 31 -1 31 31 31 31 -1 21 31 31 31
>1_14_896_F3
14 30 14 14 14 27 21 31 14 17 17 14 14 23 14 14 14 31 19 27 14 17 17 21 29 14 14 14 31 14 14 14 17 14 14 31 14 -1 14 14 26 26 -1 14 14 14 17 -1 -1 26 21 28 -1 -1 14 27 31 14 -1 31 30 14 31


This is a sample of the output SAM file:

1_14_257_F3     0       gi|514323296|gb|KE335191.1|     29      0       46M     *       0       0       CTCTCAGGGTCGAAACTTGCTCTCTGACTTTTTGTCTGTAACTCCC  *       AS:i:442        Z0:i:4167       Z1:i:3067       NM:i:0  CS:Z:T2222212001232001201322222121200001122.1301.200    CM:i:2  XX:Z:CTCTCAGGGTCGAAACTTGCTCTCTGACTTTTTGTCTgTAACtCCC
1_14_257_F3     16      gi|514322460|gb|KE336027.1|     2017    0       46M     *       0       0       GGGAGTTACAGACAAAAAGTCAGAGAGCAAGTTTCGACCCTGAGAG  *       AS:i:442        Z0:i:4167       Z1:i:3067       NM:i:0  CS:Z:T2222212001232001201322222121200001122.1301.200    CM:i:2  XX:Z:C
1_14_358_F3     0       gi|514325872|gb|KE332615.1|     607661  0       73M     *       0       0       TTGGGTTTGCTACATCACATGTAGACCTATCTACCCGACTCTCATTCACATCACATGCAAACGGCGCATTGCA       *       AS:i:601        Z0:i:29694      Z1:i:27391      NM:i:2  CS:Z:T0010010013231132111311322102332231003.1222.1302..132..1313.0013.3331.0131 CM:i:8  XX:Z:TTGGGTTTGCTACATCACATGTAGACCTATCTACCCGaCTCTcATTCacATCacATGCaAACGGCGCAtTGCA


This is the first time I have used SHRiMP, so can anyone tell me whether there is something wrong with the command, the csfasta file, or something else?

HS

color shrimp mapping quality gmapper
7.4 years ago

It has been a while since I used SHRiMP2, but if I remember correctly, SHRiMP2 only accepts the csfasta file and not the .qual file; any attempt to provide the location of the .qual file results in an error. In short, SHRiMP2 reads the csfasta file without ever using the .qual file, which is why you get * instead of a quality string in the SAM output.

The solution is to convert the csfasta and .qual files into colorspace FASTQ format (csfastq). Note: I mean a csfastq file that still contains colorspace reads such as T1232001201322222, unlike the lossy csfasta-to-FASTQ conversions that attempt to decode colorspace into nucleotide space. SHRiMP2 can then align the csfastq file and will report a quality string in the SAM output.

1. You can use a script that comes with BFAST to convert the csfasta and .qual files to csfastq.
2. If your files are in .xsq format, you can convert them directly to csfastq using https://pythonhosted.org/ngs_plumbing/xsq.html
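
If neither tool is at hand, the conversion itself is simple enough to sketch in Python. This is only an illustration, not the BFAST script: the function names (read_records, encode_quals, to_csfastq) are made up here, and it assumes one line per record, Phred+33 encoding, missing calls (-1 in the .qual file) clamped to quality 0, and one quality value per color (the leading primer base T has none). Check the layout your aligner expects before relying on it.

```python
import sys


def read_records(lines):
    """Yield (name, body) pairs from FASTA-like lines, skipping '#' comments."""
    name = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            yield name, line
            name = None


def encode_quals(body, offset=33):
    """Turn space-separated integer qualities into an ASCII quality string.

    Assumption: negative values (-1 marks a missing color call) become 0.
    """
    return "".join(chr(max(int(q), 0) + offset) for q in body.split())


def to_csfastq(csfasta_lines, qual_lines):
    """Pair csfasta and .qual records and yield colorspace FASTQ records."""
    pairs = zip(read_records(csfasta_lines), read_records(qual_lines))
    for (name, colors), (qual_name, quals) in pairs:
        if name != qual_name:
            raise ValueError(f"read order differs: {name} vs {qual_name}")
        qual_string = encode_quals(quals)
        # The read is the primer base plus one color per quality value.
        if len(qual_string) != len(colors) - 1:
            raise ValueError(f"length mismatch for read {name}")
        # The colors are written unchanged, so the read stays in colorspace.
        yield f"@{name}\n{colors}\n+\n{qual_string}\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python script.py reads.csfasta reads.qual > reads.csfastq
    with open(sys.argv[1]) as cs, open(sys.argv[2]) as qv:
        sys.stdout.writelines(to_csfastq(cs, qv))
```

For the first read in the question, this would pair the 46 colors after the leading T with the 46 values in the .qual record, so the length check passes.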