Question: Need help using Shrimp2 on paired-end color-space SOLiD data.
Asked 5.4 years ago by Jordan (Pittsburgh):

Hi,

I have paired-end SOLiD reads (75bp and 35bp) in .csfasta and .QV.qual format. I would like to align them with Shrimp2, but so far I have had trouble getting it to run.

I used the following command:

gmapper -1 Sample/F3/reads/Hope_2014_02_20_1_01_13_0502_F3.csfasta -2 Sample/F5-DNA/reads/Hope_2014_02_20_1_01_13_0502_F5-DNA.csfasta $SCRATCH/human_hg19.fa -N 32 -p opp-in  > Sample.sam 2> Logs/Sample.log

This is my log file and the error is shown at the bottom. I'm not sure what that means.

- Processing genome file [/Refs/human_hg19.fa]
- Processing contig chr1
- Processing contig chr2
- Processing contig chr3
- Processing contig chr4
- Processing contig chr5
- Processing contig chr6
- Processing contig chr7
- Processing contig chr8
- Processing contig chr9
- Processing contig chr10
- Processing contig chr11
- Processing contig chr12
- Processing contig chr13
- Processing contig chr14
- Processing contig chr15
- Processing contig chr16
- Processing contig chr17
- Processing contig chr18
- Processing contig chr19
- Processing contig chr20
- Processing contig chr21
- Processing contig chr22
- Processing contig chrX
- Processing contig chrY
- Processing contig chrM

Loaded Genome
note: detected fastq format in input file [Sample/F3/reads/Hope_2014_02_20_1_01_13_0502_F3.csfasta]
- Processing read files [Sample/F3/reads/Hope_2014_02_20_1_01_13_0502_F3.csfasta , Sample/F5-DNA/reads/Hope_2014_02_20_1_01_13_0502_F5-DNA.csfasta]

note: quality value format not set explicitly; using PHRED+64
done r/hr r/core-hr
error: realloc failed: Success

Here is what my csfasta files look like:

@1_2_53_F3
T02.2031.2212.12.3.12.2.03.1030.3.10.3.1313.2323.3211.3102.1001..321..023..1
@1_2_193_F3
T12.0303.2132.00.3.10.2.21.1330.0.30.2.2220.0020.1002.0000..332..302..012..3
@1_2_264_F3
T31.1220.2112.30.0.20.1.12.3032.3.01.2.1132.1310.2100.1211.3302..310..202..1
@1_2_468_F3
T31.3221.1202.02.1.31.3.02.2000.0.20.0.2020.0022.2223.2222.2222..203..220..3

 

And this is what my qual file looks like:

>1_2_53_F3
23 31 -1 30 27 27 30 -1 31 26 27 26 -1 26 14 -1 23 -1 21 29 -1 17 -1 14 17 -1 17 17 14 14 -1 17 -1 14 14 -1 23 -1 29 21 17 14 -1 31 26 14 12 -1 14 14 23 14 -1 14 21 14 17 -1 21 14 17 17 -1 -1 14 14 26 -1 -1 14 14 29 -1 -1 14 
>1_2_193_F3
31 17 -1 14 23 30 31 -1 31 23 31 31 -1 14 31 -1 31 -1 14 14 -1 14 -1 29 17 -1 31 14 23 17 -1 31 -1 17 14 -1 27 -1 13 21 14 17 -1 17 24 12 30 -1 21 31 23 21 -1 23 14 31 31 -1 -1 21 23 17 -1 -1 14 31 17 -1 -1 9 9 17 -1 -1 14 
>1_2_264_F3
31 31 -1 31 31 31 31 -1 31 27 31 31 -1 31 31 -1 31 -1 31 26 -1 21 -1 30 27 -1 31 26 31 31 -1 31 -1 31 30 -1 31 -1 21 31 31 28 -1 31 31 31 23 -1 26 17 23 31 -1 17 20 30 27 -1 26 28 31 30 -1 -1 21 21 13 -1 -1 13 27 31 -1 -1 26 
>1_2_468_F3
31 31 -1 31 31 31 31 -1 31 31 21 31 -1 14 28 -1 28 -1 29 30 -1 27 -1 21 31 -1 13 31 31 25 -1 12 -1 23 30 -1 28 -1 26 32 12 21 -1 28 18 30 12 -1 28 31 27 15 -1 15 31 28 14 -1 31 26 26 28 -1 -1 23 23 14 -1 -1 23 12 21 -1 -1 31 
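One sanity check that can be scripted for files like these: in the excerpts shown, every '.' (missing color call) in a read lines up with a -1 in the matching quality line, and both lines have the same number of entries. The Python sketch below tests one record under that assumption. `check_record` is a hypothetical helper name, and the '.' / -1 pairing is only an observation from this sample, not a statement about the format in general.

```python
def check_record(read, quals):
    """Compare one csfasta read line with its quality line.

    Returns a list of problem descriptions (empty if the record looks consistent).
    """
    colors = read.strip()[1:]  # drop the leading primer base, e.g. 'T'
    values = [int(v) for v in quals.split()]
    problems = []
    if len(colors) != len(values):
        problems.append(
            f"{len(colors)} color calls but {len(values)} quality values"
        )
    # Assumption taken from the sample data: '.' pairs with quality -1.
    for i, (color, value) in enumerate(zip(colors, values), start=1):
        if (color == ".") != (value == -1):
            problems.append(f"position {i}: color {color!r} with quality {value}")
    return problems
```

An empty list means the read and its quality line agree on length and on where the missing calls are.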

Does anyone know what the error is? I have never used Shrimp2 before, so I'm struggling a bit.

Thanks for the help.

 


Is it possible that the tool ran out of memory? If I recall correctly, it was quite the memory hog.

Reply written 5.4 years ago by Istvan Albert

I'm not so sure. Each csfasta file is only about 5GB, and I allocated about 256GB of RAM and 32 cores for this run. Do you think it might need more RAM than that?

Reply written 5.4 years ago by Jordan
Answer written 5.4 years ago by Jordan (Pittsburgh):

OK, I think I figured out the issue. Shrimp2 has separate aligners for letter space and color space.

For color space, which is the data I have, I should use gmapper-cs.

So my command now is actually:

gmapper-cs -1 Sample/F3/reads/Hope_2014_02_20_1_01_13_0502_F3.csfasta -2 Sample/F5-DNA/reads/Hope_2014_02_20_1_01_13_0502_F5-DNA.csfasta $SCRATCH/human_hg19.fa -N 32 -p opp-in  > Sample.sam 2> Logs/Sample.log

This seems to work.
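For anyone scripting this, here is a small Python sketch of the choice described above. It looks at the first read and picks gmapper-cs when the sequence looks like color space (a primer base followed by digits 0-3 and dots), otherwise plain gmapper. The function names are hypothetical helpers and not part of Shrimp2; the options passed are only the ones already used in the command in this thread.

```python
import re

# Color-space read: one primer base, then color calls 0-3, with '.' for missing calls.
COLOR_READ = re.compile(r"^[ACGTN][0-3.]+$", re.IGNORECASE)


def first_record(lines):
    """Return (header, sequence) of the first record, skipping blank and '#' lines."""
    header = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if header is None:
            header = line
        else:
            return header, line
    raise ValueError("no complete record found")


def choose_mapper(lines):
    """Pick 'gmapper-cs' for color-space reads, plain 'gmapper' otherwise."""
    _, sequence = first_record(lines)
    return "gmapper-cs" if COLOR_READ.match(sequence) else "gmapper"


def build_command(mapper, reads_1, reads_2, genome, threads=32):
    """Assemble the argument list with the same options as the command in this thread."""
    return [mapper, "-1", reads_1, "-2", reads_2, genome,
            "-N", str(threads), "-p", "opp-in"]
```

`choose_mapper` accepts any iterable of lines, so an open file handle works. The returned list can be handed to `subprocess.run` with stdout and stderr redirected to the SAM and log files.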


Interesting, I thought the @ made the mapper work incorrectly. But right, you have to use it in color space mode. Though it is still strange that you have the @ symbol there. 

Reply written 5.4 years ago by Istvan Albert

I actually converted the @ to '>'. The other samples I have with me have '>', not '@'. 

Even after the conversion, it did not work. So I looked at the examples they gave again and realized that for csfasta they used gmapper-cs. And voila it started working. I wish they had better documentation.

Though I'm not sure how this particular sample changed to @.

Reply written 5.4 years ago by Jordan
Answer written 5.4 years ago by Istvan Albert (University Park, USA):

If you actually have 256GB of RAM, then that should not be an issue.

Wait, I think I see the problem: why do the records in your csfasta file start with @ symbols? That does not seem right. They should start with >.
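A minimal Python sketch for tidying such headers, assuming strict two-line records (header, then read) as in the excerpt in the question. `normalize_headers` is a hypothetical helper; a real csfastq file with quality lines would need a proper parser instead. As noted elsewhere in this thread, fixing the headers alone did not make the error go away.

```python
def normalize_headers(lines):
    """Yield lines with a leading '@' on header lines replaced by '>'.

    Assumes two-line records (header, read); blank lines and '#' comment
    lines are passed through without counting as part of a record.
    """
    record_line = 0  # 0 = expecting a header, 1 = expecting a read
    for line in lines:
        stripped = line.rstrip("\n")
        if not stripped or stripped.startswith("#"):
            yield stripped
            continue
        if record_line == 0 and stripped.startswith("@"):
            stripped = ">" + stripped[1:]
        yield stripped
        record_line = (record_line + 1) % 2
```

Writing the yielded lines to a new file, rather than editing in place, keeps the original available for comparison.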

 


I think your csfasta was turned into a csfastq at some point, then it was converted back to csfasta ... kind of crazy ...

Reply written 5.4 years ago by Istvan Albert

Let me correct that. It's a bit weird that the csfasta file got converted to csfastq.

I will re-run it and see what happens. Thanks!

Reply written 5.4 years ago by Jordan