GAA: problem combining two assemblies (Filter.pm and Validator.pm warnings, plus a large query contig warning)
8.2 years ago

Hello,

I am trying to combine two assemblies using GAA (https://github.com/ghyao/GAA), but I'm getting several warnings, and the last one comes right before GAA terminates prematurely with no useful output. I've listed the warnings below; if anyone has a solution, it would be very much appreciated.

1. "defined(%hash) is deprecated at /usr/local/gaa/GAA/Filter.pm line 282. (Maybe you should just omit the defined()?)"

2. (So far I only get this one with queries that have large contigs.) "Maximum single piece size (5000) exceeded by query 392 of size (6565). Larger pieces will have to be split up until no larger than this limit when the -fastMap option is used."

3. "Set Scaf Degree and Contig Length 14:33:49 Modification of non-creatable array value attempted, subscript -1 at /usr/local/gaa/GAA/Validator.pm line 716, <IN> line 1."

The Filter.pm line is here: https://github.com/ghyao/GAA/blob/master/GAA/Filter.pm#L282

and the Validator.pm line is here: https://github.com/ghyao/GAA/blob/master/GAA/Validator.pm#L716

I thought GAA was able to deal with large assemblies. Does anyone know how to get around this? Do I really have to break up large contigs, and if so, is there a recommended method or script?
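Before deciding whether anything needs splitting, it may help to see which query sequences actually exceed the 5000-base limit quoted in the blat message. Here is a minimal Python sketch, assuming plain FASTA input. The names `read_fasta`, `oversized` and `FASTMAP_LIMIT` are made up for illustration and are not part of GAA or blat.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Limit quoted in the blat warning for runs that use -fastMap.
FASTMAP_LIMIT = 5000


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def oversized(lines: Iterable[str], limit: int = FASTMAP_LIMIT) -> Dict[str, int]:
    """Return {header: length} for every sequence longer than `limit`."""
    return {h: len(s) for h, s in read_fasta(lines) if len(s) > limit}
```

For example, `oversized(open("file2.fa"))` should return the header and length of each query sequence that is longer than the limit, which is the condition the warning complains about.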

Anyway, here is an example of my command and all the messages I get. Please do send any solutions or suggestions you may have.

gaa.pl --target ./file1.fa --query ./file2.fa -o file2_file3_combined
defined(%hash) is deprecated at /usr/local/gaa/GAA/Filter.pm line 282.
    (Maybe you should just omit the defined()?)

Mon Feb  1 14:49:59 EET 2016
>>> GAA <<<

Mon Feb  1 14:49:59 EET 2016
blat -fastMap /home/path1/file1.fa /home/path2/file2.fa match.psl
Loaded 19023287 letters in 193063 sequences
Searched 16715140 bases in 193454 sequences
>>>
>>> 1. Simplifier
>>>
Read match:    match.psl
14:51:59
Get unique match => 1match.unique
14:54:25
>>> Done Simplifier <<<
14:54:25
>>>
>>> 2. Filter
>>>
Query Match:    1match.unique
14:54:25
Target Relation String
14:54:25
Query Relation String
14:54:25
Get q intervals
14:54:25
Tracing
14:54:25
# track before merge:    1
14:54:25
Linking
14:54:25
# links:        
# track after merge:    1
14:54:25
>>> Done Filter <<<
14:54:25
>>>
>>> Validator
>>>
Set Scaf Degree and Contig Length
14:54:25
Modification of non-creatable array value attempted, subscript -1 at /usr/local/gaa/GAA/Validator.pm line 716, <IN> line 1.

--Diane

combine GAA problem assembly perl • 1.8k views
8.2 years ago

I eventually read somewhere that if you rename all the contigs as Contig0.1, Contig0.2, and so on in both assemblies, it does work for smallish assemblies. I used awk '/^>/{print ">Contig0." ++i;next}{print}' <file.fasta > newfile.fasta to rename them. However, I am still having issues with assemblies that have contigs of >5 kb as is. Any suggestions?
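One thing the awk one-liner does not keep is a record of which original contig each new name came from. A rough Python equivalent that also returns that mapping could look like the sketch below. The function name `rename_contigs` and the `prefix` parameter are hypothetical, and the numbering simply mirrors the awk counter.

```python
from typing import Iterable, List, Tuple


def rename_contigs(
    lines: Iterable[str], prefix: str = "Contig0."
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Rename FASTA headers to prefix + running number (starting at 1).

    Returns the rewritten lines (without trailing newlines) and a list of
    (new_name, original_header) pairs so the renaming can be undone later.
    """
    renamed: List[str] = []
    mapping: List[Tuple[str, str]] = []
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            new_name = f"{prefix}{count}"
            mapping.append((new_name, line[1:]))
            renamed.append(">" + new_name)
        else:
            renamed.append(line)  # sequence lines pass through unchanged
    return renamed, mapping
```

Writing `"\n".join(renamed)` to the new FASTA file and the mapping pairs to a tab-separated file would make it possible to translate names in the combined assembly back to the originals.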

8.2 years ago

In case anyone else finds this useful: I ran blat beforehand and supplied the resulting psl alignment file to GAA, together with the renamed contigs, and it finally works properly. I highly recommend using a multithreaded blat version (http://icebert.github.io/pblat/), as blat takes forever with large files on one thread!
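The exact blat options used for the standalone run are not given here, so the following is only a sketch of how the alignment command might be assembled. It assumes `-fastMap` is left out (the warning says the 5000-base limit applies when that option is used) and that pblat takes its thread count as `-threads=N`, as described in its README. The helper name `blat_command` is hypothetical. The option for handing the psl file to gaa.pl is not stated in this thread, so check GAA's usage message for it.

```python
from typing import List, Optional


def blat_command(
    target: str, query: str, psl: str, threads: Optional[int] = None
) -> List[str]:
    """Build an argument list for blat, or for pblat when threads is given.

    Assumption: -fastMap is deliberately omitted so that the
    "Maximum single piece size (5000)" limit does not apply.
    """
    if threads is None:
        return ["blat", target, query, psl]
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # Assumption: pblat accepts the thread count as -threads=N.
    return ["pblat", f"-threads={threads}", target, query, psl]


# To actually run the alignment (requires blat or pblat on PATH):
#   import subprocess
#   subprocess.run(blat_command("file1.fa", "file2.fa", "match.psl", threads=8),
#                  check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.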
