Question: NaS (Nanopore Synthetic-long) help
midox230 wrote:

Hello,
I am trying to set up NaS, a hybrid approach developed to take advantage of data generated with the MinION device.
I have started installing the prerequisites, but the last two points are a little unclear:

  • Blat binary (at least v35) available through your PATH environment variable.
  • Last binary (at least 502) accessible through your PATH environment variable.

Do you have any idea how to do this?
Thank you.

program preprocessing assembly • 2.3k views
ADD COMMENTlink modified 4.3 years ago by guillaume.gautreau44140 • written 4.4 years ago by midox230

Hello,

I'm still working on setting up NaS, but here is a new problem:

Number of parallel task : 5
[mar. juin 16 13:54:04 CEST 2015] Create output directory : NaS_example
[mar. juin 16 13:54:04 CEST 2015] Create fasta file from fastq...
[mar. juin 16 13:54:50 CEST 2015] Alignement step in fast mode...
[mar. juin 16 13:54:55 CEST 2015] Select reads...
[mar. juin 16 13:54:55 CEST 2015] Retrieve similar reads...
[mar. juin 16 13:54:55 CEST 2015] Generate NaS reads...
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:

  O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
  ;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; and it won't cost you a cent.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

To silence the citation notice: run 'parallel --bibtex'.

Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:

  O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
  ;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; and it won't cost you a cent.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

To silence the citation notice: run 'parallel --bibtex'.

cat: NaS_example/assemblies/*/NaS_hqctg_reads_final.fa: Aucun fichier ou dossier de ce type
[mar. juin 16 13:54:56 CEST 2015] Generate statistics...

** WARNING **: Warning zero length sequence []
awk: (FILENAME=- FNR=1) Fatal: tentative de division par zéro
gawk: (FILENAME=- FNR=1) Fatal: tentative de division par zéro
NbReads=  5  CumulativeSize=  31008  N50size=  7994  minSize=  2512  maxSize=  10464  avgSize=  6201.6  =>  NaS_example/NANO_reads.stats
NbReads=  1  CumulativeSize=  0  N50size=    minSize=  maxSize=  maxSize=  0  avgSize=    =>  NaS_example/NaS_hqctg_reads.stats

Do you know how to fix this, given that the parallel module is installed?

ADD REPLYlink modified 2 days ago by RamRS24k • written 4.3 years ago by midox230

Hello,

This message from parallel appears until you run parallel --bibtex; run it once and the message will disappear. It is only a notice, though: parallel works, and your main problem is still with blat.

ADD REPLYlink modified 2 days ago by RamRS24k • written 4.3 years ago by guillaume.gautreau44140

Hello,

Thank you for your help; I have resolved the parallel problem.

But as you said, I think there is another problem. I'm not sure it comes from blat, because I have tested blat and it works.

Number of parallel task : 5

[mer. juin 17 09:42:51 CEST 2015] Create output directory : NaS_example
[mer. juin 17 09:42:51 CEST 2015] Create fasta file from fastq...
[mer. juin 17 09:43:35 CEST 2015] Alignement step in fast mode...
[mer. juin 17 09:43:47 CEST 2015] Select reads...
[mer. juin 17 09:43:47 CEST 2015] Retrieve similar reads...
[mer. juin 17 09:43:47 CEST 2015] Generate NaS reads...
cat: NaS_example/assemblies/*/NaS_hqctg_reads_final.fa: Aucun fichier ou dossier de ce type
[mer. juin 17 09:43:48 CEST 2015] Generate statistics...

** WARNING **: Warning zero length sequence []
awk: (FILENAME=- FNR=1) Fatal: tentative de division par zéro
gawk: (FILENAME=- FNR=1) Fatal: tentative de division par zéro
NbReads=  5  CumulativeSize=  31008  N50size=  7994  minSize=  2512  maxSize=  10464  avgSize=  6201.6  =>  NaS_example/NANO_reads.stats
NbReads=  1  CumulativeSize=  0  N50size=    minSize=  maxSize=  maxSize=  0  avgSize=    =>  NaS_example/NaS_hqctg_reads.stats

Here is a link to my NaS_example folder: http://uptobox.com/csx3wu3gozu9

Can you help me with this?

Thank you.

ADD REPLYlink modified 2 days ago by RamRS24k • written 4.3 years ago by midox230
guillaume.gautreau44140 wrote:

Hello,

fastalength and fastacomposition are tools from EBI.

These tools are required by NaS. They are included in the NaS directory, but the bundled binaries might not be compatible with the glib version of your OS. You can compile them yourself from the source code available here (http://www.ebi.ac.uk/~guy/exonerate/).

Once the binaries are compiled, copy them (FastaToTbl, TblToFasta, fastacomposition, fastalength) into the NaS directory and run NaS again.

ADD COMMENTlink written 4.4 years ago by guillaume.gautreau44140

Thank you for your help. I recompiled the binaries and that problem is solved, but here is the problem that remains:

root@midox-VGN-NS11S-S:/home/midox/Bureau/NaS-master# $(pwd)/NaS_v2/NaS --fq1 /home/midox/Bureau/NaS_example_acineto/AWK_DOSF_1_1_A5KR6.IND3_clean.10prc.fastq --fq2 /home/midox/Bureau/NaS_example_acineto/AWK_DOSF_1_2_A5KR6.IND3_clean.10prc.fastq --nano /home/midox/Bureau/NaS_example_acineto/MinION_reads_Acinetobacter_baylyi.fa --out NaS_example --nb_proc 5
Number of parallel task : 5
[lundi 8 juin 2015, 16:05:18 (UTC+0200)] Create output directory : NaS_example
[lundi 8 juin 2015, 16:05:18 (UTC+0200)] Create fasta file from fastq...
[lundi 8 juin 2015, 16:07:51 (UTC+0200)] Alignement step in fast mode...
[lundi 8 juin 2015, 16:07:58 (UTC+0200)] Select reads...
[lundi 8 juin 2015, 16:07:58 (UTC+0200)] Retrieve similar reads...
[lundi 8 juin 2015, 16:07:58 (UTC+0200)] Generate NaS reads...
cat: NaS_example/assemblies/*/NaS_hqctg_reads_final.fa: No such file or directory
[lundi 8 juin 2015, 16:08:00 (UTC+0200)] Generate statistics...

** (process:25124): WARNING **: Warning zero length sequence []
awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted
gawk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted
NbReads=  5  CumulativeSize=  31008  N50size=  7994  minSize=  2512  maxSize=  10464  avgSize=  6201.6  =>  NaS_example/NANO_reads.stats
NbReads=  1  CumulativeSize=  0  N50size=    minSize=  maxSize=  maxSize=  0  avgSize=    =>  NaS_example/NaS_hqctg_reads.stats

I don't know what kind of problem this is.

ADD REPLYlink modified 2 days ago by RamRS24k • written 4.4 years ago by midox230

Can you send me your NaS_example directory, please ?

ADD REPLYlink written 4.4 years ago by guillaume.gautreau44140

here's a link to my NaS example (compressed).
http://uptobox.com/opp5tjpxxq35
thank you for your help

ADD REPLYlink written 4.4 years ago by midox230

In your file NaS_example/tmp/blat-alignement.stderr:

/bin/bash: line 3: blat: command not found

Are you sure you have BLAT accessible through your PATH ?

If you type blat in a terminal, what happens?
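
A minimal sketch of that check, assuming a bash shell. The helper name check_tool is hypothetical, and lastal is used here as the LAST executable to look for, which is an assumption about what NaS actually calls:

```shell
#!/usr/bin/env bash
# Report whether a tool can be found through PATH.
# check_tool is a hypothetical helper name, not part of NaS.
check_tool() {
    local tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found at $(command -v "$tool")"
        return 0
    else
        echo "$tool: NOT found in PATH"
        return 1
    fi
}

# Assumption: lastal is the LAST binary that matters here.
for tool in blat lastal parallel; do
    check_tool "$tool" || true
done
```

If blat is reported as not found, the shell that runs NaS cannot see it either, whatever directory it was installed in.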

ADD REPLYlink modified 2 days ago by RamRS24k • written 4.4 years ago by guillaume.gautreau44140

When I type blat on the command line, it says "blat: command not found" even though I installed it.

I don't know what the problem with Blat is.

ADD REPLYlink modified 2 days ago by RamRS24k • written 4.4 years ago by midox230

In the file NaS line 42:

PATH=/env/cns/opt/454-2.9/bin/:/env/cns/src/blat/blat_v35/bin/linux/:$PATH

Replace /env/cns/src/blat/blat_v35/bin/linux/ with the path to your blat binaries directory and try to run NaS again.

ADD REPLYlink modified 2 days ago by RamRS24k • written 4.4 years ago by guillaume.gautreau44140
george.ry1.1k wrote:

LAST: http://last.cbrc.jp/

BLAT: http://hgdownload.soe.ucsc.edu/admin/exe/

Download both and then add the directories to your PATH, probably most easily in your bashrc file (http://unix.stackexchange.com/questions/26047/how-to-correctly-add-a-path-to-path), or add a link to the binaries themselves in /usr/local/bin/.
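
A minimal sketch of the PATH approach, assuming bash. The two directories are hypothetical install locations and prepend_path is an illustrative helper; replace the paths with wherever the binaries were actually unpacked:

```shell
#!/usr/bin/env bash
# Hypothetical install locations; replace with your own directories.
BLAT_DIR="$HOME/tools/blat"
LAST_DIR="$HOME/tools/last/bin"

# Prepend a directory to PATH only if it is not already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

prepend_path "$BLAT_DIR"
prepend_path "$LAST_DIR"
export PATH
echo "$PATH"
```

Putting these lines in ~/.bashrc makes the change apply to every new shell; the guard in prepend_path keeps PATH from growing each time the file is sourced.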

ADD COMMENTlink modified 4.4 years ago • written 4.4 years ago by george.ry1.1k
guillaume.gautreau44140 wrote:

In the file NaS_example/tmp/blat-alignment.stderr:

sh: -c: line 0: Erreur de syntaxe près du symbole inattendu « ( »
sh: -c: line 0: `blat -tileSize=10 -stepSize=5 -noHead /scratch/mkchouk/testdata/NaS_example_acineto/MinION_reads_Acinetobacter_baylyi.fa stdin >(cat) >&2'

To understand it, I want to reproduce this error in my test environment. Which versions do you have of:

Linux?
bash? (run bash --version)
parallel?
blat?

Also, what happens if you run NaS with the option --mode sensitive?

ADD COMMENTlink modified 2 days ago by RamRS24k • written 4.3 years ago by guillaume.gautreau44140

I resolved the problem and NaS ran successfully. Thank you, Guillaume.

This is the output:

Number of parallel task : 5
[mer. juin 17 15:05:00 CEST 2015] Create output directory : NaS_example
[mer. juin 17 15:05:00 CEST 2015] Create fasta file from fastq...
[mer. juin 17 15:05:40 CEST 2015] Alignement step in fast mode...
[mer. juin 17 15:05:44 CEST 2015] Select reads...
[mer. juin 17 15:05:44 CEST 2015] Retrieve similar reads...
[mer. juin 17 15:07:01 CEST 2015] Generate NaS reads...
mkdir: impossible de créer le répertoire « NaS_example/assemblies/channel_101_read_1_twodirections »: Le fichier existe
mkdir: impossible de créer le répertoire « NaS_example/assemblies/channel_103_read_6_twodirections »: Le fichier existe
mkdir: impossible de créer le répertoire « NaS_example/assemblies/channel_100_read_10_twodirections »: Le fichier existe
mkdir: impossible de créer le répertoire « NaS_example/assemblies/channel_102_read_15_twodirections »: Le fichier existe
mkdir: impossible de créer le répertoire « NaS_example/assemblies/channel_103_read_2_twodirections »: Le fichier existe
[mer. juin 17 15:07:11 CEST 2015] Generate statistics...
NbReads=  5  CumulativeSize=  31008  N50size=  7994  minSize=  2512  maxSize=  10464  avgSize=  6201.6  =>  NaS_example/NANO_reads.stats
NbReads=  5  CumulativeSize=  38036  N50size=  9743  minSize=  1982  maxSize=  12787  avgSize=  7607.2  =>  NaS_example/NaS_hqctg_reads.stats

But the example on GitHub shows this output for the same dataset:

NbReads= 5 CumulativeSize= 31008 N50size= 7994 minSize= 2512 maxSize= 10464 avgSize= 6201.6 => /env/cns/home/ggautrea/NaS_example/NANO_reads.stats
NbReads= 4 CumulativeSize= 34867 N50size= 9707 minSize= 4263 maxSize= 11971 avgSize= 8716.75 => /env/cns/home/ggautrea/NaS_example/NaS_hqctg_reads.stats

So is NaS working properly or not, given that our outputs differ?

ADD REPLYlink modified 2 days ago by RamRS24k • written 4.3 years ago by midox230

Did you remove the previous NaS_example directory before running NaS?

Maybe your version of Newbler or blat is not the same as in the example.

ADD REPLYlink written 4.3 years ago by guillaume.gautreau44140

Yes, I removed the previous NaS_example directory.

When I run NaS in sensitive mode (--mode sensitive), it runs successfully.

[mer. juin 17 15:07:11 CEST 2015] Generate statistics...
NbReads=  5  CumulativeSize=  31008  N50size=  7994  minSize=  2512  maxSize=  10464  avgSize=  6201.6  =>  NaS_example/NANO_reads.stats
NbReads=  5  CumulativeSize=  38036  N50size=  9743  minSize=  1982  maxSize=  12787  avgSize=  7607.2  =>  NaS_example/NaS_hqctg_reads.stats

(I don't know whether these are good results.)

But without --mode sensitive, I get the same error :(

Number of parallel task : 5
[mer. juin 17 15:53:52 CEST 2015] Create output directory : NaS_example
[mer. juin 17 15:53:52 CEST 2015] Create fasta file from fastq...
[mer. juin 17 15:54:33 CEST 2015] Alignement step in fast mode...
[mer. juin 17 15:54:37 CEST 2015] Select reads...
[mer. juin 17 15:54:37 CEST 2015] Retrieve similar reads...
[mer. juin 17 15:54:37 CEST 2015] Generate NaS reads...
cat: NaS_example/assemblies/*/NaS_hqctg_reads_final.fa: Aucun fichier ou dossier de ce type
[mer. juin 17 15:54:37 CEST 2015] Generate statistics...

** WARNING **: Warning zero length sequence []
awk: (FILENAME=- FNR=1) Fatal: tentative de division par zéro
gawk: (FILENAME=- FNR=1) Fatal: tentative de division par zéro
NbReads=  5  CumulativeSize=  31008  N50size=  7994  minSize=  2512  maxSize=  10464  avgSize=  6201.6  =>  NaS_example/NANO_reads.stats
NbReads=  1  CumulativeSize=  0  N50size=    minSize=  maxSize=  maxSize=  0  avgSize=    =>  NaS_example/NaS_hqctg_reads.stats

In the file NaS_example/tmp/blat-alignment.stderr:

sh: -c: line 0: Erreur de syntaxe près du symbole inattendu « ( »
sh: -c: line 0: `blat -tileSize=10 -stepSize=5 -noHead /scratch/mkchouk/testdata/NaS_example_acineto/MinION_reads_Acinetobacter_baylyi.fa stdin >(cat) >&2'

I have GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu).

I have blat/35 and parallel/20150522.

I think the NaS problem is not resolved yet.

Thank you.

ADD REPLYlink modified 2 days ago by RamRS24k • written 4.3 years ago by midox230

I think the problem is in the script, in NaS-wrapped, at the (cat) part.
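
The stderr excerpt above shows sh -c rejecting the >(cat) construct. >(...) is process substitution, a bash extension that is not part of POSIX sh, so a plain sh (or an older bash invoked as sh) may refuse to parse it. Whether this is the actual cause inside NaS is an assumption, not something this thread confirms. A minimal sketch to see how the two shells treat the same command line on a given machine:

```shell
#!/usr/bin/env bash
# The same command line, handed once to bash and once to sh.
cmd='echo hello > >(cat)'

bash -c "$cmd" >/dev/null 2>&1 && bash_status=ok || bash_status=failed
sh -c "$cmd" >/dev/null 2>&1 && sh_status=ok || sh_status=failed

echo "bash: $bash_status"
echo "sh:   $sh_status"
```

If sh reports a failure while bash does not, any tool that runs the command through sh -c will hit the same syntax error.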

ADD REPLYlink written 4.3 years ago by midox230
guillaume.gautreau44140 wrote:

I use parallel/20130122 and it works, but this problem with your version is not normal and will be corrected.

Can you send just the file NaS_hqctg_reads.fa produced in sensitive mode, so I can check its quality?

ADD COMMENTlink written 4.3 years ago by guillaume.gautreau44140

I tested parallel and it works.

Here's a link to NaS_hqctg_reads.fa from sensitive mode.

http://www47.zippyshare.com/v/tpnw6Uy5/file.html

ADD REPLYlink modified 2 days ago by RamRS24k • written 4.3 years ago by midox230

NaS doesn't work in fast mode with this version of parallel; this must be fixed.

You can download here the reference assembly of Acinetobacter baylyi: http://www.genoscope.cns.fr/externe/nas/references/acineto/

Use bwa mem with the option -x ont2d to compare the Acinetobacter NaS reads with the reference.
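
A sketch of what that comparison could look like, assuming bwa is installed. reference.fa and NaS_vs_ref.sam are placeholder file names, and align_nas_reads and DRY_RUN are hypothetical names introduced only for this example. By default the function just prints the commands; set DRY_RUN=0 to run them:

```shell
#!/usr/bin/env bash
# Sketch: align NaS reads to a reference with bwa mem -x ont2d.
# File names are placeholders; DRY_RUN is a hypothetical switch.
set -euo pipefail

align_nas_reads() {
    local ref="$1" reads="$2" out="$3"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Only print the commands so they can be inspected first.
        echo "bwa index $ref"
        echo "bwa mem -x ont2d $ref $reads > $out"
    else
        bwa index "$ref"                           # build the index once
        bwa mem -x ont2d "$ref" "$reads" > "$out"  # nanopore 2D preset
    fi
}

cmds="$(align_nas_reads reference.fa NaS_hqctg_reads.fa NaS_vs_ref.sam)"
echo "$cmds"
```

The resulting SAM file is the input for whatever alignment statistics are computed afterwards; the script that produced the figures in this reply is not shown in the thread.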

For your 5 reads corrected by NaS, here are the alignment statistics of the corrected reads against the reference:

Number of reads                          : 5
Number of reads (>10Kb)                  : 1
Number of bp                             : 38036
Average size of reads                    : 7607.2
N50 size of reads                        : 9743
Max size of reads                        : 12787
######
Number of aligned reads                  : 5 (100%)
Number of aligned bp                     : 38036 (100%)
Average identity percent                 : 100%
Max alignement size                      : 12787
Number of aligned reads L=100%           : 5 (100%)
Number of aligned reads ID=100%          : 5 (100%)
Number of aligned reads L=100% ; ID=100% : 5 (100%)
Number of loci                           : 5
Reference size                           : 3598621
Coverage of reference                    : 38041 (1.05%)
ADD REPLYlink modified 2 days ago by RamRS24k • written 4.3 years ago by guillaume.gautreau44140

So I can't use NaS with the parallel module parallel/20150522?

And since it works in sensitive mode, can I use that mode of NaS for my tests?

Thank you.

ADD REPLYlink modified 2 days ago by RamRS24k • written 4.3 years ago by midox230

For the sensitive mode, yes you can; you get better results than the GitHub example ;)

The fast mode will be fixed soon.

ADD REPLYlink written 4.3 years ago by guillaume.gautreau44140

OK, thank you Guillaume.

ADD REPLYlink written 4.3 years ago by midox230

The fast mode problem with parallel is fixed; update NaS ;)

ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by guillaume.gautreau44140

Thank you, Guillaume.

NaS works very well now, but here are the results:

Number of parallel task : 5
[jeu. juin 18 15:50:05 CEST 2015] Create output directory : NaS_example
[jeu. juin 18 15:50:05 CEST 2015] Create fasta file from fastq...
[jeu. juin 18 15:50:42 CEST 2015] Alignement step in fast mode...
[jeu. juin 18 15:50:49 CEST 2015] Select reads...
[jeu. juin 18 15:50:49 CEST 2015] Retrieve similar reads...
[jeu. juin 18 15:52:35 CEST 2015] Generate NaS reads...
[jeu. juin 18 15:52:39 CEST 2015] Generate statistics...
NbReads=  5  CumulativeSize=  31008  N50size=  7994  minSize=  2512  maxSize=  10464  avgSize=  6201.6  =>  NaS_example/NANO_reads.stats
NbReads=  4  CumulativeSize=  17401  N50size=  4786  minSize=  1573  maxSize=  8242  avgSize=  4350.25  =>  NaS_example/NaS_hqctg_reads.stats

They are not like the example. Is that normal?

ADD REPLYlink modified 2 days ago by RamRS24k • written 4.3 years ago by midox230

Is NaS usable for large genomes? I want to use it for plant genomes.

ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by midox230

I fixed a bug: several blat processes were working on the same file at the same time, which is why your stats differed from the example. You can now update NaS again :)

NaS has been tested on genomes up to ~20 Mb, such as yeast. It may work on a small plant genome such as Arabidopsis thaliana, but not on a large genome with many repeats. Genoscope is currently working on improving NaS to handle larger genomes.

I advise you to split your FASTA dataset of nanopore reads into portions of ~20 Mb rather than providing all the data to NaS at once.
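
One way to do such a split is sketched below with awk. It assumes that "~20 Mb" is counted in bases, which the thread does not specify; split_fasta and the output naming (prefix_001.fa, prefix_002.fa, ...) are hypothetical choices for this example. A record is never cut in two, so a chunk only exceeds the limit when a single read is longer than the limit.

```shell
#!/usr/bin/env bash
# Split a FASTA file into chunks of at most ~max_bp bases each.
# Usage (placeholder names): split_fasta MinION_reads.fa 20000000 nano_chunk
split_fasta() {
    local input="$1" max_bp="$2" prefix="$3"
    awk -v max="$max_bp" -v prefix="$prefix" '
        /^>/ {
            # A new header: write out the record read so far.
            if (header != "") flush()
            header = $0; seq = ""; len = 0
            next
        }
        { seq = seq $0 "\n"; len += length($0) }
        END { if (header != "") flush() }
        function flush() {
            # Open a new chunk when this record would exceed the limit.
            if (chunk == 0 || (size > 0 && size + len > max)) {
                if (out != "") close(out)
                chunk++
                out = sprintf("%s_%03d.fa", prefix, chunk)
                size = 0
            }
            printf "%s\n%s", header, seq > out
            size += len
        }
    ' "$input"
}
```

Each chunk can then be passed to NaS through --nano in a separate run, with a separate --out directory per chunk.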

ADD REPLYlink written 4.3 years ago by guillaume.gautreau44140

Hello,

Does NaS work with PacBio data?

Thank you.

ADD REPLYlink modified 2 days ago by RamRS24k • written 4.3 years ago by midox230

Yes, you can. PacBio files often have special characters (such as slashes) in FASTA sequence identifiers; try renaming them if you run into problems.

ADD REPLYlink written 4.3 years ago by guillaume.gautreau44140

Yes, I have a problem.

How can I rename them?

This is an example of my PacBio sequences:

>SRR1204085.2 length=111
TTTGTTTGTGTGTGGTTTGTCTTGTTGTTTGGTTGGGGTTTCTCTTCGGCTGGTCGGCGTCTCGTGTGTCGCCTTTCTTGTGTTTGTGCGTGTGCTTGGGTTTCCTCGCTT
ADD REPLYlink modified 2 days ago by RamRS24k • written 4.3 years ago by midox230

NaS doesn't support spaces in FASTA sequence identifiers.

To rename your FASTA sequences, try:

cat your_sequences.fasta | NaS_v2/FastaToTbl | NaS_v2/TblToFasta > your_sequences_rename.fasta

FastaToTbl: https://github.com/institut-de-genomique/NaS/blob/master/NaS_v2/FastaToTbl
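
If the NaS helper scripts are not at hand, a plain awk one-liner can remove the spaces as well. This is an alternative sketch, not the method used by NaS itself; strip_fasta_headers is a hypothetical name. It keeps only the first whitespace-delimited word of each header line and leaves sequence lines untouched:

```shell
#!/usr/bin/env bash
# Keep only the first word of each FASTA header, so that
# ">SRR1204085.2 length=111" becomes ">SRR1204085.2".
strip_fasta_headers() {
    awk '/^>/ { print $1; next } { print }' "$@"
}

# Usage (placeholder file names):
#   strip_fasta_headers your_sequences.fasta > your_sequences_rename.fasta
```

This only handles spaces; other special characters such as slashes would need a separate substitution.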

ADD REPLYlink modified 2 days ago by RamRS24k • written 4.3 years ago by guillaume.gautreau44140

Here are my Illumina sequences:

@SEB9BZKS1:57:D16YVACXX:1:1101:2395:1997 1:N:0:ACAGTG
NATTTCTGATCTAGAACGCATAACACATACCACATCATATTAAATGAAATTCTAAGAGTAGAAGGAGCTTATTTGAGCAC
+
#4=DDFFFHGHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJHIJJJJJJIIJJJJJJHHEHHF

I think /1 and /2 are missing.

Is there a way to add /1 and /2 to the sequence identifiers?

Thank you.

ADD REPLYlink modified 2 days ago by RamRS24k • written 4.3 years ago by midox230

Do you have paired Illumina reads in 2 files?

Delete the trailing 1:N:0:ACAGTG field and use an awk script to add /1 or /2.

For example:

awk 'NR%4==1{print $1 "/1"; next} {print}' r1.fastq
awk 'NR%4==1{print $1 "/2"; next} {print}' r2.fastq
ADD REPLYlink modified 2 days ago by RamRS24k • written 4.3 years ago by guillaume.gautreau44140

Yes, I have 2 files of Illumina reads.

ADD REPLYlink modified 2 days ago by RamRS24k • written 4.3 years ago by midox230

OK, so use the script in my previous message to add /1 and /2.

ADD REPLYlink modified 2 days ago by RamRS24k • written 4.3 years ago by guillaume.gautreau44140

OK, thank you Guillaume.

ADD REPLYlink written 4.3 years ago by midox230