Question: Adding /1 and /2 suffixes to paired-end reads for ABySS input
Salim Bougouffa wrote (3.6 years ago):

Hi Abyssers,

I have run Trimmomatic on my paired-end (PE) data, which produces R1 and R2 files whose read names have no suffixes. It is not obvious to me whether I must add /1 and /2 to each read name, or whether simply telling ABySS that the reads are pairs with pe='r1.fastq r2.fastq' is enough for it to recognise the pairs and carry on with the assembly correctly.


/SB

SES (Vancouver, BC) wrote (3.6 years ago):

Yes, I believe ABySS requires the pair information to be present (either 1/2, forward/reverse or A/B); the files may be separate or interleaved. You can add the pair information back with Pairfq. Here is an example (requires curl and Perl):

curl -sL git.io/pairfq_lite | perl - addinfo -i R1.fq -o R1_info.fq -p 1
curl -sL git.io/pairfq_lite | perl - addinfo -i R2.fq -o R2_info.fq -p 2

That should run quickly, and the input can be FASTA or FASTQ (compressed input should also work, I believe).
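To illustrate the idea behind this step, here is a minimal Python sketch. It is not Pairfq's code, and the exact output format of `addinfo` may differ. It assumes uncompressed FASTQ with strict four-line records (no wrapped sequence lines), and the function name `add_pair_suffix` is hypothetical. It appends `/1` or `/2` to the first whitespace-separated word of each header and leaves any trailing comment untouched.

```python
import re
from typing import Iterable, Iterator

# Read name = first run of non-whitespace; rest = everything after it (comment).
_HEADER = re.compile(r"(\S+)(.*)")


def add_pair_suffix(lines: Iterable[str], mate: int) -> Iterator[str]:
    """Append /<mate> to the read name on each FASTQ header line.

    Assumes strict 4-line records (header, sequence, '+', quality).
    Hypothetical helper for illustration, not Pairfq's implementation.
    """
    if mate not in (1, 2):
        raise ValueError("mate must be 1 or 2")
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 0:  # first line of every record is the header
            if not line.startswith("@"):
                raise ValueError(f"expected '@' header on line {i + 1}")
            name, rest = _HEADER.match(line).groups()
            line = f"{name}/{mate}{rest}"
        yield line + "\n"


if __name__ == "__main__":
    record = ["@read1 1:N:0:GCCAAT\n", "ACGT\n", "+\n", "IIII\n"]
    print("".join(add_pair_suffix(record, 1)), end="")
```

To rewrite a whole file, passing an open input handle and calling `fout.writelines(add_pair_suffix(fin, 1))` on an output handle would work; gzipped input would need `gzip.open(path, "rt")` instead of `open`.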

Brian Bushnell (Walnut Creek, USA) wrote (3.6 years ago):

Alternatively, with BBMap:

reformat.sh in=r1.fq in2=r2.fq out1=renamed1.fq out2=renamed2.fq addslash

Adrian Pelin (Canada) wrote (3.6 years ago):

The solutions posted so far should work well. I just remembered an old blog post with a one-liner that converts the new Illumina naming scheme to the old one:

cat new-style_.fastq | awk '{if (NR % 4 == 1) {split($1, arr, ":"); printf "%s_%s:%s:%s:%s:%s#0/%s (%s)\n", arr[1], arr[3], arr[4], arr[5], arr[6], arr[7], substr($2, 1, 1), $0} else if (NR % 4 == 3){print "+"} else {print $0} }' > old-style.fastq

https://contig.wordpress.com/2011/09/01/newbler-input-iii-a-quick-fix-for-the-new-illumina-fastq-header/

It's kinda nice because you are not relying on any other tools, just bash and good ol' awk. I think this only works if you have the new-style header (with something like 1:N:0 and 2:N:0 after the read name); it may not work if your headers carry no pair information.
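For readers who find the awk hard to follow, here is the same header rewrite as a Python sketch that mirrors the one-liner field by field. It assumes Casava 1.8-style headers, with seven colon-separated fields in the read name and a second word such as `1:N:0:GCCAAT`; the function names `new_to_old_header` and `convert_fastq` are hypothetical. Like the awk, it drops the second field of the read name (the run number) and keeps the original header in parentheses after a space.

```python
from typing import Iterable, Iterator


def new_to_old_header(header: str) -> str:
    """Port of the awk header rewrite, e.g.
    '@INST:RUN:FLOWCELL:LANE:TILE:X:Y 1:N:0:IDX'
    -> '@INST_FLOWCELL:LANE:TILE:X:Y#0/1 (<original header>)'.
    """
    line = header.rstrip("\n")
    words = line.split()
    if len(words) < 2:
        raise ValueError("no ' 1:N:0:...' field: pair information is missing")
    f = words[0].split(":")  # f[0] is '@INST'; f[1] (run number) is dropped
    if len(f) < 7:
        raise ValueError("read name has fewer than 7 colon-separated fields")
    mate = words[1][0]  # first character of the second word: '1' or '2'
    return f"{f[0]}_{f[2]}:{f[3]}:{f[4]}:{f[5]}:{f[6]}#0/{mate} ({line})"


def convert_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Rewrite headers, reduce the separator to '+', copy other lines."""
    for i, raw in enumerate(lines):
        line = raw.rstrip("\n")
        if i % 4 == 0:
            yield new_to_old_header(line) + "\n"
        elif i % 4 == 2:
            yield "+\n"  # the awk prints a bare '+' here
        else:
            yield line + "\n"
```

One deliberate difference: where the awk would silently print empty fields for a header without pair information, this version raises an error, which matches the caveat above about headers that carry no pair information.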

Salim Bougouffa wrote (3.6 years ago):

Hello again,

I received a reply from Ben Vandervalk, one of the authors of ABySS, which reads as follows:

#################################################################
pe="r1.fastq r2.fastq" should suffice.

ABySS requires that either:

(i) the read names for both reads are identical, OR
(ii) the read names have an identical prefix, followed by "/1" and "/2", respectively.

- Ben
#################################################################
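The pairing rule in this email is simple enough to state as code. The following is a minimal Python sketch of the two conditions as worded above, not ABySS's source. `are_mates` is a hypothetical name; it expects read IDs (the first word of the header without the leading `@`) and treats its first argument as read 1. Under rule (i), R1/R2 files whose mates carry identical names with no suffix already qualify.

```python
def are_mates(id1: str, id2: str) -> bool:
    """True if two read IDs pair under the rule quoted above:
    (i) the IDs are identical, or
    (ii) they share a prefix followed by '/1' and '/2' respectively.
    """
    if id1 == id2:  # rule (i)
        return True
    # rule (ii): drop the two-character suffix and compare the prefixes
    return id1.endswith("/1") and id2.endswith("/2") and id1[:-2] == id2[:-2]
```

Note that other suffix styles, such as `_1`/`_2`, satisfy neither condition as the rule is worded.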

Salim Bougouffa wrote (3.6 years ago):

Thank you, lads. I went ahead on the assumption that the suffixes are important; thank you for confirming that.

I used BBMap's reformat.sh script for this.

I have a related question, though. I ran abyss-2fastq on some data I was analysing a week ago, and it added /1 and /2 not at the very end of the header line but before the last few characters, as in the example below. Will ABySS recognise this?

@HISEQ:149:C76YNACXX:3:1101:1159:2191/1 1:N:0:GCCAAT
ATAATTAAAGCAGGAATAGTAAAAAAACGTCCCTTAAAACGTATCAAGAAATCCGACCCAGACTGGGATTACGCAACCTGCGACGGCCCGTTGTGCCTGCG
+
BBBFFFFFFFFFFIBFIFIIIIIIIIIIIIIIIFIFFIIIFFFIBFIIIIIIFFFIFFFFFFFFFFFBBFFFFBFFFFFFBBFFFFFFF<BFFBFBBFFFF
@HISEQ:149:C76YNACXX:3:1101:1159:2191/2 2:N:0:GCCAAT
AACCTTGCGACGACCTGAAGGACGGACCGTCGCAGGCACAACGGGCCGTCGCAGGTTGCGTAATCCCAGTCTGGGTCGGATTTCTTGATACGTTTTAAGGG
+
BBBFFFFFFFFFFFIIIIFFIFIIIFFFFIFFFFFFFFFFFFFBFBBFFF7<77B<BB<BBBBFFFFBBBFBFFF<BBF7BBBBFFB<BBFBBFFF<<BBF


Hard to tell, but these can easily be removed:

sed -i 's, 1:N:0:GCCAAT,,g' file_r1.fastq

(Reply by Adrian Pelin, 3.6 years ago)

Hi Salim,

ABySS treats the first whitespace-separated word on the header line as the read ID, so there is no need to remove '1:N:0:GCCAAT' or '2:N:0:GCCAAT'. Everything after the first space is treated as a comment/description.

(Reply by benv710, 3.6 years ago)
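A minimal Python sketch of the parsing rule described in this reply, for illustration only: it is not ABySS's code, and `read_id` is a hypothetical name. The ID is taken as the first whitespace-separated word of the header, minus the leading `@` (FASTQ) or `>` (FASTA), and the rest of the line is ignored as a comment.

```python
def read_id(header: str) -> str:
    """Return the first whitespace-separated word of a FASTQ/FASTA header,
    without its leading '@' or '>'; anything after it is a comment."""
    words = header.split(None, 1)
    if not words:
        raise ValueError("empty header line")
    first = words[0]
    return first[1:] if first[0] in "@>" else first


if __name__ == "__main__":
    for h in ("@HISEQ:149:C76YNACXX:3:1101:1159:2191/1 1:N:0:GCCAAT",
              "@HISEQ:149:C76YNACXX:3:1101:1159:2191/2 2:N:0:GCCAAT"):
        print(read_id(h))
```

Applied to the two headers from the question, this gives IDs that differ only in their trailing /1 and /2, which is the form described in rule (ii) of the email quoted earlier in the thread.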