Fastq Input Data Format to abyss-pe
8.6 years ago
sklages ▴ 170

Hi,

I work with interleaved FASTQ files, in which the two reads of each pair carry the suffixes "/1" and "/2".

Suppose I have two files containing paired-end (PE) data for library "ABC" and two more files containing mate-pair (MP) data for library "XYZ".

I could:

(1) abyss-pe [..] lib='ABC' mp='XYZ' ABC='abc.A.fq abc.B.fq' XYZ='xyz.A.fq xyz.B.fq' [..]

or I could concatenate both (interleaved) ABC files and both (interleaved) XYZ files and do:

(2) abyss-pe [..] lib='ABC' mp='XYZ' ABC='abc.AB.fq' XYZ='xyz.AB.fq' [..]

Are those two approaches equivalent (in terms of processing and expected results)?

I am not really sure about that as my files are interleaved and I don't want A.fq and B.fq to be considered read pairs ...

I'd obviously prefer (1) as it is clean and simple. Assemblies are currently running, so I don't have any results from trial and error yet.

Thanks,
Sven

abyss assembly • 3.2k views
8.5 years ago
h.mon 35k

According to the manual, the correct way would be something like:

abyss-pe [..] lib='ABC1 ABC2' mp='XYZ1 XYZ2' ABC1='abc.A.fq' ABC2='abc.B.fq' XYZ1='xyz.A.fq' XYZ2='xyz.B.fq' [..]

ABySS calculates the insert size distribution empirically for each library. If you concatenate different libraries, the reported insert size histograms will be meaningless, though I believe the only purpose of this insert size distribution is to provide some sanity-checking feedback to the user.


This is the correct answer. If you only pass one file per library, ABySS will treat the file as interleaved. As h.mon says, it is important to specify the libraries separately (i.e. ABC1 and ABC2) in order for ABySS to correctly estimate the fragment size distribution of each library. (The fragment size estimation is done by aligning the read pairs to the assembly contigs.)


Thanks. Interesting. I have one library sequenced on two or more HiSeq lanes, so do I need to merge the lane files first (if I want to use interleaved format) before assembling? Would providing two (interleaved) files for the same library make ABySS treat them as paired-end, with one file per mate?


You do not need to merge them:

abyss-pe [..] lib='ABC_lane1 ABC_lane2 ABC_lane3' ABC_lane1='abc.A.fq' ABC_lane2='abc.B.fq' ABC_lane3='abc.C.fq' [..]

The syntax is really flexible, even if a bit confusing. And yes, I think ABySS will interpret the files as paired reads if you provide two files for the same library.
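
With many lanes, typing one library name per file gets tedious. Here is a minimal sketch of a shell helper that prints such a command line for a list of interleaved files. The function name `build_abyss_cmd` and the `_laneN` naming scheme are my own inventions for illustration; the helper only prints the command and does not run ABySS, and the `[..]` placeholders stand for the remaining options, as above.

```shell
# Hypothetical helper (not part of ABySS): print an abyss-pe command line in
# which every interleaved FASTQ file is declared as its own library.
# Usage: build_abyss_cmd PREFIX file1.fq [file2.fq ...]
build_abyss_cmd() {
    prefix=$1
    shift
    libs=""      # space-separated library names for lib='...'
    assigns=""   # NAME='file' assignments, one per library
    i=1
    for fq in "$@"; do
        lib="${prefix}_lane${i}"
        libs="${libs:+$libs }$lib"
        assigns="$assigns $lib='$fq'"
        i=$((i + 1))
    done
    # "[..]" marks the other abyss-pe options, which are left out here.
    printf "abyss-pe [..] lib='%s'%s [..]\n" "$libs" "$assigns"
}

build_abyss_cmd ABC abc.A.fq abc.B.fq abc.C.fq
```

The last line prints the same command as the one shown above, so you can check the generated library names before pasting in your real options.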


Ah, OK. It is now much clearer. So "lib" refers more or less to one FASTQ file (or pair of files), not necessarily to a real (sequencing) library. This is probably what confused me :-)

8.5 years ago

I am not sure about the insert sizes. For scaffolding with MP data, knowing the insert sizes is certainly important: if you had one MP library with a 2 kb insert and another with a 5 kb insert, they would need to be kept separate for ABySS to work. Even for PE data, I believe ABySS uses the reads for scaffolding, so knowing the insert size per library (assuming the sizes differ) would be important. So I would go with h.mon's solution, which is what I always do.

The ABySS example in the documentation with two files per library is for the case of two non-interleaved files in the library. E.g.,

ABC1='abc.A_R1.fq abc.A_R2.fq' ABC2='abc.B_R1.fq abc.B_R2.fq'

If you have interleaved files then only put one file per library.
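
If you are unsure whether a given file really is interleaved, a quick check along these lines may help. This is only a sketch under two assumptions: records are plain 4-line FASTQ, and read names end in "/1" and "/2" as described in the question (newer Illumina headers that put the mate number after a space would need a different test). The function name `is_interleaved` is made up for this example.

```shell
# Hypothetical check (not part of ABySS): succeed if the FASTQ file alternates
# NAME/1, NAME/2 records, i.e. looks interleaved.
# Assumes 4 lines per record and "/1" / "/2" read-name suffixes.
is_interleaved() {
    awk '
        NR % 4 == 1 {                      # header line of each record
            n++
            name = $1
            if (n % 2 == 1) {
                # First mate: name must end in "/1"; remember the stem.
                if (substr(name, length(name) - 1) != "/1") { bad = 1; exit }
                stem = substr(name, 1, length(name) - 2)
            } else if (name != stem "/2") {
                # Second mate: must be the same stem followed by "/2".
                bad = 1; exit
            }
        }
        # Fail on a mismatch or on an odd number of records.
        END { if (bad || n % 2 == 1) exit 1 }
    ' "$1"
}

# Example: is_interleaved abc.A.fq && echo "looks interleaved"
```

The function stops at the first mismatch, so it is cheap to run on large files that are not interleaved; a fully interleaved file is read to the end.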


Thanks. Yes, it seems I have to merge (interleaved) data of the same library before assembly ...
