Question: How Can I Use Mate-Pair Sequences For Soapdenovo?
7.7 years ago by
toshnam620
Seoul, Republic of Korea
toshnam620 wrote:

Hi all,

I want to assemble paired-end sequences and mate-pair sequences (HiSeq2000) together using SOAPdenovo.

The SOAPdenovo home page describes mate-pair usage as follows: "Mate-pair relationship could be indicated in two ways: two sequence files with reads in the same order belonging to a pair, or two adjacent reads in a single file (FASTA only) belonging to a pair." (http://soap.genomics.org.cn/)

How can I convert raw mate-pair FASTQ files into the proper format for SOAPdenovo assembly? Is there a conversion script?

Thanks.

7.7 years ago by
Fabian Bull1.3k
German
Fabian Bull1.3k wrote:

The important thing is to set reverse_seq to 1.

Example config:

max_rd_len=100
[LIB]
avg_ins=2000
reverse_seq=1
asm_flags=3
rank=1
q1=/path/to/fastq_read_1.fq
q2=/path/to/fastq_read_2.fq

This sets the maximum read length to 100 and the average insert size to 2000. asm_flags=3 declares that the reads are used for both contig assembly and scaffolding. The rank parameter matters when you have multiple libraries: it sets the order in which they are used for scaffolding. fastq_read_1.fq and fastq_read_2.fq are FASTQ files containing the same reads in the same order. If you have single reads to add (for example, reads whose mates were discarded during quality filtering), use q. If you have FASTA files instead of FASTQ files, use f1, f2 and f in the same way.
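For a run that combines a paired-end and a mate-pair library, the config keys described above can be written out programmatically. This is only a minimal sketch: the file paths, the insert sizes (200 and 2000) and the choice of rank=2 for the mate-pair library are placeholder assumptions, and the helper names lib_block and build_config are made up for illustration.

```python
def lib_block(avg_ins, reverse_seq, rank, q1, q2, asm_flags=3):
    """Return one [LIB] section as text, using only keys shown in this thread."""
    return "\n".join([
        "[LIB]",
        f"avg_ins={avg_ins}",
        f"reverse_seq={reverse_seq}",
        f"asm_flags={asm_flags}",
        f"rank={rank}",
        f"q1={q1}",
        f"q2={q2}",
    ])


def build_config(max_rd_len, libs):
    """Join the global max_rd_len line and one [LIB] section per library."""
    sections = [f"max_rd_len={max_rd_len}"] + [lib_block(**lib) for lib in libs]
    return "\n".join(sections) + "\n"


# Placeholder libraries: paths and insert sizes are assumptions, not real data.
config_text = build_config(100, [
    # Paired-end library: forward-reverse, so reverse_seq=0.
    dict(avg_ins=200, reverse_seq=0, rank=1,
         q1="/path/to/pe_1.fq", q2="/path/to/pe_2.fq"),
    # Mate-pair library: reverse-forward, so reverse_seq=1.
    dict(avg_ins=2000, reverse_seq=1, rank=2,
         q1="/path/to/mp_1.fq", q2="/path/to/mp_2.fq"),
])
print(config_text)
```

Redirect the printed text into a file and pass that file to SOAPdenovo as its config. Replace avg_ins with the insert sizes actually measured for your libraries.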


Thank you for your help. Setting "reverse_seq=1" is the solution for using mate-pair sequences! Here is the relevant comment from the SOAPdenovo home page: "There are two types of paired-end libraries: a) forward-reverse, generated from fragmented DNA ends with typical insert size less than 800 bp; b) reverse-forward, generated from circularizing libraries with typical insert size greater than 2 Kb. User should set parameter for tag "reverse_seq" to indicate this: 0, forward-reverse; 1, reverse-forward."

Reply written 7.7 years ago by toshnam
7.7 years ago by
ALchEmiXt1.9k
The Netherlands
ALchEmiXt1.9k wrote:

AFAIK you can use an Illumina mate-pair dataset the same way as regular PE datasets, using the /1 and /2 delimiters. However, you need to make sure the reads point in the right direction! Mate-pair libraries generate read pairs like A<---->B, and you need to modify them into A---><---B (reverse-complement them, otherwise you get negative mapping distances). I think that is all.
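The reverse-complement step can be sketched in a few lines of Python. This assumes plain, uncompressed FASTQ with exactly four lines per record; the function names and the file names in the usage comment are hypothetical. Each sequence is reverse-complemented and its quality string is reversed so the scores stay aligned with the bases.

```python
# Translation table for complementing bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def revcomp_fastq(in_path, out_path):
    """Reverse-complement every record of a 4-line-per-record FASTQ file.

    Returns the number of records written.
    """
    count = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            header = src.readline()
            if not header.strip():
                break  # end of file (or trailing blank line)
            seq = src.readline().rstrip("\n")
            plus = src.readline().rstrip("\n")
            qual = src.readline().rstrip("\n")
            dst.write(header.rstrip("\n") + "\n")
            dst.write(reverse_complement(seq) + "\n")
            dst.write(plus + "\n")
            dst.write(qual[::-1] + "\n")  # keep qualities aligned with bases
            count += 1
    return count


# Hypothetical usage, applied to both files of the mate-pair library:
# revcomp_fastq("mp_1.fq", "mp_1.rc.fq")
# revcomp_fastq("mp_2.fq", "mp_2.rc.fq")
```

Going by the reverse_seq description quoted elsewhere in this thread, reads converted this way would presumably be declared as forward-reverse (reverse_seq=0), while unconverted mate-pair reads would use reverse_seq=1. Doing both would flip the orientation twice.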

Please be sure to filter your mate-pair dataset beforehand, since such libraries are known to contain many adapter artefacts.

my 2ct

7.7 years ago by
Wageningen, NL
Jan Van Haarst300 wrote:

You create a config file like this (following the manual at http://soap.genomics.org.cn/soapdenovo.html#comm2 ):

max_rd_len=125
[LIB]
avg_ins=200
asm_flags=3
reverse_seq=0
rank=1
q1=/home/jvh/data/SequenceAssembly/nobackup/SRA/SRP000220/SRX000429/SRR001665_1.fastq
q2=/home/jvh/data/SequenceAssembly/nobackup/SRA/SRP000220/SRX000429/SRR001665_2.fastq

In the config file, you define libraries with [LIB], and each library can contain different read sets. Above I have defined a read set consisting of two files containing paired reads in the inward configuration, or as the manual says, forward-reverse ( --> <-- ). How your reads are oriented depends on how they were produced. The reads must be in the same order in both files, and no read may be missing; otherwise SOAPdenovo will not work correctly.
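Because the two files must list the same reads in the same order, it can be worth checking this before starting an assembly. The sketch below assumes uncompressed four-line FASTQ records and read names that differ only by a trailing /1 or /2 or by a comment after whitespace; the function names are made up for illustration.

```python
from itertools import zip_longest


def read_ids(path):
    """Yield the read ID of each record in a 4-line-per-record FASTQ file."""
    with open(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 != 0 or not line.strip():
                continue  # only header lines carry the read name
            # Drop the leading '@' and anything after the first whitespace.
            name = line[1:].split()[0] if line[1:].split() else ""
            # Drop a trailing /1 or /2 mate suffix, if present.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(path1, path2):
    """Return (index, id1, id2) for the first unpaired record, else None.

    A missing read at the end of either file shows up as an ID of None.
    """
    pairs = zip_longest(read_ids(path1), read_ids(path2))
    for index, (id1, id2) in enumerate(pairs):
        if id1 != id2:
            return index, id1, id2
    return None


# Hypothetical usage:
# problem = first_mismatch("reads_1.fastq", "reads_2.fastq")
# print("files are in sync" if problem is None else problem)
```

If a mismatch is reported, the usual cause is that quality filtering dropped one mate of a pair; such orphaned reads would need to be moved to a separate single-read file before the paired files are used.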
