How to calculate the insert size and its standard deviation for the MaSuRCA assembly configuration file?
8 weeks ago
Sony ▴ 10

Hi everyone,

I have paired-end sequencing data from Brassica. I preprocessed the raw FASTQ files with Trimmomatic (removing adapter sequences and low-quality bases). Then I aligned the trimmed reads to the reference genome with BWA-MEM, extracted the unmapped reads with SAMtools (flag 0x4), and converted the resulting BAM file to FASTQ with SAMtools bam2fq, which gave me a single FASTQ file.

I want to assemble these unmapped reads into contigs with MaSuRCA, but MaSuRCA needs a configuration file to run. The tutorial clearly explains how to specify paired-end data, but not how to specify single-end Illumina data in the config file. The attached image shows an example MaSuRCA configuration file. It requires the insert size and the standard deviation of the insert size for paired-end reads, but my extracted unmapped reads are single-end. How can I get these values for the configuration file?


MaSuRCA insert-size assembly • 647 views

It requires the insert size and the standard deviation of the insert size for paired-end reads, but my extracted unmapped reads are single-end. How can I get these values for the configuration file?

Since you have single-end data, there is no way to accurately determine the insert size from those reads alone.

Update: Since these reads came from the same library, you can reuse the numbers from the PE data.

I have paired-end sequencing data from Brassica.

If your initial dataset was paired-end, how come you ended up with all single-end data after the process you describe above?

The MaSuRCA config file says the following, so if you only have single-end reads you may need to use another assembler:

MUST HAVE Illumina paired end reads to use MaSuRCA


I am focusing on the unmapped reads, and I want to assemble them into de novo contigs. When I aligned my trimmed paired-end reads to the reference genome, I got a SAM file.

Then I used samtools to extract all unmapped reads as a single group (samtools view -b -f 4 SRR4289357_mapped.sorted.bam > SRR4289357_unmapped.bam) rather than extracting the unmapped R2 and R1 reads separately, like this:

samtools view -h -b -f 132 /opt/home/sony/Brassica_practice/SRR4289357_mapping/bwamem/SRR4289357_mapped.sorted.bam > /opt/home/sony/Brassica_practice/SRR4289357_unmapped/bwa/paired_singletones/sorted_unmapped_R2.bam

samtools view -h -b -f 68 /opt/home/sony/Brassica_practice/SRR4289357_mapping/bwamem/SRR4289357_mapped.sorted.bam > /opt/home/sony/Brassica_practice/SRR4289357_unmapped/bwa/paired_singletones/sorted_unmapped_R1.bam 

This means that after this step I had only one BAM file containing all unmapped reads (not separate R1 and R2 files for the paired-end reads). I then converted this BAM file to FASTQ.
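For anyone reading along, the -f values used above (4, 68, 132) are sums of SAM flag bits. A small sketch that spells them out; the function name decode_flag and the short bit labels are my own, the bit values themselves come from the SAM format:

```shell
# Print the names of the SAM flag bits that are set in a decimal flag value.
# decode_flag is a hypothetical helper name; labels are informal shorthand.
decode_flag() {
  flag=$1
  out=""
  for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
              16:reverse 32:mate_reverse 64:read1 128:read2 \
              256:secondary 512:qc_fail 1024:duplicate 2048:supplementary; do
    bit=${pair%%:*}    # numeric bit value before the colon
    name=${pair#*:}    # label after the colon
    if [ $(( flag & bit )) -ne 0 ]; then
      out="${out:+$out }$name"
    fi
  done
  printf '%s\n' "$out"
}

decode_flag 4      # every unmapped read, R1 and R2 mixed together
decode_flag 68     # unmapped and first in pair (R1)
decode_flag 132    # unmapped and second in pair (R2)
```

So -f 4 alone puts R1 and R2 into one file, which is why the pairing is no longer visible after conversion to FASTQ. If samtools is installed, `samtools flags 68` gives the same kind of breakdown.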

I read a comment in another post saying that in the MaSuRCA configuration file you can simply replace pe with se (single end), like this: PE= se 500 50 /../ummapped.fastq
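To make the layout of that line explicit, here is a tiny sketch that assembles it from its parts: a two-letter library prefix, the mean insert size, its standard deviation, then the read file(s). The helper name pe_line and the file names are hypothetical, and whether current MaSuRCA versions accept a single file with an se prefix is not something this verifies:

```shell
# Build a "PE=" line for the DATA section of a MaSuRCA config file.
# pe_line is a hypothetical helper; arguments: prefix mean stdev file [file2]
pe_line() {
  prefix=$1
  mean=$2
  sd=$3
  shift 3
  # "$*" joins the remaining arguments (the read files) with single spaces
  printf 'PE= %s %s %s %s\n' "$prefix" "$mean" "$sd" "$*"
}

# Placeholder file names, not real paths
pe_line pe 500 50 reads_R1.fastq reads_R2.fastq
pe_line se 500 50 unmapped.fastq
```

The 500 and 50 here are just the example values from the line above, not recommendations.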

My question is: how do I get the numbers 500 and 50 in this configuration line: "PE= se 500 50 /../ummapped.fastq"? I could calculate them if the reads were paired-end, but in this case I have only one FASTQ file.


My question is: how do I get the numbers 500 and 50 in this configuration line: "PE= se 500 50 /../ummapped.fastq"?

Since these unmapped reads came from the same library you can use the same numbers as you are using for PE.
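If you do not have the PE numbers yet, one option is to estimate them from the original paired-end alignment. A minimal sketch, assuming SAM records are piped in as text and that only properly paired records with a positive TLEN (column 9) should be counted; the function name insert_stats and the default 10000 bp outlier cutoff are my own choices, not anything MaSuRCA prescribes:

```shell
# Read SAM text on stdin, print "<mean> <stdev>" of the insert size (TLEN).
# insert_stats is a hypothetical helper; $1 is an optional maximum TLEN
# used to drop outliers (assumed default: 10000).
insert_stats() {
  awk -F '\t' -v max="${1:-10000}" '
    /^@/ { next }                        # skip SAM header lines
    # keep records with the proper-pair bit (0x2) set and a positive TLEN,
    # so each pair is counted once (its mate carries the negative TLEN)
    int($2 / 2) % 2 == 1 && $9 > 0 && $9 <= max {
      n++; sum += $9; sumsq += $9 * $9
    }
    END {
      if (n < 2) { print "not enough proper pairs" > "/dev/stderr"; exit 1 }
      mean = sum / n
      var = (sumsq - n * mean * mean) / (n - 1)   # sample variance
      if (var < 0) var = 0                        # guard against rounding
      printf "%.0f %.0f\n", mean, sqrt(var)
    }'
}
```

With samtools installed, something like `samtools view -f 2 SRR4289357_mapped.sorted.bam | insert_stats` should print the two numbers for the config line. `samtools stats` also reports an insert size average and standard deviation in its SN lines, which makes a useful cross-check.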


In this post, "How to specify Illumina Single End data in the MaSuRCA Assembler config file", the answer says: "Just specifying one file should work and replacing pe by se."


That post is 10 years old, and the configuration example currently shown on the GitHub page clearly says that Illumina reads must be paired-end: https://github.com/alekseyzimin/masurca

That said, if you can get it to work by following the directions in the old thread, there is no harm in trying.

I don't know how many single-end reads you have, but unless it is a large number you are likely chasing diminishing returns.
