Question: Questions About Paired-End Reads That Do Not Match For BWA
6.2 years ago by
Tonyzeng300
Tonyzeng300 wrote:

My original paired-end data consist of Read1 and Read2 files whose reads correspond to each other. After quality filtering, the Read1 and Read2 files no longer correspond to each other. Does that mean my data are not paired-end data anymore? Is this the reason BWA failed when I supplied Read1 and Read2 as paired-end reads? If so, I need to treat them as single-end reads and run BWA "aln" on each file separately, right? My last question: should I then merge the Read1 and Read2 BAM files using BWA "sampe"?

Thank you very much for the answer!!

modified 6.2 years ago by Biomonika (Noolean)3.1k • written 6.2 years ago by Tonyzeng300

I have just found this very nice script from Eric Normandeau, hosted here: https://github.com/enormandeau/Scripts/blob/master/fastqCombinePairedEnd.py (from the post "Combining the paired reads from Illumina run"). It basically does the job: you get two files of forward and reverse reads that do pair, and one extra file for orphans.
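The re-pairing idea can be sketched in a few lines of Python. This is not the linked script, only a minimal illustration under some assumptions: each FASTQ record spans exactly four lines, mates share a read ID once a trailing `/1` or `/2` and any header comment are dropped, and the second file fits in memory. The names `read_fastq`, `read_id` and `repair_pairs` are made up for this example.

```python
def read_fastq(path):
    """Yield (header, sequence, plus, quality) tuples, assuming 4 lines per record."""
    with open(path) as handle:
        while True:
            record = [handle.readline().rstrip("\n") for _ in range(4)]
            if not record[0]:  # empty header line: end of file
                return
            yield tuple(record)


def read_id(header):
    """Drop '@', any comment after whitespace, and a trailing /1 or /2 (assumed naming)."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def write_fastq(records, path):
    with open(path, "w") as handle:
        for record in records:
            handle.write("\n".join(record) + "\n")


def repair_pairs(fq1, fq2, out1, out2, orphans):
    """Write reads that have a mate to out1/out2 in matching order, the rest to orphans."""
    mates = {read_id(rec[0]): rec for rec in read_fastq(fq2)}  # all of fq2 in memory
    paired1, paired2, single = [], [], []
    for rec in read_fastq(fq1):
        mate = mates.pop(read_id(rec[0]), None)
        if mate is None:
            single.append(rec)  # read1 record whose mate was filtered out
        else:
            paired1.append(rec)
            paired2.append(mate)
    single.extend(mates.values())  # read2 records that were never matched
    write_fastq(paired1, out1)
    write_fastq(paired2, out2)
    write_fastq(single, orphans)
    return len(paired1), len(single)
```

A call such as `repair_pairs("read1.fq", "read2.fq", "paired_1.fq", "paired_2.fq", "orphans.fq")` (file names are placeholders) returns the number of pairs kept and the number of orphans. For very large files a streaming approach would be kinder to memory.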

written 6.2 years ago by Biomonika (Noolean)3.1k

This script runs fine on my laptop and gives perfect results there. But when I run it on the server, it generates blank files, and I don't know why. The server's operating system is Debian. Do you have any idea how I can fix this?

written 3.1 years ago by swadha20

How did you do the "quality filtering"? Did you remove some reads from one FASTQ file but not from the other (mate) file?
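One way to answer this for a given pair of files is to walk both in parallel and report the first record where the read IDs disagree. The sketch below assumes four lines per record and mates that share an ID apart from a trailing `/1` or `/2`; `fastq_ids` and `first_mismatch` are hypothetical helper names.

```python
from itertools import zip_longest


def fastq_ids(path):
    """Yield the read ID of each record, assuming 4 lines per record."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0 and line.strip():
                name = line[1:].split()[0]  # drop '@' and any header comment
                yield name[:-2] if name.endswith(("/1", "/2")) else name


def first_mismatch(fq1, fq2):
    """Return (record_index, id1, id2) for the first out-of-sync record, else None."""
    pairs = zip_longest(fastq_ids(fq1), fastq_ids(fq2))
    for index, (id1, id2) in enumerate(pairs, start=1):
        if id1 != id2:  # also triggers when one file is shorter (ID is None)
            return index, id1, id2
    return None
```

If `first_mismatch("read1.fq", "read2.fq")` returns `None`, the two files have the same number of records with matching IDs in the same order; otherwise the tuple points at the first record where the pairing breaks.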

written 6.2 years ago by Pierre Lindenbaum124k
6.2 years ago by
Philadelphia
Ashutosh Pandey11k wrote:

You should start everything from scratch. By scratch I mean start with the original FASTQ files and use a filtering tool that preserves the order of the forward and reverse reads across the two files. Trimmomatic is the one I use for filtering. Link: http://www.usadellab.org/cms/?page=trimmomatic

written 6.2 years ago by Ashutosh Pandey11k

ashutoshmits, thank you for your suggestion. I filtered both the read1 and read2 files with the FASTX-Toolkit, removing every read in which more than 75% of the bases have a quality score under 20. For example, the 5th read in the read1 file was filtered out because more than 75% of its bases have quality under 20, but the 5th read in the read2 file stayed because it is good. In this situation, the 5th reads of the read1 and read2 files no longer correspond to each other.

If I use Trimmomatic to preserve the FASTQ order of read1 and read2, I am afraid it will not work in this situation either, right?

written 6.2 years ago by Tonyzeng300

It will create 4 different files. The first two files will contain read1 and read2 in the same order. In other words, if the 5th read of the read1 file is filtered out, then the 5th read of the read2 file is also removed from the paired output. Don't worry much if you are losing some reads. They will not affect your analysis in any way. The other two files will contain the reads from the read1 and read2 files whose mate was discarded. In your case, read 5 of the read2 file will end up in one of these files. Read the Trimmomatic documentation and you will understand why it is good to use. The filtering step still happens, but the order of the reads in the two paired files is maintained too.
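The four-file idea can be illustrated with a small Python sketch. It does not reproduce Trimmomatic's own trimming steps; it only applies the rule described earlier in this thread (drop a read when more than 75% of its bases score below 20) to both mates at once. It assumes Phred+33 quality encoding, four lines per record, and input files that are still in sync. The function names and the `1P`/`2P`/`1U`/`2U` output suffixes are choices made for this example.

```python
def read_fastq(path):
    """Yield (header, sequence, plus, quality) tuples, assuming 4 lines per record."""
    with open(path) as handle:
        while True:
            record = [handle.readline().rstrip("\n") for _ in range(4)]
            if not record[0]:
                return
            yield tuple(record)


def is_good(quality, min_phred=20, max_low_fraction=0.75, offset=33):
    """True unless more than max_low_fraction of the bases score below min_phred."""
    if not quality:
        return False
    low = sum(1 for char in quality if ord(char) - offset < min_phred)
    return low / len(quality) <= max_low_fraction


def filter_pairs(fq1, fq2, prefix):
    """Filter both mates together; write paired (1P, 2P) and unpaired (1U, 2U) files."""
    names = ("1P", "2P", "1U", "2U")
    outputs = {name: open(f"{prefix}_{name}.fq", "w") for name in names}
    counts = dict.fromkeys(names, 0)
    try:
        # zip assumes record N of fq1 and record N of fq2 are mates
        for rec1, rec2 in zip(read_fastq(fq1), read_fastq(fq2)):
            good1, good2 = is_good(rec1[3]), is_good(rec2[3])
            if good1 and good2:
                targets = [("1P", rec1), ("2P", rec2)]
            elif good1:
                targets = [("1U", rec1)]  # mate in fq2 failed the filter
            elif good2:
                targets = [("2U", rec2)]  # mate in fq1 failed the filter
            else:
                targets = []  # both mates failed
            for name, rec in targets:
                outputs[name].write("\n".join(rec) + "\n")
                counts[name] += 1
    finally:
        for handle in outputs.values():
            handle.close()
    return counts
```

Because each decision is made per pair, the `1P` and `2P` files always hold the same number of records in the same order, which is what a paired-end aligner expects. The surviving mate of a broken pair lands in `1U` or `2U` instead of shifting the order.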

modified 6.2 years ago • written 6.2 years ago by Ashutosh Pandey11k
6.2 years ago by
State College, PA, USA
Biomonika (Noolean)3.1k wrote:

You do not necessarily need to start your analysis from scratch. Use the FASTQ joiner tool from the Galaxy toolkit online: http://main.g2.bx.psu.edu/

From manual:

This tool joins paired end FASTQ reads from two separate files into a single read in one file. The join is performed using sequence identifiers, allowing the two files to contain differing ordering. If a sequence identifier does not appear in both files, it is excluded from the output.

So, if a sequence identifier does not appear in both files (which often happens after quality filtering), you can still get consistent data afterwards. This tool puts both mates of each pair into the same file, so you will have to split them again (with the added value of no longer having solo reads without a partner).

modified 6.2 years ago • written 6.2 years ago by Biomonika (Noolean)3.1k

Trimmomatic is pretty fast. Uploading FASTQ files to Galaxy, running the analyses, and downloading the results may take a while. I do have a script that does the equivalent of FASTQ joiner, but it's in Python and will take more time than running the analysis from scratch.

modified 6.2 years ago • written 6.2 years ago by Ashutosh Pandey11k

You are right, but rongzeng might not want to use another (new) piece of software, for whatever reason.

written 6.2 years ago by Biomonika (Noolean)3.1k