Question

One question about using bowtie2 in genome mapping

9.6 years ago

zhhxu9 ▴ 20

Hi everyone,

I have a question about mapping with bowtie2.

I have paired-end sequencing data, but for some reason the R1 file contains far fewer reads than the R2 file. If I map to the genome in paired-end mode, I think the total number of reads that can be mapped will be limited by the low read count in R1.

So, can I use only the R2 data and map it to the genome in single-end mode? If so, since R2 contains the reads from the reverse-complementary strand, what parameter should I give bowtie2 so that the R2 reads map to the reverse-complementary strand of the genome?

Thanks for your kind help!

Zhenhua

bowtie2 mapping • 4.0k views

Updated 2.3 years ago by Ram 43k • written 9.6 years ago by zhhxu9 ▴ 20

Ram · Accepted Answer · 2014-09-04

4

9.6 years ago

Ashutosh Pandey 12k

Two important points:

Even if only half of your R2 reads have corresponding R1 reads, I would strongly recommend mapping in paired-end mode. See this post about creating paired-end FASTQ files with reads in the same order. You can do this for the reads that are present in both files (R1 and R2), and then map the remaining reads that have no partner (we call them orphan reads in NGS) as single-end reads.
You don't have to do anything special to map reads from the reverse-complementary strand to the genome. Reference genome indexes are generated for both the forward and the reverse strand when you use bowtie2-build or when you download pre-built indexes from somewhere else. Another important thing to know is that not all R2 reads belong to the reverse strand: an R2 read can originate from the forward strand, in which case the corresponding R1 read originates from the reverse strand. You can't make any assumptions unless you have strand-specific data, which I assume you don't.
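
The first point (paired reads in matching order, orphans set aside) can be sketched in Python roughly as follows. This is only an illustration, not the method from the linked post: the function names and output file names are hypothetical, and it assumes plain four-line FASTQ records, read names that match between R1 and R2 once an optional trailing /1 or /2 is removed, and an R1 file small enough to hold in memory.

```python
from itertools import islice


def read_fastq(path):
    """Yield (read_name, record_lines) for each four-line FASTQ record."""
    with open(path) as handle:
        while True:
            record = list(islice(handle, 4))
            # Stop at end of file (or at a trailing blank line).
            if not record or not record[0].strip():
                break
            if len(record) < 4:
                raise ValueError(f"truncated FASTQ record in {path}")
            # Make sure every line ends with a newline before writing it back.
            record = [line.rstrip("\n") + "\n" for line in record]
            # Header looks like "@name extra" or "@name/1"; keep the shared name.
            name = record[0][1:].split()[0]
            if name.endswith("/1") or name.endswith("/2"):
                name = name[:-2]
            yield name, record


def split_pairs(r1_path, r2_path, prefix):
    """Write paired reads in matching order and orphans to separate files.

    Returns the number of read pairs found.
    """
    # Assumption: R1 is the smaller file and fits in memory.
    r1_records = dict(read_fastq(r1_path))
    paired_names = set()
    with open(f"{prefix}_paired_R1.fastq", "w") as paired1, \
         open(f"{prefix}_paired_R2.fastq", "w") as paired2, \
         open(f"{prefix}_orphan_R2.fastq", "w") as orphan2:
        for name, record in read_fastq(r2_path):
            if name in r1_records:
                # Write both mates at the same position in their files.
                paired1.writelines(r1_records[name])
                paired2.writelines(record)
                paired_names.add(name)
            else:
                orphan2.writelines(record)
    # R1 reads whose mate never appeared in R2 are orphans too.
    with open(f"{prefix}_orphan_R1.fastq", "w") as orphan1:
        for name, record in r1_records.items():
            if name not in paired_names:
                orphan1.writelines(record)
    return len(paired_names)
```

The two paired files can then be mapped in paired-end mode and the orphan files in single-end mode.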

Let me know if you have any other questions.

Updated 2.3 years ago by Ram 43k • written 9.6 years ago by Ashutosh Pandey 12k

Thanks for your answers, which are very helpful. You solved most of my questions. I will check the data and see whether any further problems come up.

Written 9.6 years ago by zhhxu9 ▴ 20

Hi, I have a further question. As I mentioned before, my R1 file contains fewer reads than R2. When I run bowtie2 in paired-end mode, the program starts properly and I can see the output SAM file growing, which means results are being written to it. However, nothing is written to the file I designated for unaligned reads. The biggest problem is that the program dies when the output SAM file reaches around 1 GB, with these messages: "Error, fewer reads in file specified with -1 than in file specified with -2", "terminate called after throwing an instance of 'int'", "[bam_sort_core] merging from 2 files...", "bowtie2-align died with signal 6". I think this error reflects the problem I already know about, namely that the R1 file has fewer reads than R2. Do you have any idea how to get bowtie2 to run paired-end mapping despite this problem with my data?
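
The error message points at a mismatch in read counts between the two input files. One way to confirm that before rerunning is to count the records in each file. Below is a minimal Python sketch, assuming standard four-line FASTQ records with no blank lines inside them; the function name and the file names in the usage comment are hypothetical.

```python
import gzip


def count_fastq_records(path):
    """Count reads in a FASTQ file, assuming exactly four lines per record."""
    # Transparently handle gzip-compressed input based on the file extension.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Blank lines (for example at the end of the file) are ignored.
        n_lines = sum(1 for line in handle if line.strip())
    if n_lines % 4:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


# Example usage (file names are placeholders):
# n1 = count_fastq_records("sample_R1.fastq")
# n2 = count_fastq_records("sample_R2.fastq")
# print(n1, n2, "match" if n1 == n2 else "mismatch")
```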

Many thanks.

Updated 2.3 years ago by Ram 43k • written 9.6 years ago by zhhxu9 ▴ 20

You can't mix paired-end and single-end data with bowtie2.

Updated 2.3 years ago by Ram 43k • written 9.6 years ago by Devon Ryan 104k

As Devon replied, you can't use the mixed data; you will have to create two sets of files. The first set represents the paired-end data and the second set represents the orphan reads, i.e. the single-end data. I already posted a link in my answer that will help you create ordered paired-end FASTQ files. If your paired-end FASTQ files are already ordered and the only problem is that one file has fewer reads, you can simply remove the orphan reads from the other FASTQ file to create matched paired-end files. The removed reads can then be aligned as single-end reads. It should be fairly simple; basic UNIX commands should be sufficient.
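
Once the two sets of files exist, the two alignment runs could look roughly like the sketch below. It only builds and prints the command lines, using bowtie2's -x (index), -1/-2 (paired mates), -U (unpaired reads) and -S (SAM output) options. The helper function, index name, and all file names are hypothetical placeholders.

```python
import shlex


def bowtie2_commands(index, paired_r1, paired_r2, orphans, out_prefix):
    """Return (paired_cmd, single_cmd) as argument lists for two bowtie2 runs."""
    # Run 1: reads that have both mates, in the same order in both files.
    paired_cmd = [
        "bowtie2", "-x", index,
        "-1", paired_r1, "-2", paired_r2,
        "-S", f"{out_prefix}_paired.sam",
    ]
    # Run 2: orphan reads as single-end data; -U takes a comma-separated list.
    single_cmd = [
        "bowtie2", "-x", index,
        "-U", ",".join(orphans),
        "-S", f"{out_prefix}_single.sam",
    ]
    return paired_cmd, single_cmd


# Print the commands for inspection; all names below are placeholders.
for cmd in bowtie2_commands(
    "genome_index",
    "sample_paired_R1.fastq",
    "sample_paired_R2.fastq",
    ["sample_orphan_R1.fastq", "sample_orphan_R2.fastq"],
    "sample",
):
    print(shlex.join(cmd))
```

Each argument list can be passed to subprocess.run to execute the alignment, or the printed lines can be pasted into a shell.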

Updated 2.3 years ago by Ram 43k • written 9.6 years ago by Ashutosh Pandey 12k