Question: Filtering Bed Files By Using Bedops
gravatar for Raghav
7.6 years ago by
Allahabad, India
Raghav100 wrote:

hello every one,

I have paired end illumina reads, R1.fastq and R2.fastq and I have mapped them as single-end reads by using bowtie2 default parameters, I performed further downstream analysis by using samtools and bedops, and now I have R1.bed and R2. bed I made two sets, one of them have R1_uniquely_mapped.bed, R2_uniquely_mapped.bed and second of them R1_mapped_more_than_1.bed , R2_mapped_more_than_1.bed.

because R1 and R2 belongs paired end reads, and my restriction library has maximum 2KB size, then R1 and R2 pairs must be present in less than 2 kb territory of chromosome

theoretically I am assuming, in R1.bed format,

chr1  100   180    @R1_read1______1 .................  
chr1   1000  1090 @R1_read2______1................

In R2.bed format,

chr1 2100   2180 @R2_read1_____2............. ## I just add 2KB length with respect to R1.bed###
chr1 2500 2590    @R2_read______2......... ## I just add 1.5KB [1500nts] with respect to R1.bed, because my library is >= 2KB.

How can I customize downstream tools like BEDOPS or bedtool which can capture such type of reads or alignment????? How can I filter this type of infromation by using bedops tool????

all suggestions and comments are most welcome,

bedtools bed bowtie2 • 3.6k views
ADD COMMENTlink modified 7.0 years ago by Biostar ♦♦ 20 • written 7.6 years ago by Raghav100

I really think you should format your question in a way people can understand. Maybe I am slow this morning but I do not understand the problem at hand. Also asking a question with "????????" is adding 10 superfluous question marks, one suffices to ask a question.

ADD REPLYlink written 7.6 years ago by Gabriel R.2.8k

If you have paired end reads, I don't understand why you're mapping them as single end? Bowtie2 can map paired end reads:

ADD REPLYlink modified 7.6 years ago • written 7.6 years ago by Jelena Aleksic920

Dear Jelena, I tried botie2 for paired end also and I ran it over more than 20 times for paired end reads by adjusting -I and -X parameters, also with different parameters, and got four type of mapped results, aligned exactly 1 time, aligned >1 times, aligned concordantly exactly 1 time, aligned concordantly >1 times, aligned discordantly 1 time, it was very difficult for me to choose which one is better for my analysis, and it was very complicated to go with all simultaneously. because, I did not find any significant difference between single end and paired end mapping [in both cases alignment rate is 70%] this force me to go through single end mapping, Here might be I am missing technicality of paired end mapping, but if mapping percent is equal, it is my belief that, in further downstream analysis I would be able to customize tool according to me.

comments of technicality of paired-end mapping are always welcome.

ADD REPLYlink written 7.6 years ago by Raghav100

how can I extract chromosome name, chromosome start position and chromosome end position of those reads in R1.bed and R2.bed which are paired-ends?

ADD REPLYlink modified 7.6 years ago by Giovanni M Dall'Olio27k • written 7.6 years ago by Raghav100

I don't get it, you have a tab delimited file and you want to extract a given column ?

ADD REPLYlink written 7.6 years ago by Gabriel R.2.8k
gravatar for Jordan
7.6 years ago by
Jordan1.2k wrote:

Like Jelena said, you should be using paired end option while mapping using Bowtie2.

Regarding the question, I think you can use range in bedops to identify regions which are with in a range of 2000bp.

First you need to sort the bed file:

sort-bed R1.bed > R1.sorted.bed
sort-bed R2.bed > R2.sorted.bed

Now, we can use the range operator to find regions in R1.sorted.bed which are as far as 2000bp away from R2.sorted.bed.

bedops --range 2000 -e -1 R1.sorted.bed R2.sorted.bed

It's more detailed here: bedops wiki

EDIT: Edited as said by Alex Reynolds.

ADD COMMENTlink modified 13 months ago by _r_am32k • written 7.6 years ago by Jordan1.2k

Using the --everything operator reports and pads everything from R1 and R2 - this is equivalent to a multiset union operation.

You might perhaps instead replace that operator with --element-of -1 to report regions from R1.sorted.bed that overlap R2.sorted.bed by one or more bases (along with --range 2000 to look for 2000-base-padded elements in R1 that overlap R2 elements).

In other words:

$ bedops --range 2000 --element-of -1 R1.sorted.bed R2.sorted.bed > answer.bed
ADD REPLYlink modified 13 months ago by _r_am32k • written 7.6 years ago by Alex Reynolds31k

Ah, my bad! Corrected it.

ADD REPLYlink written 7.6 years ago by Jordan1.2k
gravatar for JC
7.6 years ago by
JC12k wrote:

A third vote to repeat your mapping USING the pair-end information, first, this will improve the mapping because one pair can confirm or reject alternative alignments.

ADD COMMENTlink written 7.6 years ago by JC12k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 946 users visited in the last hour