I have a BED file with reads of varying length. Is there a tool to bring all reads to the same length (125 bp) while keeping the original strand (+/-) information intact?
You are not actually bringing all reads to the same length, but want to keep just those BED intervals that are of the same length?
My reads are of varying length, 125-240 bp (because I merged overlapping paired-end reads).
What I want is to reduce the length of the reads to 125 bp (a BED file is available as the source/input file).
You can use reformat.sh from the BBMap suite. I don't recall whether you would need to use two passes with minlength=125 to keep only reads that are exactly that length. Take a look at the in-line help to figure that out.
Since a BED file only contains intervals, that is not useful/needed in this case.
This option seems to discard reads shorter than the specified length.
minlength=0 (ml) Reads shorter than this after trimming will be discarded. Pairs will be discarded only if both are shorter.
I have reads of varying sizes, and I want to make them all equal to 125 bp by cutting at the 3' end.
What do you mean by two passes?
You can set minlength=125 in the first pass, but this would also retain reads that are longer than 125 bp. If you strictly need reads that are exactly 125 bases, then you may need to run a second pass of reformat.sh with maxlength=125 so that reads longer than that are discarded.
I do not want to get reads that are below or above 125bp.
I have reads longer than 125 bp; their lengths range from 125 to 240 bp.
I want to make all of them 125bp.
Is it possible?
I used an awk command to do that.
But I want to do it with a tool.
Of course, the strand information should stay intact.
I gave you an answer on how to do this in my last comment. To be explicit, you would do something like this:
reformat.sh in=your_fq.gz out=stdout.fq minlength=125 | reformat.sh in=stdin.fq out=reads_w_125bp.fq.gz maxlength=125
The first command retains reads >= 125 bp long; the second keeps reads that are exactly 125 bp.
If you have paired-end reads then use:
reformat.sh in1=your_R1_fq.gz in2=your_R2_fq.gz out=stdout.fq minlength=125 | reformat.sh in=stdin.fq out1=reads_w_125bp_R1.fq.gz out2=reads_w_125bp_R2.fq.gz maxlength=125
My input file is BED, not fastq; it is mapped data.
reformat.sh supports sam, fastq, fasta, fasta+qual, scarf, gzip, and zip.
I do not want to retain or filter reads above or below 125 bp; I want to trim all my reads to a read length of 125 bp.
For example, if I have 2 million reads of read length 125-240 bp, I want all 2 million to have a read length of exactly 125 bp.
If you have a BED file then use awk instead and change the interval lengths manually.
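For a BED6 file (strand in column 6), that awk edit could look like the sketch below; the file names are just placeholders, and the key point is that trimming at the 3' end means fixing the start coordinate for plus-strand intervals and the end coordinate for minus-strand intervals:

```shell
# Toy BED6 input (fields: chrom, start, end, name, score, strand)
printf 'chr1\t100\t300\tread1\t0\t+\nchr1\t500\t700\tread2\t0\t-\n' > input.bed

# Trim every interval to 125 bp by cutting from the 3' end, strand-aware:
#   + strand: the 3' end is the right coordinate, so set end   = start + 125
#   - strand: the 3' end is the left coordinate,  so set start = end   - 125
awk 'BEGIN{OFS="\t"} {
  if ($6 == "+") $3 = $2 + 125
  else           $2 = $3 - 125
  print
}' input.bed > trimmed_125bp.bed

cat trimmed_125bp.bed
# chr1    100    225    read1    0    +
# chr1    575    700    read2    0    -
```

Note this assumes every input interval is at least 125 bp long; shorter ones would get invalid coordinates and should be filtered out first.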
I was using awk manually.
I also found an R function, resize(), which can do the same.
resize() will just be a much slower version of awk. No one would bother writing a whole tool for something that can trivially be done in awk.
Yes Devon. You are right.
Thank you for suggestion.
genomax has already told you an approach to do that. You are not listening/reading.
I do not know, however, whether BED would be supported. I don't think what you are doing is a common task, so there is unlikely to be a tried-and-tested preexisting tool. An approach with awk or similar would be perfectly valid.
Yes, I already used awk.
However, I was looking for available tools to do that (to complement the awk approach).
OK, after searching with different keywords, I found this: resize().
library(rtracklayer)    # import.bed()
library(GenomicRanges)  # resize()
userRanges <- import.bed("userFile.bed")
resizeRanges <- resize(userRanges, width = 125)  # default fix = "start" anchors the 5' end, so it is strand-aware
This is exactly what I wanted: to resize all my reads to 125 bp.
There is one step prior to running this script.
I separated my reads into those above 125 bp and those at or below 125 bp, using an awk command. Then I used only the above-125 bp set as input to the resize function, and it gave the desired output. It also keeps the strand information intact.
I further cross-checked the result against the output of my manual awk command.
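That pre-splitting step can also be sketched in awk; the file names here are placeholders, not the ones actually used in the thread:

```shell
# Toy BED6 input: one 200 bp interval and one 100 bp interval
printf 'chr1\t100\t300\tread1\t0\t+\nchr1\t500\t600\tread2\t0\t-\n' > input.bed

# Split intervals into > 125 bp (to be resized) and <= 125 bp (kept or handled separately)
awk -v L=125 '{
  if ($3 - $2 > L) print > "above125.bed"
  else             print > "upto125.bed"
}' input.bed
```

Only above125.bed would then be fed to resize(); since resize() grows intervals that are shorter than the requested width, keeping the short ones out of that step avoids unintentionally extending them.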
Thanks genomax for the suggestion.