Entering edit mode
4.8 years ago
Ankit
▴
500
Hi,
I have a bed file with varying read length. Is there a tool to bring all reads to the same length (125bp) and keeping intact the original strand (+, -) Information?
Please help.
Thanks
You are not actually bringing all reads to same length but want to keep just those BED intervals that are of the same length?
No.
My reads are of varying length 125-240 (because I merged overlapping paired end reads).
What I want is to reduce the length of the reads to 125bp (bed as source / input file available) .
You can use
reformat.sh
from BBMap suite. I don't recollect if you would need to use two passes withminlength=125
to keep only reads that are exactly that length. Take a look at the in-line help to figure that out.Since BED file only contains intervals that is not useful/needed in this case.
This function seems to work like discarding reads below integers specified.
I have reads of varying sizes, I want to make them equal to 125bp by cutting at 3'end.
I do not understand what do you mean by two passes?
Thanks
You can set
minlength=125
in first pass, but this would also retain reads that are longer than 125 bp. If you strictly need reads that are exactly 125 bases then you may need to run a second pass ofreformat.sh
settingmaxlength=125
so reads longer than that will be discarded.I do not want to get reads that are below or above 125bp.
I have reads longer than 125 bp (some n numbers). The length of these reads are 125-240. I want to make all of them 125bp.
Is it possible?
I used a awk command to do that.
But I want to do that with a tool.
Of course the strand information should be intact.
Thanks
I gave you an answer on how to do this in my last comment. Just to be explicit you would do something like this
First command retains reads >= 125 long. Second part keeps reads that are exactly 125 bp.
If you have paired end reads then use
My input file is BED not fastq. It is a mapped data.
reformat.sh Supports sam, fastq, fasta, fasta+qual, scarf, gzip, zip.
I do not want to retain or filter reads above or below 125bp. I want to trim all my reads to read length 125bp.
For example , if I have 2million reads of read length 125-240 bp. I want all of 2million to have read length 125bp exactly.
If you have a BED file then use
awk
instead and manually change the lengths.Thanks Devon. I was using awk manually. I also found a function in R resize(), which can do the same. Thanks
resize()
will just be a much slower version ofawk
. No one would bother writing a whole tool for something that can be trivially done inawk
.Yes Devon. You are right.
Thank you for suggestion.
Genomax has told you an approach to do that. You are not listening/reading.
I do not know however whether BED would be supported. I dont think what you are doing is a common task, so there is unlikely to be a tried-and-tested preexisting tool. An approach with
awk
or similar would be perfectly validYes I used awk already. However I was looking for available tools to do that (to complement the awk). Thanks
Ok after searching with different keywords, I found this. resize
This is what I exactly wanted. I wanted to resize all my reads to 125bp.
There is one step prior to running this script.
I separated my reads which are above 125bp and below-equal 125bp. I used awk command to do that. Then I used only above125bp as input to resize function and it has given the desired output. It also keep the strand information intact.
I further correlated it with my manually performed command using awk.
Thank you
Thanks genomax for the suggestion.