How to bring all reads to the same length? Any tool?
0
0
Entering edit mode
22 months ago
Ankit ▴ 320

Hi,

I have a bed file with varying read length. Is there a tool to bring all reads to the same length (125bp) and keeping intact the original strand (+, -) Information?

Thanks

Read length Bed Tool • 515 views
0
Entering edit mode

You are not actually bringing all reads to same length but want to keep just those BED intervals that are of the same length?

0
Entering edit mode

No.

My reads are of varying length 125-240 (because I merged overlapping paired end reads).

What I want is to reduce the length of the reads to 125bp (bed as source / input file available) .

0
Entering edit mode

You can use reformat.sh from BBMap suite. I don't recollect if you would need to use two passes with minlength=125 to keep only reads that are exactly that length. Take a look at the in-line help to figure that out.

Since BED file only contains intervals that is not useful/needed in this case.

0
Entering edit mode

minlength=0             (ml) Reads shorter than this after trimming will be discarded.  Pairs will be discarded only if both are shorter.


I have reads of varying sizes, I want to make them equal to 125bp by cutting at 3'end.

I do not understand what do you mean by two passes?

Thanks

0
Entering edit mode

You can set minlength=125 in first pass, but this would also retain reads that are longer than 125 bp. If you strictly need reads that are exactly 125 bases then you may need to run a second pass of reformat.sh setting maxlength=125 so reads longer than that will be discarded.

0
Entering edit mode

I do not want to get reads that are below or above 125bp.

I have reads longer than 125 bp (some n numbers). The length of these reads are 125-240. I want to make all of them 125bp.

Is it possible?

I used a awk command to do that.

But I want to do that with a tool.

Of course the strand information should be intact.

Thanks

1
Entering edit mode

I gave you an answer on how to do this in my last comment. Just to be explicit you would do something like this

reformat.sh in=your_fq.gz out=stdout.fq minlength=125 | reformat.sh in=stdin.fq out=reads_w_125bp.fq.gz maxlength=125


First command retains reads >= 125 long. Second part keeps reads that are exactly 125 bp.

If you have paired end reads then use

reformat.sh in1=your_R1_fq.gz in2=your_R2_fq.gz out=stdout.fq minlength=125 | reformat.sh in=stdin.fq out1=reads_w_125bp_R1.fq.gz out2=reads_w_125bp_R2.fq.gz  maxlength=125

0
Entering edit mode

My input file is BED not fastq. It is a mapped data.

reformat.sh Supports sam, fastq, fasta, fasta+qual, scarf, gzip, zip.

I do not want to retain or filter reads above or below 125bp. I want to trim all my reads to read length 125bp.

For example , if I have 2million reads of read length 125-240 bp. I want all of 2million to have read length 125bp exactly.

1
Entering edit mode

If you have a BED file then use awk instead and manually change the lengths.

0
Entering edit mode

Thanks Devon. I was using awk manually. I also found a function in R resize(), which can do the same. Thanks

1
Entering edit mode

resize() will just be a much slower version of awk. No one would bother writing a whole tool for something that can be trivially done in awk.

0
Entering edit mode

Yes Devon. You are right.

Thank you for suggestion.

0
Entering edit mode

Genomax has told you an approach to do that. You are not listening/reading.

I do not know however whether BED would be supported. I dont think what you are doing is a common task, so there is unlikely to be a tried-and-tested preexisting tool. An approach with awk or similar would be perfectly valid

0
Entering edit mode

Yes I used awk already. However I was looking for available tools to do that (to complement the awk). Thanks

0
Entering edit mode

Ok after searching with different keywords, I found this. resize

library("rtracklayer")
userRanges <- import.bed("userFile.bed")

library("GenomicRanges")
resizeRanges <- resize(userRanges, width = 125)

export.bed(resizeRanges, "resizeFile.bed")


This is what I exactly wanted. I wanted to resize all my reads to 125bp.

There is one step prior to running this script.

I separated my reads which are above 125bp and below-equal 125bp. I used awk command to do that. Then I used only above125bp as input to resize function and it has given the desired output. It also keep the strand information intact.

I further correlated it with my manually performed command using awk.

Thank you

0
Entering edit mode

Thanks genomax for the suggestion.