How to bring all reads to the same length? Any tool?
0
0
Entering edit mode
4.3 years ago
Ankit ▴ 500

Hi,

I have a bed file with varying read length. Is there a tool to bring all reads to the same length (125bp) and keeping intact the original strand (+, -) Information?

Please help.

Thanks

Read-length Bed • 1.5k views
ADD COMMENT
0
Entering edit mode

You are not actually bringing all reads to same length but want to keep just those BED intervals that are of the same length?

ADD REPLY
0
Entering edit mode

No.

My reads are of varying length 125-240 (because I merged overlapping paired end reads).

What I want is to reduce the length of the reads to 125bp (bed as source / input file available) .

ADD REPLY
0
Entering edit mode

You can use reformat.sh from BBMap suite. I don't recollect if you would need to use two passes with minlength=125 to keep only reads that are exactly that length. Take a look at the in-line help to figure that out.

Since BED file only contains intervals that is not useful/needed in this case.

ADD REPLY
0
Entering edit mode

This function seems to work like discarding reads below integers specified.

minlength=0             (ml) Reads shorter than this after trimming will be discarded.  Pairs will be discarded only if both are shorter.

I have reads of varying sizes, I want to make them equal to 125bp by cutting at 3'end.

I do not understand what do you mean by two passes?

Thanks

ADD REPLY
0
Entering edit mode

You can set minlength=125 in first pass, but this would also retain reads that are longer than 125 bp. If you strictly need reads that are exactly 125 bases then you may need to run a second pass of reformat.sh setting maxlength=125 so reads longer than that will be discarded.

ADD REPLY
0
Entering edit mode

I do not want to get reads that are below or above 125bp.

I have reads longer than 125 bp (some n numbers). The length of these reads are 125-240. I want to make all of them 125bp.

Is it possible?

I used a awk command to do that.

But I want to do that with a tool.

Of course the strand information should be intact.

Thanks

ADD REPLY
1
Entering edit mode

I gave you an answer on how to do this in my last comment. Just to be explicit you would do something like this

reformat.sh in=your_fq.gz out=stdout.fq minlength=125 | reformat.sh in=stdin.fq out=reads_w_125bp.fq.gz maxlength=125

First command retains reads >= 125 long. Second part keeps reads that are exactly 125 bp.

If you have paired end reads then use

reformat.sh in1=your_R1_fq.gz in2=your_R2_fq.gz out=stdout.fq minlength=125 | reformat.sh in=stdin.fq out1=reads_w_125bp_R1.fq.gz out2=reads_w_125bp_R2.fq.gz  maxlength=125
ADD REPLY
0
Entering edit mode

My input file is BED not fastq. It is a mapped data.

reformat.sh Supports sam, fastq, fasta, fasta+qual, scarf, gzip, zip.

I do not want to retain or filter reads above or below 125bp. I want to trim all my reads to read length 125bp.

For example , if I have 2million reads of read length 125-240 bp. I want all of 2million to have read length 125bp exactly.

ADD REPLY
1
Entering edit mode

If you have a BED file then use awk instead and manually change the lengths.

ADD REPLY
0
Entering edit mode

Thanks Devon. I was using awk manually. I also found a function in R resize(), which can do the same. Thanks

ADD REPLY
1
Entering edit mode

resize() will just be a much slower version of awk. No one would bother writing a whole tool for something that can be trivially done in awk.

ADD REPLY
0
Entering edit mode

Yes Devon. You are right.

Thank you for suggestion.

ADD REPLY
0
Entering edit mode

Genomax has told you an approach to do that. You are not listening/reading.

I do not know however whether BED would be supported. I dont think what you are doing is a common task, so there is unlikely to be a tried-and-tested preexisting tool. An approach with awk or similar would be perfectly valid

ADD REPLY
0
Entering edit mode

Yes I used awk already. However I was looking for available tools to do that (to complement the awk). Thanks

ADD REPLY
0
Entering edit mode

Ok after searching with different keywords, I found this. resize

library("rtracklayer")
userRanges <- import.bed("userFile.bed")

library("GenomicRanges")
resizeRanges <- resize(userRanges, width = 125)

export.bed(resizeRanges, "resizeFile.bed")

This is what I exactly wanted. I wanted to resize all my reads to 125bp.

There is one step prior to running this script.

I separated my reads which are above 125bp and below-equal 125bp. I used awk command to do that. Then I used only above125bp as input to resize function and it has given the desired output. It also keep the strand information intact.

I further correlated it with my manually performed command using awk.

Thank you

ADD REPLY
0
Entering edit mode

Thanks genomax for the suggestion.

ADD REPLY

Login before adding your answer.

Traffic: 2750 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6