Question

bedtools, linux, rnaseq

0

8.1 years ago

Daniel James ▴ 10

I have a huge file in BED format and I need to extract only the chr22 records using bedtools. I tried the sort option, but I don't understand how to do it.

RNA-Seq bedtools • 2.3k views

2

Have you tried this in the terminal?

grep "chr22" fileA.bed > fileB.bed

8.1 years ago by Floris Brenk ★ 1.0k

0

I tried this, but I have 6 files and I need to store the chr22 records from all of them in one file.

8.1 years ago by Daniel James ▴ 10

0

cat fileB.bed fileC.bed fileD.bed > all_chr22_files.bed

The options below work as well.

8.1 years ago by Floris Brenk ★ 1.0k

score 1 · Answer 1 · 2016-03-20

1

8.1 years ago

James Ashmore ★ 3.4k

You can either explicitly list the files:

grep -h "chr22" A.bed B.bed C.bed > Result.bed

or, use a wildcard, which uses all the files ending with ".bed" in the current directory:

grep -h "chr22" *.bed > Result.bed

Don't forget to coordinate-sort the BED file afterwards, as many programs require this:

sort -k1,1 -k2,2n Result.bed > Result.sorted.bed
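
One caveat with the plain grep pattern: "chr22" matches anywhere on the line, so it would also keep records on contigs such as chr22_KI270731v1_random, or rows where "chr22" only appears in a name column. A minimal sketch of an exact match on the first column using awk, followed by the same sort, is below. The scratch directory and the demo file contents are made up for illustration; substitute your own BED files.

```shell
# Hypothetical demo data: two small tab-separated BED files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'chr1\t100\t200\nchr22\t500\t600\nchr22_KI270731v1_random\t10\t20\n' > "$tmpdir/A.bed"
printf 'chr22\t50\t80\nchr2\t5\t9\tnear_chr22_marker\n' > "$tmpdir/B.bed"

# Keep only rows whose first column is exactly "chr22",
# then coordinate-sort by chromosome and numeric start position.
awk -F'\t' '$1 == "chr22"' "$tmpdir"/A.bed "$tmpdir"/B.bed \
  | sort -k1,1 -k2,2n > "$tmpdir/Result.sorted.bed"

cat "$tmpdir/Result.sorted.bed"
```

With this demo input, only the two true chr22 intervals survive, ordered by start position; the random contig and the row that merely mentions chr22 in its name column are dropped.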

0

Thank you, it works fine now. How would you turn it into a tab-delimited file using the coverageBed options? Can we use the hist option?

8.1 years ago by Daniel James ▴ 10

score 0 · Answer 2 · 2016-03-20

If you're not averse to using BEDOPS, generate the sorted union of N BED files with sort-bed, and use bedextract to pull out elements of the chromosome-of-interest from the set union:

$ sort-bed A.bed B.bed ... N.bed > all.bed
$ bedextract chr22 all.bed > chr22.bed

Our BEDOPS bedextract application uses a binary search approach to jump to the start position of the chromosome-of-interest, and so extraction is much faster than grep or awk, which have to waste time reading through the entire file.

For multi-GB, whole-genome scale files, and especially for extraction of elements at the end of a file, using awk or grep to read through the entire file can be (is) a significant waste of time. Even more so if you have to repeat the extraction for other chromosomes.

The output of BEDOPS tools will be sorted, as well, so it will be ready to use for downstream set operations.

score 0 · Answer 3 · 2016-03-20

0

8.1 years ago

Daniel James ▴ 10

How do I create a tab-delimited file using the coverageBed options?
