Removing/masking satellite sequences in data
3
0
Entering edit mode
8.9 years ago
kanwarjag ★ 1.2k

What is the easiest way to exclude/mask satellite sequences in a BAM file of ChIP-seq data?

Thanks
Kanwar

8.9 years ago

You can use "bedtools maskfasta" to mask the regions in the reference FASTA. You can get the satellite regions from the UCSC Table Browser.
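
As a rough sketch of what this does: maskfasta works on a reference FASTA rather than on the BAM, so the reads would have to be realigned against the masked reference afterwards. The file names and the toy sequence below are made up for illustration, and the run is skipped if bedtools is not on the PATH.

```shell
# Toy reference and a toy "satellite" interval (both hypothetical).
printf '>chrT\nACGTACGTACGTACGTACGT\n' > toy.fa
printf 'chrT\t4\t12\n' > toy_satellites.bed

if command -v bedtools >/dev/null 2>&1; then
    # Hard-mask (replace with N) every base covered by the BED intervals.
    bedtools maskfasta -fi toy.fa -bed toy_satellites.bed -fo toy.masked.fa
    cat toy.masked.fa
else
    echo "bedtools not found; skipping maskfasta demo" >&2
fi
```

With a real genome, the toy files would be replaced by the reference FASTA and a BED file of satellite regions.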

8.9 years ago

I use subtractBed. It keeps my reads in BED format, and I can insert it anywhere in my pipeline after mapping. maskfasta gives FASTA output.
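
A sketch of the same idea applied directly to the BAM, so no conversion back from BED is needed. Note that this swaps subtractBed for "bedtools intersect -v", which keeps only the alignments that have no overlap with the given intervals. reads.bam and satellites.bed are placeholder names, and the function name is made up for illustration.

```shell
# Keep only alignments that do not overlap any interval in the BED file.
# Arguments (placeholder names): input BAM, BED of regions to exclude, output BAM.
drop_overlapping_reads() {
    in_bam=$1
    exclude_bed=$2
    out_bam=$3
    # -v reports entries of A with no overlap in B; with -abam the output stays BAM.
    bedtools intersect -v -abam "$in_bam" -b "$exclude_bed" > "$out_bam"
}

# Run only when the tool and the input files are actually present.
if command -v bedtools >/dev/null 2>&1 && [ -f reads.bam ] && [ -f satellites.bed ]; then
    drop_overlapping_reads reads.bam satellites.bed reads.no_satellites.bam
fi
```

If you stay with subtractBed on BED intervals, keep in mind that by default it trims away only the overlapping portion of each feature; the -A option removes the whole feature on any overlap.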

8.9 years ago

Assuming hg19, grab repeat data, if needed, and sort it with sort-bed, since BEDOPS expects sorted input:

$ mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg19 -e 'select genoName, genoStart, genoEnd, repName, swScore, strand from rmsk' | tail -n +2 | sort-bed - > repeats.bed
$ head -3 repeats.bed
chr1    10000    10468    (CCCTAA)n    1504    +
chr1    10468    11447    TAR1    3612    -
chr1    11503    11675    L1MC    437    -

Then perform set operations with BEDOPS:

$ bedops -n 1 <(bam2bed < reads.bam) repeats.bed > reads_that_do_not_overlap_repeats.bed

Then convert the result to the desired end format.
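
The query above pulls every RepeatMasker annotation, not only satellites. The rmsk table also has a repClass column, so one option is to add it to the select list and keep only the rows labelled Satellite. The sketch below assumes that layout (repClass as a seventh column) and uses made-up rows and file names so it can be run without a database connection; the class labels in the toy rows are illustrative.

```shell
# Toy rows in the assumed layout: chrom, start, end, repName, swScore, strand, repClass.
printf 'chr1\t10000\t10468\t(CCCTAA)n\t1504\t+\tSimple_repeat\n'  > toy_repeats.tsv
printf 'chr1\t10468\t11447\tTAR1\t3612\t-\tSatellite\n'          >> toy_repeats.tsv
printf 'chr1\t11503\t11675\tL1MC\t437\t-\tLINE\n'                >> toy_repeats.tsv

# Keep rows whose 7th column is exactly "Satellite" and drop that column,
# leaving a six-column BED.
awk 'BEGIN { FS = OFS = "\t" } $7 == "Satellite" { print $1, $2, $3, $4, $5, $6 }' \
    toy_repeats.tsv > toy_satellites_only.bed
cat toy_satellites_only.bed
```

The filtered file can then stand in for repeats.bed in the bedops step, so that only reads overlapping satellites are dropped.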
