Question: Repeat elements, SINEs, LINEs, LTRs in specific regions of gene
Kian wrote (18 months ago):

Hi, I have a list of more than 1000 genes. I want to calculate the frequency of repeat elements such as SINEs, LINEs and LTRs for these genes in several regions: exons, introns, 3' UTR, 5' UTR, upstream and downstream. For each specific region, I want to know how many LINEs, SINEs and LTRs there are, for example:

chrom   strand  Start       End         LINE     SINE
chr4    +       5104898     524438      86.00    80
chr4    +       11912008    11924714    1        20
genomax (United States) wrote (18 months ago):

You can get the RepeatMasker track as a BED file and then intersect it with your list using BEDTools or BEDOPS.


Thanks, dear genomax. You mean I should first get the RepeatMasker track from UCSC in BED output format, and then take this file to BEDOPS to get the specific repeats (SINE, LINE, ...) for each region? Is that right?

Reply written 18 months ago by Kian
Alex Reynolds (Seattle, WA USA) wrote (18 months ago):

To do things entirely on the command line, one approach is to download the RepeatMasker analysis for your genome of interest directly from ISB.

For example, for hg38:

$ wget -qO- http://www.repeatmasker.org/genomes/hg38/RepeatMasker-rm405-db20140131/hg38.fa.out.gz | gunzip -c > hg38.fa.out

Then convert this RepeatMasker analysis to BED with BEDOPS convert2bed:

$ convert2bed --input=rmsk < hg38.fa.out > hg38.fa.out.bed

If you want broader repeat element category names as IDs in this file, use the following modification:

$ convert2bed --input=rmsk < hg38.fa.out | cut -f1-3,11 > hg38.fa.out.bed

This last conversion result puts the following keywords into the ID field of hg38.fa.out.bed:

DNA
DNA/Kolobok
DNA/MULE-MuDR
DNA/Merlin
DNA/PIF-Harbinger
DNA/PiggyBac
DNA/TcMar
DNA/TcMar-Mariner
DNA/TcMar-Pogo
DNA/TcMar-Tc1
DNA/TcMar-Tc2
DNA/TcMar-Tigger
DNA/TcMar?
DNA/hAT
DNA/hAT-Ac
DNA/hAT-Blackjack
DNA/hAT-Charlie
DNA/hAT-Tag1
DNA/hAT-Tip100
DNA/hAT-Tip100?
DNA/hAT?
DNA?
DNA?/PiggyBac?
DNA?/hAT-Tip100?
LINE/CR1
LINE/Dong-R4
LINE/Jockey
LINE/L1
LINE/L1-Tx1
LINE/L2
LINE/Penelope
LINE/RTE-BovB
LINE/RTE-X
LTR
LTR/ERV1
LTR/ERV1?
LTR/ERVK
LTR/ERVL
LTR/ERVL-MaLR
LTR/ERVL?
LTR/Gypsy
LTR/Gypsy?
LTR?
Low_complexity
RC/Helitron
RC?/Helitron?
RNA
Retroposon/SVA
SINE/5S-Deu-L2
SINE/Alu
SINE/MIR
SINE/tRNA
SINE/tRNA-Deu
SINE/tRNA-RTE
SINE?/tRNA
Satellite
Satellite/acro
Satellite/centr
Satellite/telo
Simple_repeat
Unknown
rRNA
scRNA
snRNA
srpRNA
tRNA
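
Since the question only concerns LINE, SINE and LTR elements, one option is to filter the converted file down to those three classes before mapping. The following is a minimal sketch under assumptions: the file names and sample rows are made up for illustration, and the pattern treats uncertain calls such as "LTR?" or "SINE?/tRNA" as members of their class, which may or may not be what you want.

```shell
# Hypothetical four-column sample (chrom, start, end, category), standing in
# for hg38.fa.out.bed; swap in the real file name.
printf 'chr1\t100\t200\tSINE/Alu\n'       > sample.rmsk.bed
printf 'chr1\t300\t400\tSimple_repeat\n' >> sample.rmsk.bed
printf 'chr1\t500\t600\tLINE/L1\n'       >> sample.rmsk.bed
printf 'chr1\t700\t800\tLTR?\n'          >> sample.rmsk.bed
printf 'chr1\t900\t950\tDNA/hAT\n'       >> sample.rmsk.bed
printf 'chr1\t960\t990\tSINE?/tRNA\n'    >> sample.rmsk.bed

# Keep rows whose category is LINE, SINE or LTR, optionally followed by "?"
# and then either a "/subclass" or the end of the field.
awk 'BEGIN { FS = "\t" } $4 ~ /^(LINE|SINE|LTR)\??(\/|$)/' sample.rmsk.bed \
    > sample.line_sine_ltr.bed
```

Row order is preserved, so a sorted input stays sorted.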

If you want to do everything in one pass:

$ wget -qO- http://www.repeatmasker.org/genomes/hg38/RepeatMasker-rm405-db20140131/hg38.fa.out.gz \
    | gunzip -c \
    | convert2bed --input=rmsk \
    | cut -f1-3,11 \
    > hg38.fa.out.bed

Use these kinds of streams where you can! It's a huge timesaver.

Once you have your RepeatMasker analysis as a BED file, you can do set operations with BEDOPS bedmap and your regions-of-interest:

$ bedmap --echo --echo-map-id --delim '\t' regions.bed hg38.fa.out.bed > answer.bed

The regions.bed file would be a sorted BED file containing regions-of-interest.

Regions-of-interest would be one of the subsets of regions you want to investigate: exons, introns, 3'UTR, 5'UTR, upstream or downstream windows, etc.
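
For the upstream and downstream windows, a strand-aware sketch with awk might look like the following. Everything here is an assumption for illustration: the genes.bed file and its two rows are made up, the input is taken to be BED6 (chrom, start, end, name, score, strand), and the 1000-base window size is arbitrary. Windows are clamped at position 0 but not at chromosome ends, so you would still need chromosome sizes to trim those.

```shell
# Hypothetical gene annotation in BED6 format.
printf 'chr4\t5000\t9000\tgeneA\t0\t+\n'  > genes.bed
printf 'chr4\t500\t2000\tgeneB\t0\t-\n'  >> genes.bed

w=1000   # window size in bases; an arbitrary choice for this sketch

awk -v w="$w" 'BEGIN { FS = OFS = "\t" }
# Write a window only if it has non-zero length.
function emit(file, chrom, s, e, name) { if (e > s) print chrom, s, e, name > file }
{
    left = ($2 > w) ? $2 - w : 0    # clamp at the chromosome start
    # On the minus strand, "upstream" lies to the right of the gene.
    if ($6 == "-") { emit("up.tmp", $1, $3, $3 + w, $4); emit("down.tmp", $1, left, $2, $4) }
    else           { emit("up.tmp", $1, left, $2, $4);   emit("down.tmp", $1, $3, $3 + w, $4) }
}' genes.bed

# bedmap needs sorted input. BEDOPS sort-bed is the usual choice; a C-locale
# sort is used here only so the sketch runs without BEDOPS installed.
LC_ALL=C sort -k1,1 -k2,2n -k3,3n up.tmp   > upstream.bed
LC_ALL=C sort -k1,1 -k2,2n -k3,3n down.tmp > downstream.bed
rm -f up.tmp down.tmp
```

Either upstream.bed or downstream.bed can then take the place of regions.bed.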

The file answer.bed will contain the regions-of-interest and, in the last column, the RepeatMasker repeat element categories that overlap each region, as a semicolon-delimited list.

In other words, you can pipe this answer.bed file into awk or other scripts to count the number of repeat element category hits you get for regions-of-interest, or do other downstream statistics.
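
As a rough sketch of that counting step, the awk script below tallies LINE, SINE and LTR hits per region. It assumes answer.bed has three region columns followed by the semicolon-delimited ID list that bedmap --echo-map-id writes by default. The sample rows and the repeat_counts.tsv name are made up for illustration. The top-level class is taken to be the text before the first "/", with any "?" stripped, so uncertain calls are counted with their class.

```shell
# Hypothetical sample of answer.bed (tab-delimited); replace with real bedmap output.
printf 'chr4\t100\t500\tLINE/L1;SINE/Alu;SINE/MIR\n'  > answer.bed
printf 'chr4\t900\t1200\tLTR/ERV1;LINE/L2;LTR?\n'    >> answer.bed
printf 'chr5\t50\t80\t\n'                            >> answer.bed   # no overlapping repeats

# Count LINE, SINE and LTR elements per region.
awk 'BEGIN { FS = OFS = "\t"; print "chrom", "start", "end", "LINE", "SINE", "LTR" }
{
    line = sine = ltr = 0
    n = split($4, ids, ";")          # one ID per overlapping repeat element
    for (i = 1; i <= n; i++) {
        cls = ids[i]
        sub(/\/.*/, "", cls)         # "LINE/L1" -> "LINE"
        gsub(/\?/, "", cls)          # "LTR?"    -> "LTR"
        if (cls == "LINE") line++
        else if (cls == "SINE") sine++
        else if (cls == "LTR") ltr++
    }
    print $1, $2, $3, line, sine, ltr
}' answer.bed > repeat_counts.tsv
```

These are counts of overlapping elements, not the fraction of bases covered. If the regions file carries more than three columns, the `$4` field index needs adjusting.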


Thanks, dear Alex. I want to do the analysis in Bos taurus (bosTau8). What should I replace hg38 with?

Reply written 18 months ago by Kian

http://www.repeatmasker.org/species/bosTau.html offers bosTau7. If you need bosTau8, it looks like you'd need to run RepeatMasker on your genome.

Reply written 18 months ago by Alex Reynolds