Question: Repetitive element coverage of consensus
jmack159 (United States) wrote, 5.0 years ago:

I have a consensus sequence for a class of repetitive elements (from Repbase), and I want to find the depth of coverage of the real genomic instances (taken from the UCSC rmsk table) against this consensus. The elements tend to become 5' truncated over evolutionary time, so some instances are quite short while others are full length. In the end I would like something like a wig file describing the "depth" of instances at each position in the consensus, but I am not sure how to go about it. Any help is greatly appreciated.

EDIT:

One approach I am considering is to break my instance sequences into k-mers of, say, 50 bp, align them to the consensus reference with bowtie to produce a BAM file, and then use samtools to get the windowed coverage. Thoughts?
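For what it's worth, the splitting step could look roughly like the sketch below. It is only an illustration: the function names `split_into_chunks` and `chunks_to_fasta` are made up here, and it assumes consecutive, non-overlapping 50 bp pieces (rather than every overlapping k-mer) so that each base of an instance is represented exactly once.

```python
def split_into_chunks(seq, k=50):
    """Yield (offset, chunk) pairs covering seq in consecutive pieces of length k.

    The last chunk may be shorter than k. Every base lands in exactly one
    chunk, so no position of an instance is counted twice downstream.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    for start in range(0, len(seq), k):
        yield start, seq[start:start + k]


def chunks_to_fasta(name, seq, k=50):
    """Return the chunks of one instance as FASTA text, one record per chunk."""
    return "".join(
        f">{name}_{start}\n{chunk}\n" for start, chunk in split_into_chunks(seq, k)
    )
```

One caveat with this choice: a short trailing chunk (or a very short, heavily truncated instance) may not align at all, so the 3' end of some instances could be undercounted.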

Tags: blast, chip-seq, alignment, genome
Manvendra Singh (Berlin, Germany) wrote, 5.0 years ago:

I think it's doable.

For example:

Suppose your element is L1Hs.  ### just assuming this is the family of repetitive elements your consensus belongs to

### fetch all L1Hs entries from the rmsk table

1. grep -wi 'L1Hs' rmsk.bed > L1Hs.bed    ### you should get roughly 1,500 sequences

### fetch the sequences of the L1Hs instances

2. bedtools getfasta -fi hg19.fa -bed L1Hs.bed -fo L1Hs.fa.out

Now align the sequences in L1Hs.fa.out against your consensus.

Then count how many instances cover each nucleotide position of the consensus.  ## a simple Perl script could help, or whichever tool you prefer
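The counting step could be sketched in Python along these lines. This is a rough illustration, not a tested pipeline: `depth_per_position` and `to_wiggle` are hypothetical names, the aligned intervals are assumed to be 0-based, half-open (start, end) coordinates on the consensus that you have already parsed from your aligner's output, and gaps inside an alignment are ignored.

```python
def depth_per_position(intervals, consensus_length):
    """Count how many aligned intervals cover each consensus position.

    intervals: iterable of (start, end) pairs, 0-based, end exclusive
    (an assumption of this sketch; convert your aligner's coordinates first).
    """
    # Difference array: +1 where an interval starts, -1 just past its end.
    diff = [0] * (consensus_length + 1)
    for start, end in intervals:
        start = max(start, 0)
        end = min(end, consensus_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # A running sum of the differences gives the depth at each position.
    depth = []
    running = 0
    for delta in diff[:consensus_length]:
        running += delta
        depth.append(running)
    return depth


def to_wiggle(depth, name="consensus"):
    """Format per-position depth as fixedStep wiggle text (1-based start)."""
    header = f"fixedStep chrom={name} start=1 step=1"
    return "\n".join([header] + [str(d) for d in depth]) + "\n"
```

For a 10 bp consensus with one full-length instance and two 5' truncated ones, `depth_per_position([(0, 10), (4, 10), (7, 10)], 10)` gives `[1, 1, 1, 1, 2, 2, 2, 3, 3, 3]`, i.e. depth rising toward the 3' end, which is the pattern you would expect for 5' truncated elements.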

 

 
