Question

Estimating the sequence complexity of a DNA sequence from a FASTA file

5.6 years ago

epaminonda ▴ 10

Hi

I have mapped some PacBio reads to a mammalian reference and I am now trying to understand the coverage in a ~5 Mb region of interest.

The majority of the region seems to be well covered by reads. However, a locus of about 500 kb in the centre of the region has no reads mapping to it at all. I have downloaded a RepeatMasker track from UCSC and found that the 5 Mb region appears, at first inspection, to be quite repeat-dense, with many transposable element annotations.

I would like to quantitatively assess whether my reads fail to map to the 500 kb locus because of extreme repetitiveness, i.e. I would like to determine whether this locus is more repetitive than the neighbouring sequence. My thinking is that, if the repeat content of the locus with zero alignments is not significantly higher than that of its neighbourhood, then reads are failing to map there for other reasons, e.g. structural diversity.
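To make the comparison concrete, here is a minimal sketch of what I have in mind. It assumes the RepeatMasker track has been exported as (start, end) intervals in 0-based, half-open coordinates (as in a BED file), and it tiles the region with non-overlapping windows the same size as the locus. The function names and the toy coordinates are made up for illustration, and the "rank" is only a crude empirical comparison, not a proper significance test.

```python
def covered_bases(intervals, start, end):
    """Count bases in [start, end) covered by at least one interval.

    Overlapping or nested intervals are only counted once.
    """
    clipped = sorted(
        (max(s, start), min(e, end))
        for s, e in intervals
        if s < end and e > start
    )
    total = 0
    cur_end = start
    for s, e in clipped:
        if e > cur_end:
            total += e - max(s, cur_end)
            cur_end = e
    return total


def repeat_fraction(intervals, start, end):
    """Fraction of [start, end) annotated as repeat."""
    return covered_bases(intervals, start, end) / (end - start)


def compare_locus(intervals, region, locus):
    """Compare the locus repeat fraction with same-size flanking windows.

    Returns (locus fraction, list of flank fractions, share of flank
    windows that are at least as repeat-dense as the locus).
    """
    region_start, region_end = region
    locus_start, locus_end = locus
    size = locus_end - locus_start
    locus_frac = repeat_fraction(intervals, locus_start, locus_end)
    flank_fracs = []
    for pos in range(region_start, region_end - size + 1, size):
        # Keep only windows that do not overlap the locus itself.
        if pos + size <= locus_start or pos >= locus_end:
            flank_fracs.append(repeat_fraction(intervals, pos, pos + size))
    at_least = sum(f >= locus_frac for f in flank_fracs)
    rank = at_least / len(flank_fracs) if flank_fracs else float("nan")
    return locus_frac, flank_fracs, rank


if __name__ == "__main__":
    # Toy data only: a 100 bp "region" with a 20 bp "locus".
    toy_repeats = [(0, 10), (5, 15), (45, 55), (90, 100)]
    print(compare_locus(toy_repeats, region=(0, 100), locus=(40, 60)))
```

If many flanking windows turn out to be as repeat-dense as the locus and still attract reads, repeat content alone would seem an unlikely explanation for the gap.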

Is there a way to achieve the above? I have downloaded and installed a local copy of RepeatMasker and run it on my sequence. However, it appears to focus on annotating and masking the individual repetitive elements rather than on estimating their 'density', so to speak. Thanks!
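One thing I could do with the masked output is count masked bases per window. The sketch below assumes a soft-masked FASTA, where repeats are in lowercase (RepeatMasker's `-xsmall` option produces this, and UCSC genome FASTA files are soft-masked as well). The helper names and the window size are my own choices for illustration, and N bases are left out of the denominator.

```python
def read_fasta(handle):
    """Parse FASTA text from an iterable of lines into {name: sequence}."""
    records = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Use the first word of the header as the record name.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def masked_fraction_windows(seq, window):
    """Yield (start, end, fraction of lowercase bases) per window.

    N/n bases are excluded from the denominator; a window made up
    entirely of Ns gets a fraction of None.
    """
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        called = [base for base in chunk if base not in "Nn"]
        if called:
            frac = sum(base.islower() for base in called) / len(called)
        else:
            frac = None
        yield start, start + len(chunk), frac


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python masked_density.py region.fa.masked
    with open(sys.argv[1]) as handle:
        for name, seq in read_fasta(handle).items():
            for start, end, frac in masked_fraction_windows(seq, 50_000):
                print(name, start, end, frac, sep="\t")
```

Plotting the third column along the region, next to read depth, should show whether the 500 kb locus stands out from its neighbours in repeat density.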

sequence RepeatMasker pacbio • 835 views

updated 2 days ago by Ram 43k • written 5.6 years ago by epaminonda ▴ 10