Whole genome coverage plot
23 days ago
dyfn1947 • 0

Hello, I would like to remove repeat sequences when I make a coverage plot. The red circle marks the position of the repeat sequence. How can I remove that repeat sequence? Thanks

whole_genome repeat_sequence coverage • 610 views
You could hard- or soft-mask the repeat sequence in the genome and then redo the mapping.
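
For anyone unsure of the difference: soft masking lowercases the repeat bases, while hard masking replaces them with N. A minimal sketch in Python, assuming 0-based, half-open coordinates as in BED; the function name, the toy sequence and the coordinates are made up for illustration:

```python
def mask_regions(seq, regions, hard=False):
    """Mask 0-based, half-open (start, end) regions of seq.

    Soft masking lowercases the bases; hard masking replaces them with N.
    """
    bases = list(seq)
    for start, end in regions:
        # Clamp to the sequence so out-of-range coordinates are ignored.
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = "N" if hard else bases[i].lower()
    return "".join(bases)


if __name__ == "__main__":
    seq = "ACGTACGTAC"       # toy sequence (hypothetical)
    repeats = [(2, 6)]       # hypothetical repeat coordinates
    print(mask_regions(seq, repeats))             # soft mask
    print(mask_regions(seq, repeats, hard=True))  # hard mask
```

For a real genome you would more likely use an existing tool than hand-rolled code; this only shows what the two kinds of masking do to the sequence.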

Or use something like samtools view to remove the reads overlapping these regions. That is probably better: if a read really comes from one of these repeats, aligning against the full genome lets the repeat act as a decoy and capture it, whereas if you mask the regions the read might falsely align somewhere else. Make a BED file with the coordinates you want to exclude, take its complement against the entire genome (bedtools complement), and then use the -L option of samtools view to keep only the reads that overlap the complement file (which is the genome minus the regions you do not want).
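
To make the complement step concrete, here is a rough Python sketch of what it computes: everything on each chromosome that is not covered by the repeat intervals. This is only an illustration, not how bedtools is implemented; the function name, chromosome sizes, coordinates and file names are all hypothetical.

```python
def complement(chrom_sizes, regions):
    """Return the intervals of each chromosome NOT covered by `regions`.

    chrom_sizes: dict mapping chromosome name -> length
    regions: iterable of (chrom, start, end), 0-based half-open (BED convention)
    """
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))

    keep = []
    for chrom, size in chrom_sizes.items():
        cursor = 0  # first position not yet assigned to a kept or excluded interval
        for start, end in sorted(by_chrom.get(chrom, [])):
            start, end = max(start, 0), min(end, size)
            if start > cursor:
                keep.append((chrom, cursor, start))
            cursor = max(cursor, end)  # also merges overlapping repeats
        if cursor < size:
            keep.append((chrom, cursor, size))
    return keep


if __name__ == "__main__":
    sizes = {"chr1": 1000, "chr2": 500}                 # hypothetical genome
    repeats = [("chr1", 100, 200), ("chr1", 150, 300),  # hypothetical repeats
               ("chr1", 900, 1000)]
    for chrom, start, end in complement(sizes, repeats):
        print(f"{chrom}\t{start}\t{end}")

    # On real data the same two steps would be run on the command line,
    # with placeholder file names, roughly as:
    #   bedtools complement -i repeats.sorted.bed -g genome.txt > keep.bed
    #   samtools view -b -L keep.bed aln.bam > filtered.bam
```

Note that -L keeps any read that overlaps the kept regions, so reads straddling a repeat boundary will still be retained.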

I was skeptical of this, but here is lh3 saying the same thing in "Which Aligners Recognize Soft-Masked Repeats In Reference Sequences?" :)

He says

No, do not align to masked genome for any purpose. Filter out the reads mapped to the masked region after whole-genome alignment.

which I think is the best you can do, for the aforementioned reasons.

Thanks so much!

Are you sure you mean repeat sequences? Do you not mean duplicate sequences?

Yes, I'm sure it is a repeat sequence.