Question: Are the UCSC genome assemblies non-redundant? How do I get a non-redundant genome fasta?
Asked 23 months ago by rmartson:

I'm looking to have a single FASTA sequence for each chromosome in an organism, but if I check the sequences in panTro5.fa (chimp) that I've downloaded from UCSC I get a ton of ids like: chr10_NW_015973889v1_random, chr10_NW_015973890v1_random, etc.

What are these and how do I get rid of them? I don't have them in my hg38.fa (human) file because you can download all the chromosomes individually and then assemble them into one fasta, but I don't think you get that option with other genomes.

I need to use the genomes to find hits for viral LTR sequences and the number of hits is important so I don't want to get the same hit in the same region of the genome twice or more.

Tags: ucsc, blast, genome
Answered 23 months ago by h.mon:

The hits you are getting in these random regions are believed to be real (and different from each other). The sequences have not been assigned a proper location, probably because they are flanked by (even more) repetitive regions. You can get chr_random sequences in the human genome as well; it depends on where you downloaded the FASTA from and whether you (or someone else) post-processed the genome after download.

I would argue that you will get a better count of viral LTR hits if you include the chr_random sequences, but it will be problematic to compare assemblies of different quality.
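If you want to run the search both with and without the unplaced sequences, here is a minimal sketch of a streaming FASTA filter. It assumes UCSC-style names, where primary chromosomes look like chr10, chr2A, chrX or chrM and everything else carries an underscore suffix (chr10_NW_015973889v1_random, chrUn_...). The names is_primary and filter_fasta are made up for illustration and are not part of any UCSC tool.

```python
import re
from typing import Iterable, Iterator

# Assumption: a primary chromosome name is "chr" followed only by letters and
# digits (chr1, chr2A, chrX, chrM). Names with an underscore, such as
# chr10_NW_015973889v1_random or chrUn_..., are treated as unplaced.
PRIMARY = re.compile(r"^chr[0-9A-Za-z]+$")


def is_primary(header: str) -> bool:
    """Return True if a FASTA header (or bare name) is a primary chromosome."""
    parts = header.lstrip(">").split()
    name = parts[0] if parts else ""
    # A bare "chrUn" is a bin of unplaced sequence, not a chromosome.
    return name != "chrUn" and bool(PRIMARY.match(name))


def filter_fasta(lines: Iterable[str], keep_unplaced: bool = False) -> Iterator[str]:
    """Yield FASTA lines, dropping non-primary records unless keep_unplaced is set."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Decide once per record; sequence lines inherit the decision.
            keep = keep_unplaced or is_primary(line)
        if keep:
            yield line
```

To use it, open panTro5.fa for reading and a new output file (any name you like) for writing, then pass the input file object to filter_fasta and write the result with writelines. Since the file is read line by line, the whole genome never has to fit in memory. Running the LTR search on both the filtered and unfiltered files shows how many hits fall in the unplaced sequences.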


Alright, I'll download the fasta file with random regions for the human genome as well then.

Reply written 23 months ago by rmartson