Create an assembly with a ribosomal DNA chromosome
5.8 years ago
ra2967 ▴ 20

Hello all,

I would like to create a genomic assembly of mm10 plus a ribosomal DNA sequence. I have this sequence in FASTA format and I would like to add it to the assembly as a "new" chromosome (so I will have all the murine chromosomes plus the ribosomal one). Does anyone know how to do it? In this paper, they claim they can do that using Bowtie build:

https://www.ncbi.nlm.nih.gov/pubmed/21355038

However, I do not know if there is an easier way, as they do not explain it in much detail.

This assembly will be used to map genomic data from a ChIP-Seq experiment.

Thanks!

ChIP-Seq Assembly • 1.6k views

I assume you are referring to the rDNA repeat? People generally try to avoid having that sequence in alignments, since we still don't know how many copies of the rDNA repeat exist or exactly where they sit in the genome. Is there a specific reason you want to include it in your genome?


Thanks! I am interested in seeing whether my proteins bind (ChIP-Seq) on this chromosome and what the chromatin status is there (ATAC-Seq). I have since worked out how they did it in the paper, and in the end it was not very difficult.

5.8 years ago
h.mon 35k

The mm10 assembly already includes rRNA in its sequence. Following the post "A: How To Find Comprehensive List Of Rrna Locations For C. Elegans?", go to the UCSC Table Browser:

http://genome.ucsc.edu/cgi-bin/hgTables

Then select:

group: all tables
database: mm10
table: rmsk
filter: repClass = rRNA
output format: GTF
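
If the GTF is later used for masking, it usually has to be turned into BED-style intervals first. Below is a minimal Python sketch of that conversion, assuming the standard nine-column, tab-separated GTF layout with 1-based inclusive coordinates. The function name `gtf_to_bed` and the example record are hypothetical and not taken from the actual table output.

```python
def gtf_to_bed(lines):
    """Convert GTF lines to (chrom, start, end) tuples in BED convention.

    GTF coordinates are 1-based and inclusive; BED coordinates are
    0-based and half-open, so only the start needs to be shifted by one.
    """
    intervals = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not a valid GTF record
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        intervals.append((chrom, start - 1, end))
    return intervals


if __name__ == "__main__":
    # Hypothetical record, only to show the coordinate shift.
    example = ['chr1\tsource\texon\t101\t200\t0.0\t+\t.\tgene_id "example";']
    for chrom, start, end in gtf_to_bed(example):
        print(f"{chrom}\t{start}\t{end}")
```

Writing the resulting tuples out as tab-separated lines gives a BED file that interval tools can read.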
5.7 years ago
michael.ante ★ 3.8k

I would not recommend adding rRNA sequences to your normal reference. These sequences are already in the mm10 reference as h.mon posted.

You could search, e.g., NCBI Nucleotide for the mouse rRNA sequences, build a Bowtie index from them, and map your data against that index as a first step. If you configure Bowtie to write the unmapped reads to a separate file, you'll get a nearly rRNA-free set of reads. Moreover, the mapping rate of this first step is an estimate of your rRNA content.
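
The two-step idea above can be sketched as follows. This is only an illustration: it assumes Bowtie 2 command-line syntax (`bowtie2-build`, and `bowtie2` with `-x`, `-U`, `-S` and `--un`) rather than Bowtie 1, single-end reads, and hypothetical file names. The helper functions only assemble the command lines; they do not run anything.

```python
def build_index_cmd(rrna_fasta, index_base):
    """Command to build an index from the rRNA FASTA (Bowtie 2 assumed)."""
    return ["bowtie2-build", rrna_fasta, index_base]


def align_cmd(index_base, reads_fastq, sam_out, unmapped_fastq):
    """Command to map single-end reads against the rRNA index.

    --un writes the reads that did NOT align to a separate file; those
    are the nearly rRNA-free reads to use for the real mm10 alignment.
    """
    return [
        "bowtie2",
        "-x", index_base,
        "-U", reads_fastq,
        "-S", sam_out,
        "--un", unmapped_fastq,
    ]


def rrna_fraction(total_reads, unmapped_reads):
    """Fraction of reads that mapped to rRNA in the first step."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return (total_reads - unmapped_reads) / total_reads


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_index_cmd("mouse_rRNA.fa", "mouse_rRNA")))
    print(" ".join(align_cmd("mouse_rRNA", "sample.fastq",
                             "rRNA_hits.sam", "sample.no_rRNA.fastq")))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed, and `rrna_fraction` can be fed with read counts taken from the input and unmapped FASTQ files.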

The only way of having an integrated analysis, like you suggested, is to use the GTF file from h.mon's post to hard-mask your mm10 genome. At every position where the RepeatMasker model finds an rRNA sequence, you'll get a run of 'N's (e.g. using bedtools maskfasta, which replaces the masked intervals with N by default). If you then add the rRNA FASTA to your masked mm10 FASTA, rRNA reads should map only to the rRNA "chromosomes".

I haven't tested the latter approach, thus I cannot guarantee it will work.
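
For illustration, here is a small pure-Python sketch of the mask-then-append idea, equally untested on real data. It assumes intervals are given as 0-based, half-open `(chrom, start, end)` tuples, and all function names and the `chrRDNA` contig name are hypothetical. For a full genome, bedtools maskfasta followed by concatenating the FASTA files would be the practical route; this only shows the logic on toy sequences.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping sequence name -> sequence."""
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the ID before any space
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def hard_mask(records, intervals):
    """Return a copy of records with every interval replaced by N."""
    masked = {n: bytearray(s, "ascii") for n, s in records.items()}
    for chrom, start, end in intervals:
        if chrom not in masked:
            continue  # interval on a sequence we do not have
        seq = masked[chrom]
        start = max(0, start)
        end = min(len(seq), end)
        if start < end:
            seq[start:end] = b"N" * (end - start)
    return {n: s.decode("ascii") for n, s in masked.items()}


def add_contig(records, name, sequence):
    """Return a copy of records with one extra contig appended at the end."""
    if name in records:
        raise ValueError(f"sequence name already present: {name}")
    combined = dict(records)
    combined[name] = sequence
    return combined


def write_fasta(records, width=60):
    """Serialise records back to FASTA text with wrapped sequence lines."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    genome = read_fasta(">chr1\nACGTACGT\n>chr2\nTTTT\n")
    masked = hard_mask(genome, [("chr1", 2, 5)])
    combined = add_contig(masked, "chrRDNA", "GGCCGGCC")
    print(write_fasta(combined), end="")
```

The combined FASTA would then be indexed once, so that reads compete between the masked genome and the added rDNA contig during alignment.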

Cheers,

Michael
