Question

Generating Multiple-Species Alignments of Novel Transcripts for PhyloCSF

12.3 years ago

Eric Fournier ★ 1.4k

Short version: How would you go about generating multiple-species alignments of novel Bos taurus transcripts (assembly UMD3.1) with human, mouse and dog for use with PhyloCSF?

Context and what I've tried so far:

Through a sequencing experiment, our lab has identified a large set of new transcripts in Bos taurus. We want to determine whether those transcripts are coding or non-coding. To do so, we thought of using the PhyloCSF software, as was done in Cabili et al., 2011.

To use PhyloCSF, we need to generate a multiple-species alignment of our transcripts. However, since these are unknown transcripts, it is impossible to find their inter-species homologs directly. Instead, we set out to generate a genomic multiple-species alignment, from which we aimed to extract our regions of interest.

However, I've now spent most of the week banging my head trying to figure out the best way to do this. So far I have:

Obtained pairwise alignments of the bosTau6 assembly of the cow genome to hg19, mm9 and canFam2 from the UCSC Genome Browser download page
Converted those to MAF format
Stitched those pairwise MAFs together using TBA
Tried to extract regions of interest using bx-python

Currently, bx-python crashes when I try to index my MAF file, saying it cannot fit a range of 0..148,823,899 into a bin of 4681. This is the length of a whole chromosome, so I'm guessing my MAF file must be broken. wc -L gives me a maximum line length of 158,337,101, which I am pretty sure isn't normal.
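If it helps anyone hitting the same bin error: one quick way to locate the offending blocks is to scan the MAF for alignment rows that are implausibly wide or that span an entire source sequence. The sketch below is plain Python (standard library only) and is not part of bx-python; the function names and the 1,000,000-column threshold are assumptions chosen for illustration.

```python
from __future__ import annotations

import sys
from typing import Iterable, Iterator, NamedTuple


class SRow(NamedTuple):
    """One 's' (sequence) line of a MAF alignment block."""
    src: str
    start: int
    size: int
    strand: str
    src_size: int
    text: str


def iter_maf_blocks(lines: Iterable[str]) -> Iterator[tuple[int, list[SRow]]]:
    """Yield (line number of the 'a' line, its 's' rows) for each block."""
    rows: list[SRow] = []
    block_line = 0
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":
            if rows:
                yield block_line, rows
            rows, block_line = [], lineno
        elif fields[0] == "s":
            _, src, start, size, strand, src_size, text = fields
            rows.append(SRow(src, int(start), int(size), strand, int(src_size), text))
    if rows:
        yield block_line, rows


def find_suspicious_blocks(lines: Iterable[str], max_columns: int = 1_000_000) -> list[tuple[int, int, list[str]]]:
    """Flag blocks wider than max_columns, or with a row spanning its whole
    source sequence. Both checks are heuristics: a short scaffold can
    legitimately be aligned end to end."""
    flagged = []
    for block_line, rows in iter_maf_blocks(lines):
        width = max(len(row.text) for row in rows)
        whole = [row.src for row in rows if row.size == row.src_size]
        if width > max_columns or whole:
            flagged.append((block_line, width, whole))
    return flagged


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as maf:
        for block_line, width, whole in find_suspicious_blocks(maf):
            print(f"block at line {block_line}: {width} columns, whole-sequence rows: {whole}")
```

Run it with the MAF path as its only argument; the reported line numbers point at the blocks worth inspecting, and comparing against the pairwise inputs should show which step produced chromosome-length blocks.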

I'm going to keep trying to figure out where I went wrong, but I would be grateful for any suggestions of alternative data sources, tools or pipelines for generating my multiple-species alignments.

score 2 · Answer 1 · 2012-01-20

12.2 years ago

Repineme ▴ 120

Get the FASTA sequences of your genomic regions from Galaxy and use the stitchMAFblocks function to extract 49-mammal MAFs. Then you can use PhyloCSF to process the data. But there is a catch: there are a few species-name typos in PhyloCSF that will throw errors. You can correct them easily.
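For the species-name problem, one option is to rename rows when writing the PhyloCSF input. A minimal sketch, assuming the stitched alignment is available as a dict from MAF source names (e.g. "hg19.chr7") to equal-length gapped sequences. The labels in the hypothetical ASSEMBLY_TO_PHYLOCSF mapping are placeholders that must be checked against the species names in the tree of the PhyloCSF parameter set you actually run, and putting the reference row first is an assumption about what PhyloCSF expects (check its README).

```python
from typing import Dict

# Hypothetical mapping from UCSC assembly names (the part of a MAF source
# name before the first ".") to the species labels in your PhyloCSF tree.
# These labels are placeholders: check them against your parameter set.
ASSEMBLY_TO_PHYLOCSF: Dict[str, str] = {
    "bosTau6": "Cow",
    "hg19": "Human",
    "mm9": "Mouse",
    "canFam2": "Dog",
}


def to_phylocsf_fasta(aligned: Dict[str, str], name_map: Dict[str, str], reference: str) -> str:
    """Write one alignment (MAF source name -> gapped sequence) as FASTA,
    renaming rows via name_map and putting the reference assembly first."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError(f"rows must all have the same length, got {sorted(lengths)}")
    # sorted() is stable: the reference row moves to the front, others keep their order.
    order = sorted(aligned, key=lambda src: src.split(".")[0] != reference)
    out = []
    for src in order:
        assembly = src.split(".")[0]
        if assembly not in name_map:
            continue  # skip rather than emit a name the tree may not recognize
        out.append(f">{name_map[assembly]}")
        out.append(aligned[src].upper())  # normalize soft-masked (lowercase) bases
    return "\n".join(out) + "\n"
```

Assemblies without a label are dropped rather than passed through, so a misspelled or unknown species name never reaches PhyloCSF.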

Otherwise, you can simply use the infamous CPC (Coding Potential Calculator): http://cpc.cbi.pku.edu.cn

"Infamous" CPC? What's the story there?

12.2 years ago by Eric Fournier ★ 1.4k

Do you guys know how to prepare the multiple alignment now? If you know, please tell me. Thanks a lot!

11.4 years ago by jgwang • 0

This tool doesn't do multiple alignment; however, it BLASTs your genomic region against the known sequences in a BLAST database, then calculates the ORF and gives a score that separates non-coding from coding regions.
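To make the "calculates the ORF" part concrete, the simplest version is a scan for the longest ATG-to-stop open reading frame. This is only an illustration of the idea, not CPC's actual implementation: it looks at the forward strand only and ignores ORFs that run off the end of the sequence.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Tuple[int, int]:
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon; (0, 0) means no ORF."""
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best
```

For example, longest_orf("CCATGAAATTTTAGCC") gives (2, 14), i.e. ATGAAATTTTAG.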

12.2 years ago by Repineme ▴ 120

score 0 · Answer 2 · 2012-12-04

11.4 years ago

jgwang • 0

Do you guys know how to prepare the multiple alignment now? If you know, please tell me. Thanks a lot!

Here's what I ended up doing, in a nutshell:
1. Get species-to-species alignments from the UCSC Genome Browser (cow to mouse, cow to dog, cow to human, etc.) in axt format.
2. Convert from axt to MAF format using the axtToMaf tool from the kent source tree (again, from the UCSC Genome Browser).
3. Split all alignments into their per-chromosome parts and fix up the sequence names so they fit the format Multiz expects.
4. Run multiz iteratively to stitch the alignments together.
5. Extract intervals using bx-python.

It ended up being painful and complicated, but the results were enough for my ends. If you want, I can supply you with the scripts I used so you can get a better idea of what I did exactly.
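For step 3, here is a rough sketch of the renaming and splitting. It assumes, as in axtToMaf output, that the first 's' row of each block is the reference (target) assembly and the second is the query, and that multiz wants names of the form assembly.chrom (e.g. bosTau6.chr1), which is my understanding of its requirement. The function name is hypothetical, and holding everything in memory is only for illustration; whole-genome files would need streaming.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def split_and_rename(maf_text: str, ref_assembly: str, query_assembly: str) -> Dict[str, str]:
    """Split a pairwise MAF into one MAF string per reference chromosome,
    renaming 's' rows to 'assembly.chrom' (e.g. bosTau6.chr1).

    Assumes the first 's' row of each block is the reference and the second
    is the query. Other line types are passed through unchanged."""
    per_chrom: Dict[str, List[str]] = defaultdict(list)
    for chunk in maf_text.split("\n\n"):
        # Drop blank lines and '#' header/comment lines inside the chunk.
        lines = [ln for ln in chunk.splitlines() if ln.strip() and not ln.startswith("#")]
        if not lines or lines[0].split()[0] != "a":
            continue
        out: List[str] = []
        chrom: Optional[str] = None
        s_rows = 0
        for line in lines:
            fields = line.split()
            if fields[0] == "s":
                prefix = ref_assembly if s_rows == 0 else query_assembly
                if not fields[1].startswith(prefix + "."):
                    fields[1] = f"{prefix}.{fields[1]}"  # add the prefix only once
                if s_rows == 0:
                    chrom = fields[1][len(prefix) + 1:]
                s_rows += 1
                line = " ".join(fields)
            out.append(line)
        if chrom is not None:
            per_chrom[chrom].append("\n".join(out))
    return {chrom: "##maf version=1\n\n" + "\n\n".join(blocks) + "\n"
            for chrom, blocks in per_chrom.items()}
```

Because already-prefixed names are left alone, running it twice on the same file is harmless.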

11.4 years ago by Eric Fournier ★ 1.4k

Thanks for your reply. But I have my own transcripts. Do you think I should construct my own pairwise alignment (transcript vs. genome)? I'd like the scripts you used. My email is jgwang@mix.wvu.edu. Thank you very much.
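On the transcript-vs-genome question: an alternative to building your own pairwise alignments is to take each transcript's exon coordinates on the reference genome and cut the genome-wide MAF at those coordinates, which is roughly what an interval-extraction step does. A stdlib-only sketch with hypothetical function names, assuming the blocks are already parsed into dicts, the reference row is on the + strand, and each block has at most one row per assembly. Minus-strand transcripts would still need reverse-complementing, and reference bases not covered by any block are simply dropped here rather than filled in.

```python
from typing import Dict, List, Tuple

# A parsed alignment block: MAF source name -> (start, gapped text).
# Assumed: the reference row (e.g. "bosTau6.chr1") is on the + strand and
# each assembly appears at most once per block.
Block = Dict[str, Tuple[int, str]]


def slice_block(block: Block, ref: str, start: int, end: int) -> Dict[str, str]:
    """Cut every row of `block` down to the columns covering ref:[start, end)."""
    ref_start, ref_text = block[ref]
    cols: List[int] = []
    pos = ref_start
    for col, base in enumerate(ref_text):
        if base != "-":  # only non-gap columns advance the reference coordinate
            if start <= pos < end:
                cols.append(col)
            pos += 1
    if not cols:
        return {}
    first, last = cols[0], cols[-1] + 1
    return {src: text[first:last] for src, (_, text) in block.items()}


def stitch_exons(blocks: List[Block], ref: str, exons: List[Tuple[int, int]]) -> Dict[str, str]:
    """Concatenate per-exon slices into one alignment keyed by assembly,
    padding assemblies missing from a slice with gaps."""
    ref_blocks = sorted((b for b in blocks if ref in b), key=lambda b: b[ref][0])
    pieces: List[Dict[str, str]] = []
    for ex_start, ex_end in sorted(exons):
        for block in ref_blocks:
            piece = slice_block(block, ref, ex_start, ex_end)
            if piece:
                # Key by assembly ("hg19") so pieces from different chromosomes join up.
                pieces.append({src.split(".")[0]: seq for src, seq in piece.items()})
    assemblies = sorted({name for piece in pieces for name in piece})
    rows: Dict[str, List[str]] = {name: [] for name in assemblies}
    for piece in pieces:
        width = len(next(iter(piece.values())))
        for name in assemblies:
            rows[name].append(piece.get(name, "-" * width))
    return {name: "".join(parts) for name, parts in rows.items()}
```

The result has one equal-length row per assembly, which is the shape a FASTA alignment for PhyloCSF needs.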

11.4 years ago by jgwang • 0

Hello,

Could you help me with your scripts, please? Did you use the transcripts or the reads as input?

Thanks in advance

2.4 years ago by Nobody ▴ 30

score 0 · Answer 3 · 2022-01-23

2.2 years ago

Nobody ▴ 30

If you know how to prepare the multiple alignment, please help me; I need it.

Thanks!
