Forum: How to map a de novo assembly to a reference genome?
Farbod3.2k (Toronto) wrote, 2.6 years ago:

Dear friends, Hi

I want to map my de novo transcriptome assembly to a reference genome using BLAT or GMAP, and then look at the distribution of intron lengths that can be inferred from those alignments.

The background is that Trinity needs a "--genome_guided_max_intron" parameter for its genome-guided mode, and its manual suggests to "use a maximum intron length that makes most sense given your targeted organism".

So I need your help with the script(s) for mapping a de novo assembly to a genome: do I have to index the genome? Do I have to install BLAT locally, the same way as local NCBI BLAST?

Thank you in advance
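
On the two practical questions, as far as I know: GMAP needs an index that is built once with gmap_build before aligning, while BLAT is a standalone program (separate from NCBI BLAST) that can take the genome FASTA directly. Below is a minimal sketch that assembles the commands in Python so they can be printed and checked before running. The file names (genome.fa, Trinity.fasta, transcripts.psl), the index directory and name, and the thread count are placeholders, not values from this thread.

```python
import shlex


def gmap_commands(genome_fasta, transcripts_fasta,
                  index_dir="gmap_index", index_name="genome", threads=4):
    """Return the two GMAP steps as argument lists: build the index, then align."""
    # Step 1: build the genome index once (placeholder directory and name).
    build = ["gmap_build", "-D", index_dir, "-d", index_name, genome_fasta]
    # Step 2: align transcripts, one best path each, GFF3 gene-style output.
    # GMAP writes to stdout, so redirect it to a file when running.
    align = ["gmap", "-D", index_dir, "-d", index_name,
             "-f", "gff3_gene", "-n", "1", "-t", str(threads),
             transcripts_fasta]
    return build, align


def blat_command(genome_fasta, transcripts_fasta, out_psl="transcripts.psl"):
    """Return a BLAT call that aligns RNA queries to a DNA genome (PSL output)."""
    return ["blat", "-t=dna", "-q=rna", genome_fasta, transcripts_fasta, out_psl]


build, align = gmap_commands("genome.fa", "Trinity.fasta")
print(shlex.join(build))
print(shlex.join(align) + " > transcripts.gff3")
print(shlex.join(blat_command("genome.fa", "Trinity.fasta")))
```

The printed lines can be pasted into a shell, or the argument lists passed to subprocess.run. It is worth checking the options against the installed versions of the tools first.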

Tags: rna-seq, alignment, forum, assembly • 2.0k views
written 2.6 years ago by Farbod3.2k

Hi. As you said, use GMAP to map the transcriptome to your reference genome. You will obtain a GFF3 file that has the exon locations of each transcript within the scaffold/contig. Use that information to compute the intron lengths.

The other option is a tool called "GAG": you provide the genome (FASTA file) and its GFF3 file (which you can obtain from GMAP), and it reports summary statistics of genome features, including the minimum, maximum, and mean intron length.
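
The GFF3-based calculation could look roughly like the sketch below. It assumes the exon rows carry a Parent attribute naming their transcript (as in standard GFF3) and treats the gap between consecutive exons of the same transcript as an intron. The function name and the example rows are made up for illustration.

```python
from collections import defaultdict


def intron_lengths_from_gff3(lines):
    """Collect intron lengths implied by gaps between consecutive exons."""
    exons = defaultdict(list)  # (seqid, parent transcript) -> [(start, end), ...]
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        parent = attrs.get("Parent")
        if parent is None:
            continue
        exons[(cols[0], parent)].append((int(cols[3]), int(cols[4])))

    lengths = []
    for coords in exons.values():
        coords.sort()  # GFF3 has start <= end on both strands
        for (_, prev_end), (next_start, _) in zip(coords, coords[1:]):
            gap = next_start - prev_end - 1  # coordinates are 1-based, inclusive
            if gap > 0:
                lengths.append(gap)
    return lengths


# Made-up example: one transcript with three exons.
example = [
    "##gff-version 3",
    "chr1\tgmap\tmRNA\t100\t1500\t.\t+\t.\tID=t1.mrna1",
    "chr1\tgmap\texon\t100\t200\t.\t+\t.\tID=t1.exon1;Parent=t1.mrna1",
    "chr1\tgmap\texon\t301\t400\t.\t+\t.\tID=t1.exon2;Parent=t1.mrna1",
    "chr1\tgmap\texon\t1401\t1500\t.\t+\t.\tID=t1.exon3;Parent=t1.mrna1",
]
print(intron_lengths_from_gff3(example))  # [100, 1000]
```

On a real file, pass the open file handle instead of the list, for example intron_lengths_from_gff3(open("transcripts.gff3")), where the file name is again a placeholder.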

written 2.6 years ago by EVR510

Dear Tom, Hi. Very nice answer, thank you.

modified 2.6 years ago • written 2.6 years ago by Farbod3.2k

I don't think the number has to be exact. You can use a number that falls in the right ballpark (one from zebrafish may be fine in this case).

written 2.6 years ago by genomax65k

Dear genomax2, Hi

I used "10000", as written on the Trinity website, and the result was only about 500 transcripts, but in the de novo assembly I have more than 500,000 transcripts!

So I think that either this number is very critical, or zebrafish and my species are very distinct from each other.

Do you have any idea what this number (intron length) is for zebrafish?

written 2.6 years ago by Farbod3.2k

According to this paper that number may need to be ~1000 for zebrafish.

You have predictions for half a million transcripts. There is no independent evidence that they are real, as yet.

modified 2.6 years ago • written 2.6 years ago by genomax65k

Thank you for the paper you have provided, and the time you have spent.

I really appreciate that.

written 2.6 years ago by Farbod3.2k

Dear Genomax2,

In Table 1 of the paper you linked, the "maximum intron size" is about 378,145 for zebrafish, but you have suggested ~1000. Have I misunderstood something?

modified 2.6 years ago • written 2.6 years ago by Farbod3.2k

Mean intron length is ~3000 and median is ~1000 (378K is an outlier). You could try running with a couple of different values (1000 and 3000).
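
To see why a single outlier makes the maximum a poor guide, here is a small sketch that summarizes a list of intron lengths. The list is toy data invented for illustration (only the largest value is borrowed from the table figure quoted above), and the nearest-rank percentile is just one simple way to pick a cutoff, not something Trinity prescribes.

```python
import statistics


def summarize_introns(lengths, percentile=99):
    """Return median, mean, a nearest-rank percentile, and max of intron lengths."""
    ordered = sorted(lengths)
    n = len(ordered)
    # Nearest-rank percentile using integer ceiling division.
    rank = max(1, (percentile * n + 99) // 100)
    return {
        "n": n,
        "median": statistics.median(ordered),
        "mean": statistics.mean(ordered),
        "percentile": ordered[rank - 1],
        "max": ordered[-1],
    }


# Toy values, not real zebrafish data.
toy_lengths = [200, 500, 800, 1000, 1200, 1500, 2500, 4000, 9000, 378145]
summary = summarize_introns(toy_lengths, percentile=90)
for key, value in summary.items():
    print(key, value)
```

With these toy values the one huge intron pulls the mean far above the median, while the percentile stays close to the bulk of the distribution, so a median- or percentile-based value is a more representative starting point to try.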

modified 2.6 years ago • written 2.6 years ago by genomax65k

My Dear Friend, Genomax, Hi.

I ran the Trinity genome-guided approach with different "maximum intron size" values, and the number of genes (or, better, transcripts) in the resulting FASTA file was as follows:

Maximum intron size      No. of transcripts
378145                   568
3000                     567
1000                     566
10000                    567
De novo assembly         ~600,000 transcripts!

Do you have any idea how to explain these results?

My fish is a sturgeon, and I mapped its reads to the zebrafish genome (using STAR) because no genome from a species close to mine was available.

modified 11 months ago • written 2.6 years ago by Farbod3.2k