RNASeq alignment visualisation
3.1 years ago
daewowo ▴ 80

I have run MagicBLAST on a metagenomic RNASeq dataset that was assembled de novo with MEGAHIT. The contigs were queried against a specific database of genomes.

Now I am stuck on how to visualise the hits from MagicBLAST (a .sam file). I could write a Python notebook to decode the MagicBLAST results (get the nucleotide start, stop and sequence for each query against its reference) and then plot them as a colour-coded horizontal bar plot showing reference vs contig overlap. But I expect there is already open source software that can do this.
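For reference, the decoding step described above could be sketched roughly as follows. This is only a minimal illustration, assuming a standard tab-delimited SAM file; the function names `reference_span` and `hits_by_reference` are made up for this example and are not part of MagicBLAST or any library. The reference end is derived from the CIGAR string, counting only operations that consume reference bases.

```python
import re
from collections import defaultdict

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) on the reference for one alignment."""
    length = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + length - 1


def hits_by_reference(sam_lines):
    """Group aligned queries by reference name as (query, start, end) tuples.

    Hypothetical helper: accepts any iterable of SAM text lines,
    for example an open file handle.
    """
    hits = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 4 or rname == "*" or cigar == "*":  # unmapped query
            continue
        start, end = reference_span(pos, cigar)
        hits[rname].append((qname, start, end))
    return hits
```

The resulting `(query, start, end)` tuples per reference could then be drawn as horizontal bars, one subplot per reference, with whatever plotting library you prefer (for example matplotlib's `broken_barh`).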

IGV is a good example of what I am trying to do, but as far as I can tell it can only show one reference sequence at a time. I want to plot multiple reference sequences together (e.g. 10, to save time) and for each one show the contigs that overlap it. Is there an open source tool or script which already does this?

What are examples of workflows (on desktop Linux; I prefer local to web-based tools) once you have generated BLAST hits against reference sequences? I am a novice. The aim is to identify genomic coverage of specific genes in metagenomic datasets.

A related question: what is the advantage of de-novo assembly vs alignment against specific sequences? I am struggling to work out what to do with the de-novo data.

alignment RNA-Seq Assembly • 826 views

One way I may be able to accomplish this is:

BLAST the de-novo assembled contigs against the nt database and find the matching gene sequences. Then run bowtie2 to align the de-novo assembled contigs to the reference gene (and/or run bowtie2/bwa-mem on the raw reads to align them).

Is aligning de-novo built contigs back to a reference (once identified) common practice? I am wondering whether there are use cases where this would be preferable to just aligning the raw reads back to the reference gene.

Then plot the alignments for each reference gene in IGV.


Thanks GenoMax

Refseqs are 10-30 kbp, so yes, quite long.

The tools you mention look like a good starting point :-)

3.1 years ago
GenoMax 141k

I want to plot multiple reference sequences (eg 10 - to save time)

What is the length of the reference sequences? If they are long, it may be impossible to fit them in a display window, both programmatically and visually. IGV deals with this for long chromosomes by making you zoom in before you can start to see the aligned reads. Since these are metagenomic assemblies you probably have hundreds of small references, so there is no good solution for display.

You can get text-based information about the (regions of the) reference and how many reads cover them with other tools. Look at mosdepth (LINK) or bedtools genomecov for getting that information. You could do some plotting using those numbers yourself.
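As a rough sketch of working with those numbers yourself: the snippet below assumes you already have per-base depth in the three-column, tab-delimited form (reference, position, depth) that `bedtools genomecov -d` writes, and it summarises the fraction of each reference covered at or above a chosen depth. The function name `breadth_of_coverage` and the `min_depth` parameter are hypothetical, chosen just for this example.

```python
from collections import defaultdict


def breadth_of_coverage(depth_lines, min_depth=1):
    """Return {reference: fraction of positions with depth >= min_depth}.

    Assumes each line is "<reference> <position> <depth>" (whitespace or
    tab separated) with one line per reference position, including
    positions of depth 0.
    """
    total = defaultdict(int)    # positions seen per reference
    covered = defaultdict(int)  # positions meeting the depth threshold
    for line in depth_lines:
        parts = line.split()
        if len(parts) != 3:  # skip blank or malformed lines
            continue
        ref, _pos, depth = parts
        total[ref] += 1
        if int(depth) >= min_depth:
            covered[ref] += 1
    return {ref: covered[ref] / total[ref] for ref in total}
```

With 10 or so references this gives one number per reference that can be compared at a glance, and the same per-base depths can be plotted as a simple line per reference if you want to see where the covered regions fall.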

advantage of de-novo assembly vs alignment against specific sequences?

If you have no reference available then there is nothing to align against until you do an assembly to produce a longer representation. You can then use that as a reference for alignments of the individual reads used to generate that reference (you could align against a closely related genome, if you want). You would want to do this after an initial assembly to see the quality of your assembly. There are tools like QUAST (or MetaQUAST) that can produce quality information about your assembly.

If you have specific reference sequences available then a direct alignment would be possible.
