Iso-Seq full-length de novo assembly using Trinity
4.5 years ago

Hi, I want to assemble Iso-Seq full-length reads with Trinity using the command below, but I have been getting the error shown further down. Can anyone help me work out what I'm doing wrong?

My trinity command:

module load trinity/2.8.5
module load perl/5.24.1

Trinity --genome_guided_bam mastomys_sorted.bam --long_reads mastomys.ccs.fasta \
     --genome_guided_max_intron 100000 \
     --max_memory 35G --CPU 8

Study species = Mastomys natalensis (the Natal multimammate mouse)

mastomys_sorted.bam = alignment file from an Rsubread alignment against the mouse genome (Mus musculus)

mastomys.ccs.fasta = full-read consensus file generated by IsoSeq3

The error I'm getting:

Warning: didn't find at least 1000 BAM records properly ordered along a single scaffold... either the file contains few reads per scaffold or there may be a problem.

Any assistance will be highly appreciated.

Humble regards,

Charles

Assembly assembly rna-seq • 2.0k views

Trinity is complaining that mastomys_sorted.bam has too few mapped reads, so it can't tell whether the BAM file is properly sorted by coordinate or whether there is an error somewhere. Do you have mapping statistics for this file?
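
Both questions can be answered from the BAM file itself. The following is a minimal sketch, assuming samtools is installed; `bam_sort_order` is a hypothetical helper name, not part of samtools or Trinity. It reads a SAM/BAM header on stdin and reports the sort order declared in the `@HD` line. Note that this is only what the header claims, which `samtools sort` normally sets for you.

```shell
# Hypothetical helper: read a SAM/BAM header on stdin and print the
# sort order declared in the @HD line (SO: tag), or "unknown" if absent.
bam_sort_order() {
    awk -F'\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
        }
        END { if (!found) print "unknown" }
    '
}

# Usage (assumes samtools is on PATH and the file name from the question):
#   samtools view -H mastomys_sorted.bam | bam_sort_order
#   samtools flagstat mastomys_sorted.bam   # general mapping statistics
```

A coordinate-sorted file should report `coordinate`; anything else suggests the sort step did not produce what Trinity expects.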


I truly appreciate your response.

I used the sublong function from the Rsubread R package to map the raw reads to the mouse genome. The report below is all the function returned, along with the BAM file (mastomys.bam).

I then sorted the BAM file with samtools to generate mastomys_sorted.bam, the file I'm trying to assemble with Trinity.

====== Subread long read mapping ======

Threads: 1
Input file: /ufrc/austin/bonginkosi.gumbi/mastomys/isoSeq/Trail3/align/mouse/genome/m54115_190305_094127.polished.lq.fastq
Output file: mastomys_alignment.BAM (BAM)
Index: mu_index

Index was loaded; the gap bewteen subreads is 1 bases
Processing 0-th read for task 10; used 1.0 minutes


All finished.

Total processed reads : 349
Mapped reads: 337 (96.6%)
Time: 1.1 minutes
  

Charles

4.5 years ago

I agree with h.mon: you should probably use tools designed specifically for long reads. TALON (GitHub, preprint) and StringTie2 (GitHub, preprint) seem to be other good options.
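
As a rough illustration of the StringTie2 route, here is a sketch under several assumptions: minimap2, samtools and stringtie are installed, the reads are aligned to a reference FASTA with minimap2's documented Iso-Seq preset, and all file names are placeholders. `run` and `long_read_transcripts` are hypothetical helper names. By default the function only prints the commands; set `DRY_RUN=0` to execute them.

```shell
# Print each command; execute it only when DRY_RUN=0 (default is print-only).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Hypothetical wrapper: long_read_transcripts <reference.fa> <reads.fa> <prefix>
long_read_transcripts() {
    ref=$1; reads=$2; prefix=$3
    # Spliced alignment with the preset minimap2 documents for PacBio Iso-Seq reads
    run minimap2 -ax splice:hq -uf -o "$prefix.sam" "$ref" "$reads" || return 1
    # Coordinate-sort the alignments
    run samtools sort -o "$prefix.sorted.bam" "$prefix.sam" || return 1
    # Assemble transcripts in StringTie's long-read mode (-L)
    run stringtie -L -o "$prefix.gtf" "$prefix.sorted.bam" || return 1
}

# Usage (placeholder file names):
#   long_read_transcripts mouse_genome.fa mastomys.ccs.fasta mastomys
```

Whether a cross-species reference such as mouse is adequate for Mastomys is a separate question that this sketch does not address.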

4.5 years ago
h.mon 35k

The error (warning, actually) you got is because Trinity expects to find at least 1000 mapped reads, and your bam file has 337 mapped reads.

But it seems you are trying to use only long reads with Trinity, which won't result in a good assembly - if you get an assembly at all - because Trinity is primarily a short read assembler, and short reads are required. If you have only long reads, follow the recommended Iso-Seq pipeline with SMRTLink software from PacBio.
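
The mapped-read count behind the warning can be checked before running Trinity. A minimal sketch, assuming samtools is available to stream the BAM as SAM text; `count_mapped_sam` is a hypothetical helper name. It counts records whose FLAG does not have the unmapped bit (0x4) set, which should match what `samtools view -c -F 4` reports.

```shell
# Hypothetical helper: count mapped records in SAM text read from stdin.
# Header lines start with "@"; a record is unmapped when bit 0x4 of the
# FLAG field (column 2) is set.
count_mapped_sam() {
    awk -F'\t' '
        /^@/ { next }
        int($2 / 4) % 2 == 0 { n++ }
        END { print n + 0 }
    '
}

# Usage (assumes samtools is on PATH and the file name from the question):
#   n=$(samtools view mastomys_sorted.bam | count_mapped_sam)
#   [ "$n" -ge 1000 ] || echo "only $n mapped records" >&2
```

The 1000 figure in the usage comment is the threshold quoted in the warning message.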
