Guppy Aligner/Samtools -- Parse error in SAM file
1
0
Entering edit mode
7 months ago
camppatrick ▴ 10

Hello All,

Am working on mapping some Nanopore RNA reads to rat reference. I have been using minimap2 and guppy_aligner to try and accomplish this, but have unfortunately had many errors with both programs. One I am wondering is the Parse error in the SAM file that is output from:

./guppy_aligner -I <input/path/ (.fastq)> -s <output/path/ (.sam)> --align_ref ./Rat/NCBI_Transcriptome/GCF_015227675.2_mRatBN7.2_rna.fa --align_type auto

Can anyone enlighten me as to why I am getting this error? I have also used the transcriptome .fa file from NCBI and get a similar error, where the first non-header line output of samtools view -h <output.sam> is not parseable.

Also, if you happen to know which rat reference file to use with minimap2 or guppy aligner, alongside the settings you use, that would be the best headache relief medicine.

Nanopore Samtools Mapping • 599 views
0
Entering edit mode

Can you clarify which errors you had with minimap2, and the command you used there?

1
Entering edit mode
7 months ago

It looks to me like you're using an rna reference (but does your rat rna fasta contain DNA bases, or RNA - I would guess DNA). Are the reads direct RNA, not DNA ? This might cause some problems.

I've only ever aligned DNA/cDNA vs genomic references when looking at RNA-seq.

Maybe initially you can try alignment vs the rat genome, not transcriptome, and see if that works. Then investigate the direct RNA information more later.

edit : you can also use

tail *.sam


to find out which lines of your SAM are broken. Maybe the last lines in your FASTQ are somehow corrupt (eg quality lines not same length as sequence lines).

0
Entering edit mode

Thanks. I think this was exactly the problem! I switched to the ncbi mRatBN7.2_genomic.fna as well as the ensembl Rnor_6.0.cdna.fa and minimap2 mapped it and I could also view some of the alignments in IGV for the ncbi one. I am also analyzing cDNA from mRNA.

Regarding the genomic mRat alignment, is there anyway to get the gene/protein names linked up with the output file instead of location names such as: NCXXXXX.X --> gene_1 ?

Also, would anyone happen to know which recommended setting for the minimap2 alignment should be used when going against the cdna.fa from Ensembl? I used, -ax splice

Thanks a bunch already though :).

fyi: The code I ran that "resolved" this was:

./minimap2 -ax splice <mRatBN7.2.fna.gz> <input.fastq> > <output.sam>


and

./minimap2 -ax splice <Rnor_6.0.cdna.all.fa.gz> <input.fastq> > <output.sam>

1
Entering edit mode

Regarding genomic alignments, I think you want to use the classic read counters such as featureCounts or htseq-count or something in R to get counts per gene. You'll need the gff3 or gtf annotation of course.

Surely you don't want to use splice if aligning vs cDNA - where splice sites should not be present? Not used this mode in minimap2 though. Splice should be just for genomic aligns AFAIK.