Question: HISAT output SAM file incorrect (only contains 3 columns of data)
emcc wrote (3 months ago):

Hello, I have run into various problems in the HISAT-StringTie pipeline and have traced them back to incomplete SAM output from the initial mapping step. I generated the indices myself (the job completed and produced eight .ht2 files of binary content). My read files are gzip-compressed (.gz) and contain reads in the expected format. The job "successfully" completes (output below), but my output SAM files look like this (the entire file is in this format; I have also run this without the ERCC sequences and get the same output):

@HD     VN:1.0  SO:unsorted
@SQ     SN:ERCC-00002   LN:1061
@SQ     SN:ERCC-00003   LN:1023
@SQ     SN:ERCC-00004   LN:523
@SQ     SN:ERCC-00009   LN:984
@SQ     SN:ERCC-00012   LN:994
@SQ     SN:ERCC-00013   LN:808
@SQ     SN:ERCC-00014   LN:1957
@SQ     SN:ERCC-00016   LN:844
@SQ     SN:ERCC-00017   LN:1136
@SQ     SN:ERCC-00019   LN:644

I cannot find a similar problem but perhaps you might be able to point me in the right direction?

Thanks in advance

I'm running the command:

 hisat2 -p 8 --dta -x indexes/genome_ERCC92_tran \
    -1 readfiles/readfile_r1.fastq.gz \
    -2 readfiles/readfile_r2.fastq.gz \
    -U readfiles/readfile_r0.fastq.gz \
    -S 21divv1.sam

with the memory request:

#$ -pe smp-verbose 8
#$ -l h_vmem=125G
#$ -l himem=true

My job output file shows:

182087500 reads; of these:
  171955418 (94.44%) were paired; of these:
    45539147 (26.48%) aligned concordantly 0 times
    93921035 (54.62%) aligned concordantly exactly 1 time
    32495236 (18.90%) aligned concordantly >1 times
    45539147 pairs aligned concordantly 0 times; of these:
      2017830 (4.43%) aligned discordantly 1 time
    43521317 pairs aligned 0 times concordantly or discordantly; of these:
      87042634 mates make up the pairs; of these:
        77335867 (88.85%) aligned 0 times
        6644882 (7.63%) aligned exactly 1 time
        3061885 (3.52%) aligned >1 times
  10132082 (5.56%) were unpaired; of these:
    2937197 (28.99%) aligned 0 times
    4468634 (44.10%) aligned exactly 1 time
    2726251 (26.91%) aligned >1 times
77.33% overall alignment rate
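
The headline figure in this log can be reproduced from the lines above it, which is a quick way to confirm the run really placed reads rather than failing silently. This is a minimal sketch, assuming (as in Bowtie2-style summaries) that the rate is counted per mate: each pair contributes two reads, and only the "aligned 0 times" mates plus the unaligned unpaired reads count as unaligned. The helper name overall_rate is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recompute HISAT2's "overall alignment rate" from its summary lines.
# Assumption: the rate is per mate, so every paired read contributes two mates.
overall_rate() {
  # $1 = reads "were paired", $2 = reads "were unpaired",
  # $3 = mates "aligned 0 times" (from pairs with no concordant/discordant hit),
  # $4 = unpaired reads "aligned 0 times"
  awk -v p="$1" -v u="$2" -v um="$3" -v uu="$4" 'BEGIN {
    total = 2 * p + u                      # all mates plus unpaired reads
    printf "%.2f%%\n", 100 * (1 - (um + uu) / total)
  }'
}

overall_rate 171955418 10132082 77335867 2937197
```

With the numbers from this log it prints 77.33%, matching the last line: roughly three quarters of the ~354 million mates were placed somewhere, so the SAM file should contain far more than header lines.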

michael.ante wrote (3 months ago):

I think Sej is pointing in the right direction: you are seeing the SAM file's header, which can be quite long.

You can use samtools view -S 21divv1.sam to see only the reads.
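
To make this concrete: every header line in a SAM file starts with @, and there is one @SQ line per reference sequence in the index, so a genome plus the 92 ERCC spike-ins can produce a long header before the first alignment record. With samtools, `samtools view -H 21divv1.sam` prints only the header and `samtools view 21divv1.sam | head` prints the first records (in samtools 1.x the -S flag is accepted but ignored, since the input format is detected automatically). Below is a minimal sketch using only bash, grep and awk; the helper name sam_summary and the toy file are hypothetical stand-ins for 21divv1.sam.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count header lines, @SQ lines and alignment records in a SAM file.
set -euo pipefail

sam_summary() {
  local sam="$1"
  local n_header n_sq n_records
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps set -e happy.
  n_header=$(grep -c '^@' "$sam" || true)     # all header lines (@HD, @SQ, @PG, ...)
  n_sq=$(grep -c '^@SQ' "$sam" || true)       # one @SQ line per reference sequence
  n_records=$(grep -vc '^@' "$sam" || true)   # every other line is an alignment record
  printf 'header=%s SQ=%s records=%s\n' "$n_header" "$n_sq" "$n_records"
}

# Toy SAM file: two @SQ lines and a single alignment record.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '@HD\tVN:1.0\tSO:unsorted\n@SQ\tSN:ERCC-00002\tLN:1061\n@SQ\tSN:ERCC-00003\tLN:1023\nread1\t0\tERCC-00002\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n' > "$tmp"

sam_summary "$tmp"   # prints: header=3 SQ=2 records=1
```

On the real output, `sam_summary 21divv1.sam` makes three passes over the file, which is slow for a SAM of this size; `samtools view -c 21divv1.sam` counts the alignment records in a single pass.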

emcc replied (3 months ago):

Apologies, I clearly didn't scroll far enough. The expected content is further down the file. Thank you so much for your time. I'm new to this and I really appreciate the help.

Sej Modha (Glasgow, UK) wrote (3 months ago):

Why are you using -1 -2 (parameters for paired data) and -U (parameter for unpaired data) together?

hisat2 [options]* -x <hisat2-idx> {-1 <m1> -2 <m2> | -U <r> | --sra-acc <SRA accession number>} [-S <hit>]

Please post the last 10 lines of the SAM file using tail filename.sam.
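
The tail check can be made a little more systematic. In a well-formed SAM file every alignment record (any line not starting with @) has at least the 11 mandatory tab-separated fields, QNAME through QUAL, so a file that really held only three columns, or a record truncated by an interrupted job, stands out immediately. This is a minimal sketch assuming bash and a POSIX awk; the helper name check_sam_columns and the toy file are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag SAM alignment records with fewer than the 11 mandatory fields.
set -euo pipefail

check_sam_columns() {
  awk -F'\t' '
    !/^@/ && NF < 11 { bad++; print "line " NR ": " NF " fields" }   # truncated record
    END { n = bad + 0; print n " malformed record(s)" }
  ' "$1"
}

# Toy file: one complete record followed by one cut down to 3 columns.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '@HD\tVN:1.0\tSO:unsorted\n@SQ\tSN:ERCC-00002\tLN:1061\nread1\t0\tERCC-00002\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\nread2\t4\t*\n' > "$tmp"

tail -n 10 "$tmp"          # the quick look at the end of the file
check_sam_columns "$tmp"   # prints "line 4: 3 fields", then "1 malformed record(s)"
```

Run as `check_sam_columns 21divv1.sam`, a healthy file should report 0 malformed record(s).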



Powered by Biostar version 2.3.0