HISAT output SAM file incorrect (only contains 3 columns of data)
6.1 years ago
emcc ▴ 30

Hello,

I have run into various problems in the HISAT-StringTie pipeline and have traced them back to incomplete SAM file output from the initial mapping. I generated the indices myself (the job completed and gave 8 .ht2 files of binary content). My read files are gzip-compressed (.gz) and contain reads in the expected format. The job "successfully" completes (output below), but my output SAM files look like this (the entire file is in this format; I have also run this without the ERCC sequences and get the same output):

@HD     VN:1.0  SO:unsorted
@SQ     SN:ERCC-00002   LN:1061
@SQ     SN:ERCC-00003   LN:1023
@SQ     SN:ERCC-00004   LN:523
@SQ     SN:ERCC-00009   LN:984
@SQ     SN:ERCC-00012   LN:994
@SQ     SN:ERCC-00013   LN:808
@SQ     SN:ERCC-00014   LN:1957
@SQ     SN:ERCC-00016   LN:844
@SQ     SN:ERCC-00017   LN:1136
@SQ     SN:ERCC-00019   LN:644

I cannot find a similar problem but perhaps you might be able to point me in the right direction?

Thanks in advance

I'm running the command:

hisat2 -p 8 --dta -x indexes/genome_ERCC92_tran -1 readfiles/readfile_r1.fastq.gz \
    -2 readfiles/readfile_r2.fastq.gz \
    -U readfiles/readfile_r0.fastq.gz \
    -S 21divv1.sam

with memory request

#$ -pe smp-verbose 8
#$ -l h_vmem=125G
#$ -l himem=true

My job file shows:

182087500 reads; of these:
  171955418 (94.44%) were paired; of these:
    45539147 (26.48%) aligned concordantly 0 times
    93921035 (54.62%) aligned concordantly exactly 1 time
    32495236 (18.90%) aligned concordantly >1 times
    ----
    45539147 pairs aligned concordantly 0 times; of these:
      2017830 (4.43%) aligned discordantly 1 time
    ----
    43521317 pairs aligned 0 times concordantly or discordantly; of these:
      87042634 mates make up the pairs; of these:
        77335867 (88.85%) aligned 0 times
        6644882 (7.63%) aligned exactly 1 time
        3061885 (3.52%) aligned >1 times
  10132082 (5.56%) were unpaired; of these:
    2937197 (28.99%) aligned 0 times
    4468634 (44.10%) aligned exactly 1 time
    2726251 (26.91%) aligned >1 times
77.33% overall alignment rate
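
For readers checking the summary above, the overall rate can be reproduced from the individual counts. This is a sketch under the assumption that each pair counts as two reads and that every aligned mate or unpaired read counts once; the function name `overall_alignment_rate` is made up for illustration.

```shell
# Recompute the overall alignment rate from the counts in the summary above.
# Assumption: pairs count as two reads each; the function name is hypothetical.
overall_alignment_rate() {
    awk 'BEGIN {
        pairs      = 171955418              # read pairs in the input
        unpaired   = 10132082               # unpaired reads in the input
        concordant = 93921035 + 32495236    # pairs aligned concordantly (1 time, >1 times)
        discordant = 2017830                # pairs aligned discordantly
        mates      = 6644882 + 3061885      # mates of unaligned pairs that aligned alone
        singles    = 4468634 + 2726251      # unpaired reads that aligned
        aligned = 2 * (concordant + discordant) + mates + singles
        total   = 2 * pairs + unpaired
        printf "%.2f\n", 100 * aligned / total
    }'
}

overall_alignment_rate   # prints 77.33
```

The result matches the reported 77.33%, so the mapping step itself looks consistent.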
6.1 years ago
michael.ante ★ 3.8k

I think Sej is pointing in the right direction: you are seeing the SAM file's header, which can be quite long.

You can use samtools view -S 21divv1.sam to see only the reads.
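
If samtools is not at hand, plain grep can do the same job, since SAM header lines start with `@` and alignment records do not. A minimal sketch, with hypothetical helper names `sam_header_count` and `sam_first_alignments`:

```shell
# Count header lines (those starting with '@') in a SAM file.
sam_header_count() {
    grep -c '^@' "$1"
}

# Print the first N alignment records (default 5), skipping the header.
sam_first_alignments() {
    grep -v '^@' "$1" | head -n "${2:-5}"
}

# Example usage with the file from the question:
# sam_header_count 21divv1.sam
# sam_first_alignments 21divv1.sam 5
```

With samtools, `samtools view -H 21divv1.sam` prints only the header; recent samtools versions detect the input format automatically and ignore `-S`.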


Apologies, I clearly didn't scroll far enough. I have the expected content further down the file. Thank you so much for your time. I'm new to this and I really appreciate the help.

6.1 years ago
Sej Modha 5.3k

Why are you using -1 -2 (parameters for paired data) and -U (parameter for unpaired data) together?

hisat2 [options]* -x <hisat2-idx> {-1 <m1> -2 <m2> | -U <r> | --sra-acc <SRA accession number>} [-S <hit>]

https://ccb.jhu.edu/software/hisat2/manual.shtml

Please post the last 10 lines of the SAM file using tail filename.sam.
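
A quick way to tell whether the end of the file holds alignments rather than header lines is to count the tab-separated fields: alignment records have at least 11 mandatory fields, whereas the `@SQ` lines shown in the question have three. A rough sketch, where the helper name `sam_tail_field_counts` is made up for illustration:

```shell
# Print the number of tab-separated fields on each of the last N lines
# (default 10) of a SAM file. Alignment records should report 11 or more.
sam_tail_field_counts() {
    tail -n "${2:-10}" "$1" | awk -F'\t' '{ print NF }'
}

# Example usage with the file from the question:
# sam_tail_field_counts 21divv1.sam 10
```

If every line reports 3, the file really does contain only the header; values of 11 or more mean the reads are there.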
