weird alignment results for RNAseq data
0
0
Entering edit mode
2.0 years ago
Sara ▴ 240

I have RNAseq data for 2 samples and trying to do paired end alignment. I tried hisat2 and star for the alignment but the results are weird to me. First I used star (using the command that I used many times before) and sam file seems to be normal but when I converted to bam file and loaded it to IGV there was no read. Looking at the bam file also I did not find any reads. Then I tried hisat2 and I got the same problem for the bam file. Do you know what could be the problem. Here is the stat from hisat2:

14631020 reads; of these:   14631020 (100.00%) were paired; of these:
    424695 (2.90%) aligned concordantly 0 times
    13320596 (91.04%) aligned concordantly exactly 1 time
    885729 (6.05%) aligned concordantly >1 times
    ----
    424695 pairs aligned concordantly 0 times; of these:
      11156 (2.63%) aligned discordantly 1 time
    ----
    413539 pairs aligned 0 times concordantly or discordantly; of these:
      827078 mates make up the pairs; of these:
        412619 (49.89%) aligned 0 times
        380325 (45.98%) aligned exactly 1 time
        34134 (4.13%) aligned >1 times
98.59% overall alignment rate

here is the flagstat results:

samtools flagstat sample.bam
33062668 + 0 in total (QC-passed reads + QC-failed reads)
3800628 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
32650049 + 0 mapped (98.75% : N/A)
29262040 + 0 paired in sequencing
14631020 + 0 read1
14631020 + 0 read2
28412650 + 0 properly paired (97.10% : N/A)
28508696 + 0 with itself and mate mapped
340725 + 0 singletons (1.16% : N/A)
62868 + 0 with mate mapped to a different chr
47800 + 0 with mate mapped to a different chr (mapQ>=5)



samtools flagstat sample.sam
33062668 + 0 in total (QC-passed reads + QC-failed reads)
3800628 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
32650049 + 0 mapped (98.75% : N/A)
29262040 + 0 paired in sequencing
14631020 + 0 read1
14631020 + 0 read2
28412650 + 0 properly paired (97.10% : N/A)
28508696 + 0 with itself and mate mapped
340725 + 0 singletons (1.16% : N/A)
62868 + 0 with mate mapped to a different chr
47800 + 0 with mate mapped to a different chr (mapQ>=5)
hisat2 star • 1.2k views
ADD COMMENT
0
Entering edit mode

What command did you use to convert from sam to bam ?

ADD REPLY
0
Entering edit mode

@Carlo Yague : I tried 2 commands . once:

1- samtools view -bS  sample.sam  > sample.bam
2-samtools view -u sample.sam | samtools sort -o sample_sorted.bam

in both cases I got weird bam file

ADD REPLY
0
Entering edit mode

Always use -o to save the output, depending on how you run the command, the output might be mangled with other output. Try this:

samtools view -bh -o sample.bam sample.sam
ADD REPLY
0
Entering edit mode

There might have been an error during conversion to bam, but that could be almost anything, from a broken library to a full harddisk. Do some more very simple checks, f.e. what do samtools view and flagstats yield? Technically, bam files might be in gzip format, so try gzip -t, may yield a CRC error. What is the size and type of the bam file using ls and file? Is your disk full (not a joke, I assume this is the most common source of error: df -h .). What are the chromosome names in the bam file, etc.?

ADD REPLY
0
Entering edit mode

Michael Dondrup question is updated.

ADD REPLY
0
Entering edit mode

probably it's a mismatch of contig/chromosome names between ref used in IGV and ref used in bam. Genome version used in IGV should be same as genome version used in alignment.

ADD REPLY
0
Entering edit mode

cpad0112 1- I used the same genome version indeed. in addition to IGV, I also looked into the bam file. that also does not have reads information!

ADD REPLY
1
Entering edit mode

Please do the basic sanity checks first, the file might simply be truncated.

ADD REPLY
0
Entering edit mode

can you check file sizes of sam and bam?

ADD REPLY
0
Entering edit mode

cpad0112 yes:

1605304 sample.bam   
18105768 sample.sam
ADD REPLY
0
Entering edit mode

can you try this:

$ samtools view sample.bam | wc
$ samtools view sample.sam | wc

or you can run samtools quickcheck to check the integrity of bam fille. If it is not still not working, can you post a subset bam here to see what's going on?

ADD REPLY
0
Entering edit mode

Hmm, looks good to me. So, there is seemingly nothing wrong with your file. Possibly, you may simply need to zoom in on a region in IGV where there are reads, and there might be regions that don't have any. Also, in IGV you have to zoom in to a certain level first before it shows anything. Also, the bam files should be sorted and indexed. After that, try to view the first few reads from the bam file, check where they are aligned and then zoom in to that position in the reference.

ADD REPLY

Login before adding your answer.

Traffic: 1933 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6