weird alignment results for RNAseq data
0
0
Entering edit mode
9 months ago
Sara ▴ 220

I have RNAseq data for 2 samples and trying to do paired end alignment. I tried hisat2 and star for the alignment but the results are weird to me. First I used star (using the command that I used many times before) and sam file seems to be normal but when I converted to bam file and loaded it to IGV there was no read. Looking at the bam file also I did not find any reads. Then I tried hisat2 and I got the same problem for the bam file. Do you know what could be the problem. Here is the stat from hisat2:

14631020 reads; of these:   14631020 (100.00%) were paired; of these:
424695 (2.90%) aligned concordantly 0 times
13320596 (91.04%) aligned concordantly exactly 1 time
885729 (6.05%) aligned concordantly >1 times
----
424695 pairs aligned concordantly 0 times; of these:
11156 (2.63%) aligned discordantly 1 time
----
413539 pairs aligned 0 times concordantly or discordantly; of these:
827078 mates make up the pairs; of these:
412619 (49.89%) aligned 0 times
380325 (45.98%) aligned exactly 1 time
34134 (4.13%) aligned >1 times
98.59% overall alignment rate

here is the flagstat results:

samtools flagstat sample.bam
3800628 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
32650049 + 0 mapped (98.75% : N/A)
29262040 + 0 paired in sequencing
28412650 + 0 properly paired (97.10% : N/A)
28508696 + 0 with itself and mate mapped
340725 + 0 singletons (1.16% : N/A)
62868 + 0 with mate mapped to a different chr
47800 + 0 with mate mapped to a different chr (mapQ>=5)

samtools flagstat sample.sam
3800628 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
32650049 + 0 mapped (98.75% : N/A)
29262040 + 0 paired in sequencing
28412650 + 0 properly paired (97.10% : N/A)
28508696 + 0 with itself and mate mapped
340725 + 0 singletons (1.16% : N/A)
62868 + 0 with mate mapped to a different chr
47800 + 0 with mate mapped to a different chr (mapQ>=5)
hisat2 star • 618 views
0
Entering edit mode

What command did you use to convert from sam to bam ?

0
Entering edit mode

@Carlo Yague : I tried 2 commands . once:

1- samtools view -bS  sample.sam  > sample.bam
2-samtools view -u sample.sam | samtools sort -o sample_sorted.bam

in both cases I got weird bam file

0
Entering edit mode

Always use -o to save the output, depending on how you run the command, the output might be mangled with other output. Try this:

samtools view -bh -o sample.bam sample.sam
0
Entering edit mode

There might have been an error during conversion to bam, but that could be almost anything, from a broken library to a full harddisk. Do some more very simple checks, f.e. what do samtools view and flagstats yield? Technically, bam files might be in gzip format, so try gzip -t, may yield a CRC error. What is the size and type of the bam file using ls and file? Is your disk full (not a joke, I assume this is the most common source of error: df -h .). What are the chromosome names in the bam file, etc.?

0
Entering edit mode

Michael Dondrup question is updated.

0
Entering edit mode

probably it's a mismatch of contig/chromosome names between ref used in IGV and ref used in bam. Genome version used in IGV should be same as genome version used in alignment.

0
Entering edit mode

cpad0112 1- I used the same genome version indeed. in addition to IGV, I also looked into the bam file. that also does not have reads information!

1
Entering edit mode

Please do the basic sanity checks first, the file might simply be truncated.

0
Entering edit mode

can you check file sizes of sam and bam?

0
Entering edit mode

1605304 sample.bam
18105768 sample.sam
0
Entering edit mode

can you try this:

$samtools view sample.bam | wc$ samtools view sample.sam | wc

or you can run samtools quickcheck to check the integrity of bam fille. If it is not still not working, can you post a subset bam here to see what's going on?

0
Entering edit mode

Hmm, looks good to me. So, there is seemingly nothing wrong with your file. Possibly, you may simply need to zoom in on a region in IGV where there are reads, and there might be regions that don't have any. Also, in IGV you have to zoom in to a certain level first before it shows anything. Also, the bam files should be sorted and indexed. After that, try to view the first few reads from the bam file, check where they are aligned and then zoom in to that position in the reference.