Question

Where are my reads in IGV?

0

Entering edit mode

5.9 years ago

francois ▴ 80

I have a .bam file generated by the aligner minimap2 (included in the EPI2ME online service from Oxford Nanopore; I am using the MinION for sequencing).

I sorted it and indexed it using samtools:

samtools sort align.bam

samtools index sort.bam

I then dragged and dropped sort.bam in IGV. It seemed to load it fine, but I cannot see any of my reads... They are supposed to mostly align to one gene, I can find the reference gene, but no sign of any reads there.

Additional troubleshooting I have tried so far:

I think minimap2 is using hg38 while the default reference in IGV is hg19. So I also tried loading the hg38 in IGV as reference (Genomes > Load genomes from server), but still cannot see anything in the expected region.
I have converted my .bam file in .sam (with: samtools view test.bam -o test.sam) so I could scroll through it. I think all looks good, and most of the reads do mention my gene of interest. Does the .sam file include any chr:position I could use to figure out where to look in IGV?

Any ideas?

igv alignment • 3.8k views

ADD COMMENT • link 5.9 years ago by francois ▴ 80

2

Entering edit mode

did you try to zoom into one region where the reads are mapped ? did you check some reads are mapped in your gene ? with samtools view in.bam "chrom:start-end" ?

ADD REPLY • link 5.9 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

Yes I did try zooming in the position where the reads should be.

And I tried (the gene is around chr20:4,700,000):

samtools view sort.bam chr20:4,000,000-5,000,000

But got:

[main_samview] region "chr20:4,000,000-5,000,000" specifies an unknown reference name. Continue anyway.

Any idea what is happening?

Also, is there a command to output a list of all the positions on the reference genome where my reads are mapped? I looked quickly in the samtools manual, but could not find anything. I think it'd be useful to understand what is happening in my case, as I do not have a ton of reads and most of them should map to a small 1kb region of the genome (around chr20:4,700,000).

ADD REPLY • link 5.9 years ago by francois ▴ 80

0

Entering edit mode

Could you show us the output of samtools idxstats of your sorted bam file?

ADD REPLY • link 5.9 years ago by WouterDeCoster 47k

0

Entering edit mode

Here it is.

samtools idxstats sort.bam

A1BG    8322    0   0
A1CF    86267   0   0
A2M     48566   0   0
A2ML1   64530   0   0
A3GALT2 14333   0   0
A4GALT  29178   0   0
A4GNT   8670    0   0
AAAS    17409   0   0
AACS    77955   0   0

And so on.

PRNP, which is the gene to which most of my reads should align to, is:

PRNP    15355   908 0

Which is the correct number of reads aligned to the gene (it is reported by EPI2ME/minimap2, the online alignment service Nanopore is providing).

ADD REPLY • link 5.9 years ago by francois ▴ 80

score 3 · Answer 1 · 2018-06-16

3

Entering edit mode

5.9 years ago

WouterDeCoster 47k

Your reads are aligned to the transcriptome, and not to the genome.

I think minimap2 is using hg38

minimap2 uses whatever genome you give it, so in this case, it seems EPI2ME select a transcriptome reference which to me sounds weird. You may want to enquire about this.

ADD COMMENT • link 5.9 years ago by WouterDeCoster 47k

0

Entering edit mode

I am sorry, not sure I understand, why are you saying it is using a transcriptome reference?

The website says:

The reference sequences are taken from the feature strand of the most recent human genome assembly, GRCh38.

ADD REPLY • link 5.9 years ago by francois ▴ 80

1

Entering edit mode

When I run samtools idxstats on a bam file created by minimap2 alignment to GRCh38 aka hg38 then I get the following, showing me the number of reads per chromosome

chr1    248956422   1950807 0
chr2    242193529   1457287 0
chr3    198295559   1166894 0
chr4    190214555   1224099 0
chr5    181538259   1060618 0
chr6    170805979   975801  0
chr7    159345973   1000378 0

That is because the fasta file used for alignment contains one fasta record per chromosome. In your case it appears it contains one fasta record per gene, so therefore your reads are aligned to the transcriptome reference and not to the genome reference.

That also matches with "the feature strand of the most recent human genome assembly".

Did you use the "Human Exome Mapper" workflow? I have never used Epi2Me so I'm not entirely familiar with it, I do my alignments myself.

ADD REPLY • link 5.9 years ago by WouterDeCoster 47k