Question: Where are my reads in IGV?
0
gravatar for francois
2.3 years ago by
francois10
francois10 wrote:

I have a .bam file generated by the aligner minimap2 (included in the EPI2ME online service from Oxford Nanopore; I am using the MinION for sequencing).

I sorted it and indexed it using samtools:

samtools sort align.bam

samtools index sort.bam

I then dragged and dropped sort.bam in IGV. It seemed to load it fine, but I cannot see any of my reads... They are supposed to mostly align to one gene, I can find the reference gene, but no sign of any reads there.

Additional troubleshooting I have tried so far:

  • I think minimap2 is using hg38 while the default reference in IGV is hg19. So I also tried loading the hg38 in IGV as reference (Genomes > Load genomes from server), but still cannot see anything in the expected region.
  • I have converted my .bam file in .sam (with: samtools view test.bam -o test.sam) so I could scroll through it. I think all looks good, and most of the reads do mention my gene of interest. Does the .sam file include any chr:position I could use to figure out where to look in IGV?

Any ideas?

igv alignment • 1.3k views
ADD COMMENTlink written 2.3 years ago by francois10
2

did you try to zoom into one region where the reads are mapped ? did you check some reads are mapped in your gene ? with samtools view in.bam "chrom:start-end" ?

ADD REPLYlink written 2.3 years ago by Pierre Lindenbaum131k

Yes I did try zooming in the position where the reads should be.

And I tried (the gene is around chr20:4,700,000):

samtools view sort.bam chr20:4,000,000-5,000,000

But got:

[main_samview] region "chr20:4,000,000-5,000,000" specifies an unknown reference name. Continue anyway.

Any idea what is happening?

Also, is there a command to output a list of all the positions on the reference genome where my reads are mapped? I looked quickly in the samtools manual, but could not find anything. I think it'd be useful to understand what is happening in my case, as I do not have a ton of reads and most of them should map to a small 1kb region of the genome (around chr20:4,700,000).

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by francois10

Could you show us the output of samtools idxstats of your sorted bam file?

ADD REPLYlink written 2.3 years ago by WouterDeCoster44k

Here it is.

samtools idxstats sort.bam

A1BG    8322    0   0
A1CF    86267   0   0
A2M     48566   0   0
A2ML1   64530   0   0
A3GALT2 14333   0   0
A4GALT  29178   0   0
A4GNT   8670    0   0
AAAS    17409   0   0
AACS    77955   0   0

And so on.

PRNP, which is the gene to which most of my reads should align to, is:

PRNP    15355   908 0

Which is the correct number of reads aligned to the gene (it is reported by EPI2ME/minimap2, the online alignment service Nanopore is providing).

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by francois10
3
gravatar for WouterDeCoster
2.3 years ago by
Belgium
WouterDeCoster44k wrote:

Your reads are aligned to the transcriptome, and not to the genome.

I think minimap2 is using hg38

minimap2 uses whatever genome you give it, so in this case, it seems EPI2ME select a transcriptome reference which to me sounds weird. You may want to enquire about this.

ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by WouterDeCoster44k

I am sorry, not sure I understand, why are you saying it is using a transcriptome reference?

The website says:

The reference sequences are taken from the feature strand of the most recent human genome assembly, GRCh38.

ADD REPLYlink written 2.3 years ago by francois10
1

When I run samtools idxstats on a bam file created by minimap2 alignment to GRCh38 aka hg38 then I get the following, showing me the number of reads per chromosome

chr1    248956422   1950807 0
chr2    242193529   1457287 0
chr3    198295559   1166894 0
chr4    190214555   1224099 0
chr5    181538259   1060618 0
chr6    170805979   975801  0
chr7    159345973   1000378 0

That is because the fasta file used for alignment contains one fasta record per chromosome. In your case it appears it contains one fasta record per gene, so therefore your reads are aligned to the transcriptome reference and not to the genome reference.

That also matches with "the feature strand of the most recent human genome assembly".

Did you use the "Human Exome Mapper" workflow? I have never used Epi2Me so I'm not entirely familiar with it, I do my alignments myself.

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by WouterDeCoster44k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1549 users visited in the last hour