Question

Align short reads to multifasta reference - can't see in IGV

0

7.2 years ago

kanacska ▴ 10

Hi everyone!

I've done a short-read alignment against a multi-FASTA reference. I'm working on Windows. The reference contains two sequences, NG_005905 and NG_012772. I put the two FASTA files into one file (multiref.fasta) because I wanted to align against both sequences at once.

Then I did the following steps:

built a Bowtie 2 index from the multi-FASTA reference (bowtie2-build)
aligned the unpaired read files (bowtie2-align) and got a SAM file (the alignment rate was 99.33%)
converted the SAM file to BAM (samtools)
sorted the BAM file (samtools)
indexed the sorted BAM file (samtools), which produced the .bai file
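
For readers who want the steps above in one place, here is a minimal sketch that assembles them as command lines in Python. It assumes the `bowtie2` wrapper script and samtools 1.x are on the PATH; the function names and the output file names (`multiref.sam`, `multiref.sorted.bam`, and so on) are placeholders chosen for illustration, not anything from the original post.

```python
import subprocess


def build_pipeline(reference, reads, prefix="multiref"):
    """Return the alignment steps as argument lists, in the order they run.

    `prefix` is a hypothetical name used for the index and all output files.
    """
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # Build the Bowtie 2 index from the multi-FASTA reference.
        ["bowtie2-build", str(reference), prefix],
        # Align unpaired reads; -U takes a comma-separated list of files.
        ["bowtie2", "-x", prefix, "-U", ",".join(str(r) for r in reads), "-S", sam],
        # Convert SAM to BAM.
        ["samtools", "view", "-b", "-o", bam, sam],
        # Sort the BAM by coordinate.
        ["samtools", "sort", "-o", sorted_bam, bam],
        # Index the sorted BAM (writes the .bai file next to it).
        ["samtools", "index", sorted_bam],
    ]


def run_pipeline(commands):
    """Run each step in turn and stop at the first one that fails."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling `run_pipeline(build_pipeline("multiref.fasta", ["exampl1.fastq", "exampl2.fastq"]))` would run the five steps in sequence; printing the lists first is a cheap way to check the commands before anything is executed.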

Then I tried to view it in IGV. First it said the file "does not contain any sequence names which match the current genome". Then I tried to open it from the 'Genomes' menu and it showed nothing.

So my questions are:

1. Was it wrong to put the two FASTA-formatted sequences into one FASTA file for the alignment?
2. Are the sequence headers in the FASTA file wrong, so that IGV doesn't recognize NG_...?

>NG_005905.2 Homo sapiens BRCA1, DNA repair associated (BRCA1), RefSeqGene (LRG_292) on chromosome 17
>NG_012772.3 Homo sapiens BRCA2, DNA repair associated (BRCA2), RefSeqGene (LRG_293) on chromosome 13

3. The read names are the same in both unpaired read files. For example, here are the first two lines of each file. exampl1.fastaq:

@Frag_1 chr17 (Strand + Offset 106709--107175) 467M 101M
GAAGCCTGAGAATAATGACATTTGAGCCAATCTGCAGAGGTAAGTGAGTCCATAAAAGAAACTGAGGCTGGGCCTAGT
GGCTCACACCTGTAATCCTAGCA

exampl2.fastaq

@Frag_1 chr17 (Strand + Offset 106709--107175) 467M 101M
AGGCAGGTCTCAAACTCCTGACCTCAGGTGATCCACCCACCTCAAGCCTCCCAAAGTGCTGGGATTATAGGCATGAGC
CACCATGTCCGGCAAGTTTCTTT

Thank you for your answers! Best regards,

Anna

gene alignment IGV next-gen sequencing • 3.4k views

ADD COMMENT • link updated 7.2 years ago by igor 13k • written 7.2 years ago by kanacska ▴ 10

0

First import your multiref.fasta as a genome and only then load your bam file.

ADD REPLY • link 7.2 years ago by Biomonika (Noolean) 3.2k

0

Thank you, that helped too:)

ADD REPLY • link 7.2 years ago by kanacska ▴ 10

score 2 · Accepted Answer · 2017-02-07

2

7.2 years ago

GenoMax 141k

It is best to avoid spaces in FASTA header IDs. Many programs drop everything after the first space (which loosely follows the FASTA convention that the ID ends at the first whitespace). The IDs in your reference and in your alignments then no longer match, which means IGV can't connect the two.

Run samtools view your_sorted.bam | more and check whether the reference names stop at the first space (e.g. NG_005905.2). If so, edit your FASTA file so that each header keeps only that ID, and use that file as the reference in IGV. See if that fixes things.
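
If editing the headers by hand is awkward, the trimming can be scripted. The following is a minimal sketch in Python, assuming a plain-text FASTA file; the function names and the output file name in the usage note are made up for illustration.

```python
def strip_fasta_descriptions(lines):
    """Yield FASTA lines with each header cut at the first whitespace."""
    for line in lines:
        if line.startswith(">"):
            # ">NG_005905.2 Homo sapiens BRCA1 ..." becomes ">NG_005905.2"
            fields = line[1:].split()
            name = fields[0] if fields else ""
            yield ">" + name + "\n"
        else:
            # Sequence lines are passed through unchanged.
            yield line


def rewrite_fasta(src, dst):
    """Write a copy of `src` to `dst` with trimmed header lines."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(strip_fasta_descriptions(fin))
```

For example, `rewrite_fasta("multiref.fasta", "multiref.clean.fasta")` would write a copy whose headers are just `>NG_005905.2` and `>NG_012772.3`. The cleaned file is the one to load as the genome in IGV.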

ADD COMMENT • link 7.2 years ago by GenoMax 141k

0

Thank you for your answers! It worked:)

ADD REPLY • link 7.2 years ago by kanacska ▴ 10

0

Go ahead and accept the answer (green check mark) if that is the case. That provides closure to the question.

ADD REPLY • link 7.2 years ago by GenoMax 141k

score 1 · Accepted Answer · 2017-02-07

1

7.2 years ago

igor 13k

You can have multi-sequence FASTA files. Before you view any alignments in IGV, you have to add that FASTA file as a genome. Make sure you have the correct genome selected in IGV (top left corner) when you load the BAMs.
Your FASTA file sequence names contain spaces. That will cause problems with some tools.

Also, you mention .fastaq files, which is not a standard extension. The files should be either FASTA or FASTQ.
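
One quick way to tell which of the two a file really is: FASTA records start with ">" and FASTQ records start with "@". Below is a small Python sketch of that check. It is only a heuristic that looks at the first non-blank line, and the function name is made up for illustration.

```python
def guess_sequence_format(lines):
    """Guess 'fasta' or 'fastq' from the first non-blank line.

    `lines` is any iterable of text lines, for example an open file handle.
    """
    for line in lines:
        if not line.strip():
            continue  # skip leading blank lines
        if line.startswith(">"):
            return "fasta"
        if line.startswith("@"):
            return "fastq"
        return "unknown"
    return "unknown"
```

The read headers quoted in the question start with "@", so this check would report those files as FASTQ.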