Align short reads to multifasta reference - can't see in IGV
7.2 years ago
kanacska ▴ 10

Hi everyone!

I've done a short-read alignment against a multi-FASTA reference. I'm working on Windows. The reference contains two sequences, NG_005905 and NG_012772. I put the two FASTA files into one (called multiref.fasta) because I wanted to align against both sequences at once.

These are the steps I followed:

  1. Built a bowtie2 index from the multi-FASTA reference (bowtie2-build)
  2. Aligned the unpaired read files (bowtie2-align) and got a SAM file (the alignment rate was 99.33%)
  3. Converted SAM to BAM (samtools)
  4. Sorted the BAM (samtools)
  5. Indexed the sorted BAM (samtools), which gave the .bai file
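
For readers following along, the five steps above can be sketched as command lines. This is only an illustration, not the exact commands used in the question: the read file name `reads.fastq` and the output prefix `multiref` are hypothetical, and the aligner is called through the `bowtie2` wrapper rather than `bowtie2-align`, which is an assumption about the installation. The Python below only builds and prints the commands; it does not run them.

```python
import shlex


def build_pipeline(ref_fasta, reads_fastq, prefix):
    """Return the five steps as argument lists, in the order listed above."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # 1. build the bowtie2 index from the multi-FASTA reference
        ["bowtie2-build", ref_fasta, prefix],
        # 2. align unpaired reads (-U) against the index (-x), write SAM (-S)
        ["bowtie2", "-x", prefix, "-U", reads_fastq, "-S", sam],
        # 3. convert SAM to BAM
        ["samtools", "view", "-b", "-o", bam, sam],
        # 4. sort the BAM by coordinate
        ["samtools", "sort", "-o", sorted_bam, bam],
        # 5. index the sorted BAM (creates the .bai file next to it)
        ["samtools", "index", sorted_bam],
    ]


# File names here are placeholders, not the ones from the question.
commands = build_pipeline("multiref.fasta", "reads.fastq", "multiref")
for cmd in commands:
    print(shlex.join(cmd))
```

To actually execute the steps, each list could be passed to `subprocess.run(cmd, check=True)`, assuming bowtie2 and samtools are on the PATH.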

Then I tried to view it in IGV. At first it reported that the file "does not contain any sequence names which match the current genome". Then I tried to open it from 'Genomes', and it showed nothing.

So my questions are:

1. Was it wrong to do the multi-sequence alignment by putting the two FASTA-formatted sequences into one FASTA file?
2. Are the sequence headers in the FASTA file wrong, so that IGV doesn't recognize NG_...?

>NG_005905.2 Homo sapiens BRCA1, DNA repair associated (BRCA1), RefSeqGene (LRG_292) on chromosome 17
>NG_012772.3 Homo sapiens BRCA2, DNA repair associated (BRCA2), RefSeqGene (LRG_293) on chromosome 13


3. The read names are the same in both unpaired read files, for example (first two lines of each):

exampl1.fastaq

@Frag_1 chr17 (Strand + Offset 106709--107175) 467M 101M
GAAGCCTGAGAATAATGACATTTGAGCCAATCTGCAGAGGTAAGTGAGTCCATAAAAGAAACTGAGGCTGGGCCTAGT
GGCTCACACCTGTAATCCTAGCA

exampl2.fastaq

@Frag_1 chr17 (Strand + Offset 106709--107175) 467M 101M
AGGCAGGTCTCAAACTCCTGACCTCAGGTGATCCACCCACCTCAAGCCTCCCAAAGTGCTGGGATTATAGGCATGAGC
CACCATGTCCGGCAAGTTTCTTT

Thank you for your answers! Best regards,

Anna

gene alignment IGV next-gen sequencing • 3.4k views

First import your multiref.fasta as a genome and only then load your bam file.


Thank you, that helped too:)

7.2 years ago
GenoMax 141k

It is not a good idea to have spaces in FASTA header IDs. Many programs drop everything after the first space, which is broadly in line with the FASTA convention that the ID ends at the first whitespace. The result is that the IDs in your reference and in your alignments do not match, so IGV cannot connect the two.
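
A small sketch of the mismatch described above, using the two header lines from the question. The helper names `first_token` and `full_name` are hypothetical; they stand in for a tool that cuts the header at the first whitespace and a tool that keeps the whole line, respectively.

```python
def first_token(header_line):
    """Name as seen by a tool that cuts the header at the first whitespace."""
    return header_line[1:].split()[0]


def full_name(header_line):
    """Name as seen by a tool that keeps the whole header line."""
    return header_line[1:].strip()


headers = [
    ">NG_005905.2 Homo sapiens BRCA1, DNA repair associated (BRCA1), "
    "RefSeqGene (LRG_292) on chromosome 17",
    ">NG_012772.3 Homo sapiens BRCA2, DNA repair associated (BRCA2), "
    "RefSeqGene (LRG_293) on chromosome 13",
]

short_ids = [first_token(h) for h in headers]
long_names = [full_name(h) for h in headers]

print(short_ids)
# The two naming schemes share no names, so lookups by name fail.
print(set(short_ids) & set(long_names))
```

With headers that contain only the accession, both functions return the same string and the problem disappears.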

Run samtools view your_sorted.bam | more and check whether the reference names stop at the first space (e.g. NG_005905.2). If they do, edit your FASTA file so that each header keeps only that accession ID, and use the edited file as the reference in IGV. See if that fixes things.
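
One way to make that header edit without touching the sequences, as a minimal sketch. The function name `trim_fasta_headers` and the output file name in the usage comment are hypothetical.

```python
def trim_fasta_headers(in_path, out_path):
    """Copy a FASTA file, keeping only the first whitespace-delimited
    token of each header line. Sequence lines are copied unchanged.
    Returns the list of IDs that were written."""
    ids = []
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                fields = line[1:].split()
                seq_id = fields[0] if fields else ""
                ids.append(seq_id)
                dst.write(f">{seq_id}\n")
            else:
                dst.write(line)
    return ids


# Example call (file names are placeholders):
# trim_fasta_headers("multiref.fasta", "multiref.trimmed.fasta")
```

If the names in the BAM already match the trimmed IDs, as the samtools view check above is meant to show, the alignment itself should not need to be redone; the trimmed FASTA is only needed as the genome in IGV.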


Thank you for your answers! It worked:)


Go ahead and accept the answer (green check mark) if that is the case. That provides closure to the question.

7.2 years ago
igor 13k
  1. You can have multi-sequence FASTA files. Before you view any alignments in IGV, you have to add that FASTA file as a genome. Make sure you have the correct genome selected in IGV (top left corner) when you load the BAMs.

  2. Your FASTA file sequence names contain spaces. That will cause problems with some tools.

Also, you mention you have .fastaq files, which is not a standard format. They should be either fasta or fastq.


Thank you for your answer! I mistyped it, I have fastq files:)
