Question: Align short reads to multi-FASTA reference - can't see alignments in IGV
kanacska (Hungary) wrote, 2.2 years ago:

Hi everyone!

I've done a short-read alignment against a multi-FASTA reference. I'm working on Windows. The reference contains two sequences, NG_005905 and NG_012772. I put the two FASTA files into one file (called multiref.fasta) because I wanted to align against both sequences at once.

Then I did the following steps:

  1. built a bowtie2 index from the multi-FASTA reference (bowtie2-build)
  2. aligned the unpaired read files (bowtie2-align) and got a SAM file (the alignment rate was 99.33%)
  3. converted the SAM file to BAM (samtools)
  4. sorted the BAM file (samtools)
  5. indexed the sorted BAM file (samtools), which produced the .bai file
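
For readers who want to reproduce the five steps above, here is a minimal sketch of the command lines, written in Python so it also works on Windows. It assumes that `bowtie2-build`, `bowtie2` and `samtools` are on the PATH and that the reads are in a single unpaired FASTQ file. The function names and the file names (`reads.fastq`, the `multiref` prefix) are placeholders, not the poster's actual commands.

```python
import subprocess


def build_pipeline(ref_fasta, reads_fastq, prefix):
    """Return the command lines (as argument lists) for the five steps.

    `prefix` is a hypothetical base name, reused here for both the
    bowtie2 index and the SAM/BAM output files.
    """
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # 1. build the bowtie2 index from the multi-FASTA reference
        ["bowtie2-build", ref_fasta, prefix],
        # 2. align unpaired reads (-U) and write SAM (-S)
        ["bowtie2", "-x", prefix, "-U", reads_fastq, "-S", sam],
        # 3. convert SAM to BAM
        ["samtools", "view", "-b", "-o", bam, sam],
        # 4. sort the BAM by coordinate
        ["samtools", "sort", "-o", sorted_bam, bam],
        # 5. index the sorted BAM (creates the .bai next to it)
        ["samtools", "index", sorted_bam],
    ]


def run_pipeline(commands):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `run_pipeline(build_pipeline("multiref.fasta", "reads.fastq", "multiref"))` would run the steps in order. Printing the list first is a cheap way to check the commands before anything is executed.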

Then I tried to view it in IGV. First it said that the file "does not contain any sequence names which match the current genome". Then I tried to open it from 'Genomes', and it showed nothing.

So my questions are:

1. Was it wrong to put the two FASTA-formatted sequences into one FASTA file in order to align against both?
2. Are the sequence headers in the FASTA file wrong, so that IGV doesn't recognize NG_...? These are the headers:

>NG_005905.2 Homo sapiens BRCA1, DNA repair associated (BRCA1), RefSeqGene (LRG_292) on chromosome 17
>NG_012772.3 Homo sapiens BRCA2, DNA repair associated (BRCA2), RefSeqGene (LRG_293) on chromosome 13


3. The read names are the same in the unpaired read files. For example (first two lines of each file):

exampl1.fastaq

@Frag_1 chr17 (Strand + Offset 106709--107175) 467M 101M
GAAGCCTGAGAATAATGACATTTGAGCCAATCTGCAGAGGTAAGTGAGTCCATAAAAGAAACTGAGGCTGGGCCTAGTGGCTCACACCTGTAATCCTAGCA

exampl2.fastaq

@Frag_1 chr17 (Strand + Offset 106709--107175) 467M 101M
AGGCAGGTCTCAAACTCCTGACCTCAGGTGATCCACCCACCTCAAGCCTCCCAAAGTGCTGGGATTATAGGCATGAGCCACCATGTCCGGCAAGTTTCTTT

Thank you for your answers! Best regards,

Anna


First import your multiref.fasta as a genome and only then load your bam file.

written 2.2 years ago by Biomonika (Noolean)

Thank you, that helped too:)

written 2.2 years ago by kanacska
genomax (United States) wrote, 2.2 years ago:

It is not a good idea to have spaces in FASTA header IDs. Many programs drop whatever comes after the first space (which arguably follows the FASTA format, where the ID ends at the first whitespace). As a result, the IDs in your reference and in your alignments can end up not matching, which means IGV can't connect the two.

Look at the output of samtools view your_sorted.bam | more and see whether the reference names stop at the first space (e.g. NG_005905.2). If that is the case, you can edit your FASTA file so that the headers keep only those IDs, and use that file as the reference in IGV. See if that fixes things.
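
If editing the headers by hand is inconvenient, the change suggested above can be sketched in a few lines of Python. This is only an illustration: the function names and the output file name are made up, and it assumes a plain-text FASTA file in which every header starts with `>` followed directly by the ID.

```python
def strip_descriptions(lines):
    """Yield FASTA lines with each header cut down to its ID.

    '>NG_005905.2 Homo sapiens BRCA1 ...' becomes '>NG_005905.2'.
    Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # Keep only the first whitespace-delimited word as the ID.
            yield ">" + (fields[0] if fields else "") + "\n"
        else:
            yield line


def rewrite_fasta(in_path, out_path):
    """Write a copy of the FASTA file at in_path with ID-only headers."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(strip_descriptions(src))
```

For example, `rewrite_fasta("multiref.fasta", "multiref.ids_only.fasta")` would write a cleaned copy and leave the original file untouched. The cleaned copy is the file to load as the genome in IGV.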


Thank you for your answers! It worked:)

written 2.2 years ago by kanacska

Go ahead and accept the answer (green check mark) if that is the case. That provides closure to the question.

written 2.2 years ago by genomax
igor (United States) wrote, 2.2 years ago:
  1. You can have multi-sequence FASTA files. Before you view any alignments in IGV, you have to add that FASTA file as a genome. Make sure you have the correct genome selected in IGV (top left corner) when you load the BAMs.

  2. Your FASTA file sequence names contain spaces. That will cause problems with some tools.

Also, you mention you have .fastaq files, which is not a standard format. They should be either fasta or fastq.
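
One way to check point 1 before opening IGV is to compare the reference names recorded in the BAM header with the sequence IDs in the FASTA file that will be loaded as the genome. The sketch below is an assumption-laden illustration: it expects the text output of `samtools view -H your_sorted.bam` as a list of lines, and all function names are invented for this example.

```python
def fasta_names(fasta_lines):
    """Return the sequence IDs (first word of each '>' header line)."""
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def sam_reference_names(header_lines):
    """Return the SN: values from the @SQ lines of a SAM/BAM header."""
    names = []
    for line in header_lines:
        if line.startswith("@SQ"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def missing_from_genome(header_lines, fasta_lines):
    """List reference names used in the BAM that the FASTA does not define."""
    genome = set(fasta_names(fasta_lines))
    return [n for n in sam_reference_names(header_lines) if n not in genome]
```

An empty list from `missing_from_genome` means every reference name in the BAM has a matching sequence in the FASTA. Any names it returns point to a mismatch of the kind IGV reports as sequence names that do not match the current genome.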


Thank you for your answer! I mistyped it, I have fastq files:)

written 2.2 years ago by kanacska