Question: Align reads on multi-fasta (contigs) file?
Asked 4 days ago by ThePresident:

I have paired-end fastq files, but my reference genome is a list of contigs, so the reference fasta file looks something like this:

>contig1
AGTGCAGAC.....
>contig2
GCGATCACA......
>contig3
....

Is there a way to instruct bwa to perform the alignment on 'contig1', then move on to 'contig2', and so on? Concatenating contigs into one single fasta file is not an option, as I'll have random pairing of contigs, and paired-end reads might be mapped to different contigs, producing all sorts of funny stuff.

I tried looking for this in previous posts but couldn't find anything similar.

TP


"Concatenating contigs into one single fasta file is not an option as I'll have random pairing of contigs"

What do you mean by "random pairing of contigs"?

BWA works on multi-contig fasta files; in fact, most reference genomes are multi-contig fasta files. I don't understand what the problem is, or maybe I don't understand what you want to do.

written 3 days ago by h.mon
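
To make the point about multi-contig references concrete, here is a minimal Python sketch that lists the contigs in a multi-fasta file together with their lengths. Each header line starts a separate record, and these are the sequences an aligner indexes individually. With bwa the usual workflow would be something like `bwa index contigs.fa` followed by `bwa mem contigs.fa reads_1.fq reads_2.fq > aln.sam`; the file names here are hypothetical, and the function name `contig_lengths` is made up for illustration.

```python
import io


def contig_lengths(handle):
    """Return {contig_name: length} for an open multi-fasta file handle."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the name
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence line: add its length to the current contig
            lengths[name] += len(line)
    return lengths


# Toy multi-fasta with two contigs (made-up sequences)
example = io.StringIO(">contig1\nAGTGCAGAC\n>contig2\nGCGATCACA\nTT\n")
print(contig_lengths(example))  # {'contig1': 9, 'contig2': 11}
```

For a real file, pass `open("contigs.fa")` instead of the `io.StringIO` object.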

What I meant by "random pairing of contigs" is that the contigs probably won't be in the correct order compared to the reference sequence (i.e. the actual genome), and some of them might be reversed relative to it. When paired-end reads are aligned to these incorrectly joined contigs, they might produce discordant pairs (outward-oriented reads and such). Hence the idea of aligning to each contig independently instead of to concatenated contigs.

I didn't know that bwa could work on multi-fasta files. Does indexing work the same way?

modified 3 days ago • written 3 days ago by ThePresident

When you concatenate several fasta sequences into a single multi-fasta file, all contigs remain separate from each other. If the two reads of a pair map to different contigs, the result is a discordant pair regardless of contig orientation. This is a by-product of incomplete assemblies, and there is nothing you can do about it, except get a better assembly somehow.

written 2 days ago by h.mon
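
To see how many pairs are affected in practice, one option is to count the pairs whose mates land on different contigs. The following is a rough Python sketch, assuming plain-text SAM input such as the output of `bwa mem`. It relies on the standard SAM columns (FLAG in column 2, RNAME in column 3, RNEXT in column 7, where `=` means "same reference as this read"). The function name `count_cross_contig_pairs` and the toy records are invented for illustration.

```python
def count_cross_contig_pairs(sam_lines):
    """Return (same_contig_pairs, cross_contig_pairs) from SAM text lines.

    Only primary alignments of the first read in each pair are counted,
    so every pair is counted at most once. Pairs with an unmapped read
    or an unmapped mate are skipped.
    """
    same = cross = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]
        if not flag & 0x1 or not flag & 0x40:
            continue  # keep only paired reads that are first in pair
        if flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue  # unmapped read/mate, secondary or supplementary
        if rnext == "=" or rnext == rname:
            same += 1
        else:
            cross += 1
    return same, cross


# Toy SAM records (made up): pair r1 maps within contig1,
# pair r2 is split between contig1 and contig2.
toy_sam = [
    "@SQ\tSN:contig1\tLN:100",
    "@SQ\tSN:contig2\tLN:100",
    "r1\t99\tcontig1\t1\t60\t5M\t=\t20\t24\tACGTA\tIIIII",
    "r1\t147\tcontig1\t20\t60\t5M\t=\t1\t-24\tACGTA\tIIIII",
    "r2\t65\tcontig1\t3\t60\t5M\tcontig2\t7\t0\tACGTA\tIIIII",
    "r2\t129\tcontig2\t7\t60\t5M\tcontig1\t3\t0\tACGTA\tIIIII",
]
print(count_cross_contig_pairs(toy_sam))  # (1, 1)
```

A high proportion of cross-contig pairs would be consistent with a fragmented assembly rather than with a problem in the alignment step.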