I would like to align reads from multiple samples to the same contigs from a de novo assembly. I am working on a non-model species and I don't have a reference genome, so I built a de novo assembly from reads obtained with a sequence capture chip. The chip targets approximately 3,000 genes, and that is what I assembled. I have a little over 50,000 contigs to which I would like to align Illumina paired-end reads (100 bp). The reads are split into 24 files of approximately 30 million reads each. Each file represents one individual from one of 2 different populations, and all individuals are individually tagged.
I know the genome of my species contains a lot of repeated sequences, so I decided to use BFAST to align the reads to the contigs. I am currently at step 3, the "match" step (finding candidate alignment locations, CALs). My problem is that, once the reads are aligned, I would like to end up with a single file containing the contigs and the reads of ALL THE INDIVIDUALS aligned to these contigs.
My question is: how can I do this if I align the reads separately for each individual? (I know BFAST works better with input files that are not too big, i.e. a few million reads each.)
Will I get a separate alignment file for each individual? Can I merge these files at some point during the process, or only at the end?
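To make the "merge at the end" option concrete, here is a minimal sketch of what I imagine it could look like. It is only an assumption on my side, not something I have run: it uses samtools (not BFAST itself) to merge one sorted BAM file per individual, and the file names (ind01.sorted.bam, all_individuals.bam) and the helper build_merge_command are hypothetical. The -r option of samtools merge attaches an RG tag to each read, inferred from the input file name, so each read would still be traceable to its individual after merging.

```python
def build_merge_command(bam_files, merged_bam="all_individuals.bam"):
    """Build the argument list for a `samtools merge -r` call.

    Assumes one coordinate-sorted BAM per individual (hypothetical names).
    """
    bams = sorted(bam_files)
    if not bams:
        raise ValueError("no BAM files given")
    # -r: attach an RG tag to each alignment, inferred from the input file name,
    # so reads stay attributable to their individual in the merged file.
    return ["samtools", "merge", "-r", merged_bam, *bams]


# Hypothetical per-individual alignment files, one for each of the 24 individuals.
bam_files = [f"ind{i:02d}.sorted.bam" for i in range(1, 25)]
command = build_merge_command(bam_files)

# Only print the command here; nothing is executed.
print(" ".join(command))
```

If samtools is installed, the list could be passed to subprocess.run(command, check=True) instead of being printed. Is this the right idea, or should the merging happen earlier in the BFAST pipeline?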
The final goal is to call SNPs and run various population genomics analyses on the results.
Can anybody help me with this? I am just starting to work with these tools, and I am definitely not an expert in bioinformatics. :S
Thank you VERY MUCH!