Question: appropriate script for DNA alignment

Dear Biostars,

I have a FASTA file with about 100K DNA sequences, each about 300 bp long, and I want to align them to a target genome and get a BAM file as output. Could you suggest an appropriate script for this alignment?

Thanks

dna alignment
modified 4 months ago • written 4 months ago by Sam

Just to give the context: the post "prepare a GFF file for MOCK fasta reference" is related and describes in a little more detail how the reference was created.

You should really provide the context yourself. Please take a few minutes to edit and revise your post so that it contains all the necessary information. Please be very specific and rephrase phrases like: "a target genome (which exactly? seemingly a synthetic reference)", "some GBS data (what is that?)", "a tree sample (which species of tree)", "some pipeline"... You might think that knowing the exact species, sequences and methods applied is not relevant to solving the problem, but that is absolutely not the case!

I suggest that we hold off a little until this is fixed.

written 4 months ago by Michael Dondrup

Hello Sam,

This is a very basic question. What have you tried so far? What problems are you facing?

fin swimmer

written 4 months ago by finswimmer

I have already tried bowtie2 with -X 400 -I 100 --very-sensitive, but about 35% of the sequences could not be aligned. I think the issue is the length of the sequences and the seed region used in the alignment. What do you think?

modified 4 months ago by h.mon • written 4 months ago by Sam

I agree with finswimmer that this is a very basic question, and it seems you did not put a lot of effort into solving it yourself.

"already I tried bowtie2 with -X 400 -I 100 --very-sensitive but about 35 % of sequence could not match and I think the issue is the length of the sequence and seed region in alignment"

This information should have been in your initial question. Also, you should elaborate on how you obtained the data and which organism you are working on.

written 4 months ago by WouterDeCoster

Also, please indicate why the data is in FASTA format.

written 4 months ago by genomax

It's a mock reference (created by merging GBS data from a tree sample), which is why it is in FASTA format.

modified 4 months ago • written 4 months ago by Sam

Wait, is the reference genome created by merging GBS data? Or is this merged GBS data what you are trying to map?

What is the reference genome? What is the plant species?

written 4 months ago by h.mon

In GBS analysis, some pipelines can merge the GBS data of several samples to prepare a reference for SNP variant analysis. So I have a GBS-derived reference, and I want to align it to a reference genome of Populus. I think the issue here is the long sequences in the mock reference file, because I lost about 35% of the sequences during alignment. How can I set the alignment criteria for these long sequences to retain as many as possible?
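
One way to check the roughly 35% loss mentioned above is to count unaligned records directly in the aligner's SAM output. The following is a minimal Python sketch, assuming plain-text SAM input (for example, from samtools view -h). The function name unmapped_fraction and the example records are hypothetical and made up for illustration. It relies only on the SAM convention that FLAG bit 0x4 marks an unmapped record, and it skips secondary (0x100) and supplementary (0x800) records so that each input sequence is counted once.

```python
def unmapped_fraction(sam_lines):
    """Return the fraction of input sequences flagged as unmapped in SAM text."""
    total = 0
    unmapped = 0
    for line in sam_lines:
        line = line.rstrip("\n")
        # Header lines start with '@'; skip them and blank lines.
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        flag = int(fields[1])  # FLAG is the second SAM column
        # Skip secondary (0x100) and supplementary (0x800) records so that
        # every input sequence is counted exactly once.
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if flag & 0x4:  # bit 0x4: segment unmapped
            unmapped += 1
    return unmapped / total if total else 0.0


# Made-up records for illustration only (not real data).
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "seq1\t0\tchr1\t10\t60\t5M\t*\t0\t0\tACGTA\t*",
    "seq2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\t*",
    "seq1\t2048\tchr1\t500\t60\t5M\t*\t0\t0\tACGTA\t*",
    "seq3\t16\tchr1\t90\t60\t5M\t*\t0\t0\tACGTA\t*",
]
print(unmapped_fraction(example))
```

Running the same count on the output of two different aligners, or of two parameter sets, would give a like-for-like comparison of how many sequences each one leaves unaligned.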

modified 4 months ago • written 4 months ago by Sam

Maybe this is useful to others: I obtained better results with the BWA-MEM algorithm with default settings. I think the issue was the alignment algorithm.
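
For readers who want to reproduce a BWA-MEM run like the one described above, here is a minimal Python sketch that only assembles the usual command lines (index the reference, align, sort to BAM, index the BAM). The exact commands used in this thread were not posted, so this is an assumption about a typical workflow. The helper name bwa_mem_commands and all file names are hypothetical. The only non-default option is the thread count (-t for bwa mem, -@ for samtools sort), and piping into samtools sort assumes a reasonably recent samtools.

```python
import shlex


def bwa_mem_commands(reference, reads, out_bam, threads=4):
    """Build shell command strings: index reference, align, sort, index BAM."""
    # Quote paths so that names with spaces are passed safely to the shell.
    ref, rd, bam = (shlex.quote(p) for p in (reference, reads, out_bam))
    return [
        # Build the BWA index for the reference genome (needed only once).
        f"bwa index {ref}",
        # Align with BWA-MEM and sort the alignments into a BAM file.
        f"bwa mem -t {threads} {ref} {rd} | samtools sort -@ {threads} -o {bam} -",
        # Index the sorted BAM so that downstream tools can use it.
        f"samtools index {bam}",
    ]


# Hypothetical file names, for illustration only.
for cmd in bwa_mem_commands("populus_genome.fa", "mock_reference.fa",
                            "mock_vs_populus.bam"):
    print(cmd)
```

The sketch prints the commands instead of executing them, so they can be inspected first; each string could then be pasted into a shell or passed to subprocess.run(cmd, shell=True, check=True).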

I enjoyed your company

written 4 months ago by Sam