Question

appropriate script for DNA alignment

0

5.8 years ago

Sam ▴ 150

Dear Biostars

I have a fasta file with about 100K DNA sequences, each about 300 bp long, and I want to align them to a target genome and get a BAM file as output. Could you recommend an appropriate script for this alignment?

Thanks

alignment DNA • 1.4k views

ADD COMMENT • link 5.8 years ago by Sam ▴ 150

2

Just to give the context: the post "prepare a GFF file for MOCK fasta reference" is related and describes a little more about how the reference was created.

You should really provide us with the context yourself. Please take a few minutes to edit and revise your post so that it contains all the necessary information. Please be very specific and rephrase phrases like: "a target genome (which exactly? seemingly a synthetic reference)", "some GBS data (what is that?)", "a tree sample (which species of tree)", "some pipeline"... You might think that knowing the exact species, sequences and methods applied is not relevant to solving the problem, but that is absolutely not the case!

I suggest that we hold off a little until this is fixed.

ADD REPLY • link 5.8 years ago by Michael 54k

0

Hello Sam,

this is a very basic question. What have you tried so far? What problems are you facing?

fin swimmer

ADD REPLY • link 5.8 years ago by finswimmer 16k

0

I already tried bowtie2 with -X 400 -I 100 --very-sensitive, but about 35% of the sequences could not be aligned. I think the issue is the length of the sequences and the seed region used in the alignment. What do you think?

ADD REPLY • link updated 5.8 years ago by h.mon 35k • written 5.8 years ago by Sam ▴ 150
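
To put a number on how many sequences fail to align, one option is to count the records that carry the unmapped flag (0x4) in the aligner's SAM output. The following is only a minimal sketch, assuming uncompressed SAM text; the function name `unmapped_fraction` and the example records are hypothetical and not taken from this thread. As a side note, in bowtie2 the -I and -X options set the minimum and maximum fragment length for paired-end alignments, so they would not be expected to change the result for unpaired fasta input.

```python
def unmapped_fraction(sam_lines):
    """Return the fraction of query sequences flagged as unmapped (0x4).

    Secondary (0x100) and supplementary (0x800) records are skipped so
    that each query sequence is counted once.
    """
    total = 0
    unmapped = 0
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second SAM column
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if flag & 0x4:
            unmapped += 1
    return unmapped / total if total else 0.0


# Hypothetical records, for illustration only.
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "seq1\t0\tchr1\t100\t42\t300M\t*\t0\t0\t*\t*",
    "seq2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "seq3\t16\tchr2\t500\t30\t300M\t*\t0\t0\t*\t*",
    "seq3\t2048\tchr3\t10\t30\t100M200H\t*\t0\t0\t*\t*",
]
print(unmapped_fraction(example))
```

With a real file, the lines could be supplied by opening the SAM file and passing the file object directly, since the function only iterates over lines.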

1

I agree with finswimmer that this is a very basic question, and it seems you did not put a lot of effort into solving it yourself.

"already I tried bowtie2 with -X 400 -I 100 --very-sensitive but about 35 % of sequence could not match and I think the issue is the length of the sequence and seed region in alignment,"

This information should have been in your initial question. Also, you should elaborate on how you obtained the data and which organism you are working on.

ADD REPLY • link 5.8 years ago by WouterDeCoster 47k

0

Also, please indicate why the data is in fasta format.

ADD REPLY • link 5.8 years ago by GenoMax 141k

0

It's a mock reference (created by merging some GBS data from a tree sample), which is why it is in fasta format.

ADD REPLY • link 5.8 years ago by Sam ▴ 150

0

Wait, is the reference genome created by merging GBS data? Or is this merged GBS data what you are trying to map?

What is the reference genome? What is the plant species?

ADD REPLY • link 5.8 years ago by h.mon 35k

0

In GBS analysis, some pipelines make it possible to merge the GBS data of several samples to prepare a reference for SNP variant analysis. So I have a GBS-derived mock reference, and I want to align it to a reference genome of Populus. I think the issue here is the long sequences in the mock reference file, because I lost about 35% of the sequences during alignment. How can I set the alignment criteria for these long sequences to retain as many of them as possible?

ADD REPLY • link 5.8 years ago by Sam ▴ 150

0

This may be useful to others: I obtained better results with the BWA-MEM algorithm using default flags. I think the issue was the alignment algorithm.

I enjoyed your company

ADD REPLY • link 5.8 years ago by Sam ▴ 150
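
For readers who want to try the BWA-MEM route mentioned above, here is a rough sketch of how the indexing, alignment, and BAM-sorting steps could be chained from Python. It is not the exact command used in this thread: the file names (`populus.fa`, `mock_reference.fa`, `mock_vs_populus.bam`), the helper names `build_commands` and `run_pipeline`, and the thread count are all assumptions for illustration. Apart from the thread options (`-t` for bwa, `-@` for samtools), the alignment runs with default settings, and it assumes `bwa` and `samtools` are installed and on the PATH.

```python
import subprocess


def build_commands(reference, queries, out_bam, threads=4):
    """Build the three command lines: index, align, and sort to BAM."""
    index_cmd = ["bwa", "index", reference]
    # bwa mem with default alignment settings; -t only sets the thread count.
    align_cmd = ["bwa", "mem", "-t", str(threads), reference, queries]
    # samtools sort reads SAM from stdin ("-") and writes a sorted BAM.
    sort_cmd = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    return index_cmd, align_cmd, sort_cmd


def run_pipeline(reference, queries, out_bam, threads=4):
    """Index the reference, then pipe bwa mem output into samtools sort."""
    index_cmd, align_cmd, sort_cmd = build_commands(
        reference, queries, out_bam, threads
    )
    subprocess.run(index_cmd, check=True)
    aligner = subprocess.Popen(align_cmd, stdout=subprocess.PIPE)
    sorter = subprocess.Popen(sort_cmd, stdin=aligner.stdout)
    # Close our copy of the pipe so the aligner sees a broken pipe
    # if the sorter exits early.
    aligner.stdout.close()
    sorter_rc = sorter.wait()
    aligner_rc = aligner.wait()
    if aligner_rc != 0 or sorter_rc != 0:
        raise RuntimeError("alignment pipeline failed")


# Hypothetical usage (file names are placeholders):
# run_pipeline("populus.fa", "mock_reference.fa", "mock_vs_populus.bam")
```

Building the command lists separately from running them keeps the sketch easy to inspect or adapt, for example to print the commands before executing anything.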