Question: generation of sequences (from bam) starting at a specific position
2.3 years ago by
France
curiousbiologist40 wrote:

How can I generate sequences from a BAM/SAM file that start at a specific position, removing everything before that position? Thank you!

sequence alignment • 733 views
modified 2.3 years ago • written 2.3 years ago by curiousbiologist (40)

To me, it's unclear what you are asking for. Is this the same as an alignment containing sequences from position a to b? What do you want to obtain? "Sequences"? Is that a read, FASTA, FASTQ, reference, variant...?

written 2.3 years ago by WouterDeCoster (37k)

I would like to obtain a "cropped" BAM (all my reads aligned and starting at a defined position).

modified 2.3 years ago • written 2.3 years ago by curiousbiologist (40)

One of the options in this thread should do this: How to get the consensus sequence from a BAM alignment

written 2.3 years ago by genomax (63k)

I would like to work with all the selected reads afterwards, so I'm not sure a consensus is a good approach.

written 2.3 years ago by curiousbiologist (40)
2.3 years ago by
France
curiousbiologist40 wrote:

I'm deeply sorry if my question was obscure. What I want is to get, from an alignment, reads without the bases that lie before position 30 of the reference sequence. Is it possible to crop a BAM file?

written 2.3 years ago by curiousbiologist (40)

You could do something like this (adjust the name of the "chromosome" to match your alignment file, and replace N with the end coordinate of the region):

samtools view file_sorted.bam "chr:30-N" | awk -F "\t" '{print "@"$1"\n"$10"\n+\n"$11}' > reads_before_30.fq
written 2.3 years ago by genomax (63k)

Thank you for your answer! However, this returns whole reads that overlap the given interval, not reads cropped to the interval.

written 2.3 years ago by curiousbiologist (40)
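The behaviour described above follows from how a region query selects alignments: a read is reported if any part of it overlaps the region, and it is then printed in full. A minimal Python sketch of that overlap test is given here; the function names `reference_end` and `overlaps` are illustrative (not part of samtools), and positions are assumed to be 1-based and inclusive, as in SAM.

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_end(pos, cigar):
    """Last 1-based reference position covered by an alignment."""
    # Only M, D, N, = and X advance along the reference.
    span = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MDN=X")
    return pos + span - 1


def overlaps(pos, cigar, start, end):
    """True if the alignment shares at least one position with [start, end]."""
    return pos <= end and reference_end(pos, cigar) >= start


# A 10-base read aligned at position 25 covers 25..34, so it overlaps 30..40
# and would be returned whole, including its bases at positions 25..29.
print(overlaps(25, "10M", 30, 40))  # True
```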

I am not aware of a tool that will do that automatically for you. You will need to use a custom script to do something that specific.

written 2.3 years ago by genomax (63k)

Thank you for your answer, it means a lot to me to get a response. My programming skills are not great; do you know of a script that I could use as a starting point for my own?

written 2.2 years ago by curiousbiologist (40)

Could you use an IGV screenshot and a graphics program (e.g. MS Paint) to clarify what you are aiming for?

written 2.2 years ago by WouterDeCoster (37k)

curiousbiologist wants individual reads chopped so that they start and end at specific positions, i.e. nothing should extend to the left or right of an interval a <--> b.

written 2.2 years ago by genomax (63k)

Okay, it makes me wonder why the OP would want that, but fine. This is not a straightforward task: it requires modifying the CIGAR, sequence, qualities, start position, ...

written 2.2 years ago by WouterDeCoster (37k)
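As a possible starting point for such a custom script, here is a minimal Python sketch that crops a single alignment to a reference window by walking its CIGAR string and rebuilding the start position, CIGAR, sequence and qualities. The function name `crop_read` is hypothetical. The sketch assumes 1-based inclusive coordinates and that SEQ and QUAL are stored for the read; it drops clipped bases and does not update optional tags (such as NM or MD) or mate fields, which would need separate handling.

```python
import re
from itertools import groupby

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def crop_read(pos, cigar, seq, qual, start, end):
    """Crop one alignment to the 1-based, inclusive window [start, end].

    Returns (new_pos, new_cigar, new_seq, new_qual), or None when no
    aligned base of the read falls inside the window.
    """
    ref = pos       # reference coordinate of the next reference-consuming op
    q = 0           # index of the next read base
    kept = []       # (op, base, quality) per kept unit; base is "" for D/N
    new_pos = None  # reference position of the first kept aligned base
    for length, op in CIGAR_RE.findall(cigar):
        for _ in range(int(length)):
            if op in "M=X":
                if start <= ref <= end:
                    if new_pos is None:
                        new_pos = ref
                    kept.append((op, seq[q], qual[q]))
                ref += 1
                q += 1
            elif op == "I":
                # Keep an insertion only if it lies between two window positions.
                if new_pos is not None and ref <= end:
                    kept.append((op, seq[q], qual[q]))
                q += 1
            elif op in "DN":
                if new_pos is not None and ref <= end:
                    kept.append((op, "", ""))
                ref += 1
            elif op == "S":
                q += 1  # soft-clipped bases are discarded
            # H and P consume neither read bases nor reference positions
    # The cropped alignment must end on an aligned base, not on an indel.
    while kept and kept[-1][0] not in "M=X":
        kept.pop()
    if not kept:
        return None
    new_cigar = "".join(
        f"{len(list(group))}{op}" for op, group in groupby(k[0] for k in kept)
    )
    new_seq = "".join(k[1] for k in kept)
    new_qual = "".join(k[2] for k in kept)
    return new_pos, new_cigar, new_seq, new_qual


# A 10-base read at position 25 cropped to 30..32 keeps its 6th to 8th bases.
print(crop_read(25, "10M", "ACGTACGTAC", "IIIIIIIIII", 30, 32))
# (30, '3M', 'CGT', 'III')
```

The arguments correspond to SAM columns 4 (POS), 6 (CIGAR), 10 (SEQ) and 11 (QUAL), so the function could be applied line by line to the output of samtools view.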

Yes, you got it, genomax2. I want to have several (and switchable) windows of reads from different samples so that I can compare them using different scores: statistics, entropy (Shannon entropy). If the read pieces are not all the same size, my results will be skewed.

written 2.2 years ago by curiousbiologist (40)
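To illustrate why equal-length pieces matter for this kind of comparison, here is a minimal Python sketch of a per-position Shannon entropy over a window. It assumes the entropy is computed column by column across reads that all span the same window, which is one possible reading of the goal above; the function names are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy, in bits, of the symbols observed at one position."""
    counts = Counter(symbols)
    total = sum(counts.values())
    # Sum of p * log2(1/p), written with counts to avoid a negative zero.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def column_entropies(rows):
    """Entropy of every column of a set of equal-length sequences."""
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all sequences must have the same length")
    return [shannon_entropy(column) for column in zip(*rows)]


# Four cropped reads covering the same four reference positions.
window = ["ACGA", "ACTC", "ACGG", "ACTT"]
print(column_entropies(window))  # [0.0, 0.0, 1.0, 2.0]
```

With pieces of unequal length the columns would no longer correspond to the same reference positions, which is why the length check raises an error instead of silently truncating.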
2.3 years ago by
France
curiousbiologist40 wrote:

Would it be possible to solve this problem with an awk script? I was thinking of doing the alignment, converting it to FASTA (with a gap or 'x' for non-aligned bases), and then trimming with awk or fastx_trimmer. Maybe there is something easier? How do I get a gap or 'x' for the non-aligned bases before and after each read?

modified 2.3 years ago • written 2.3 years ago by curiousbiologist (40)
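The padding idea described above can be sketched in Python rather than awk. The function below (the name `read_to_row` is hypothetical) lays one read out in reference coordinates over a window, writing 'x' where the read does not reach and '-' where it has a deletion. As a simplifying assumption, inserted bases are dropped so that every row has exactly one character per reference position.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def read_to_row(pos, cigar, seq, start, end, pad="x", gap="-"):
    """Lay one read out over the 1-based, inclusive window [start, end]."""
    row = [pad] * (end - start + 1)
    ref = pos  # next reference position
    q = 0      # next read base
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            for i in range(n):
                if start <= ref + i <= end:
                    row[ref + i - start] = seq[q + i]
            ref += n
            q += n
        elif op in "DN":
            for i in range(n):
                if start <= ref + i <= end:
                    row[ref + i - start] = gap
            ref += n
        elif op in "IS":
            q += n  # inserted and soft-clipped bases are skipped
        # H and P consume neither read bases nor reference positions
    return "".join(row)


# A read starting at position 32 with a one-base deletion, shown over 30..40.
print(read_to_row(32, "3M1D2M", "ACGTT", 30, 40))  # xxACG-TTxxx
```

Every row has the same length as the window, so the rows from different reads can be compared position by position.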
Powered by Biostar version 2.3.0