Generating sequences (from BAM) starting at a specific position
7.4 years ago

How can I generate sequences from a BAM/SAM file that start at a specific position, removing everything before that position? Thank you!

alignment sequence • 1.8k views

To me, it's unclear what you are asking for. Is this the same as an alignment containing sequences from position a to b? What do you want to obtain? "Sequences"? Is that a read/fasta/fastq/reference/variant...?


I would like to obtain a "cropped" BAM (all my reads aligned and starting at a defined position).


One of the options in this thread should do this: How to get the consensus sequence from a BAM alignment


I would like to work with all the selected reads afterwards, so I'm not sure a consensus is a good approach.

7.3 years ago

I'm deeply sorry if my question was obscure. What I want is to get, from an alignment, reads with every base before position 30 of the reference sequence removed. Is it possible to crop a BAM file?


You could do something like this (adjust the name of the "chromosome" to match your alignment file; N stands for the end coordinate of the region).

samtools view file_sorted.bam "chr:30-N" | awk -F "\t" '{print "@"$1"\n"$10"\n+\n"$11}' > reads_from_30.fq

Thank you for your answer! However, this gives me whole reads that overlap the given interval, not reads cropped to the interval.


I am not aware of a tool that will do that automatically for you. You will need to use a custom script to do something that specific.


Thank you for your answer; it means a lot to me to get a response. My programming skills are not great. Do you know of a script that I could use as a starting point for my custom script?


Could you use an IGV screenshot and a graphics program (e.g. MS Paint) to clarify what you are aiming for?


curiousbiologist wants individual reads chopped so that they start and end at specific positions, i.e. nothing should extend to the left or right of an interval a <--> b.


Okay, that makes me wonder why the OP would want that, but fine. This is not a straightforward task: it requires modifying the CIGAR, sequence, qualities, start,...
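
As a rough illustration of what that modification involves, here is a minimal Python sketch that crops a single read to a reference interval. The function name `crop_read` and its arguments are hypothetical (not from any existing tool). It assumes 1-based inclusive coordinates, that SEQ and QUAL are stored in the record (not `*`), and it drops soft clips, hard clips and padding. Optional tags such as NM/MD and the mate fields would also become stale after cropping and are not handled here.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def crop_read(pos, cigar, seq, qual, start, end):
    """Crop one aligned read to the reference interval [start, end].

    pos is the 1-based leftmost mapped position (SAM column 4).
    Returns (new_pos, new_cigar, new_seq, new_qual), or None when no
    aligned base of the read falls inside the interval.
    """
    ref = pos        # reference coordinate of the next reference-consuming base
    q = 0            # index of the next unread base in seq/qual
    new_pos = None   # reference coordinate of the first kept base
    kept = []        # (length, op, seq_piece, qual_piece)
    for n_str, op in CIGAR_RE.findall(cigar):
        n = int(n_str)
        if op in "M=X":                       # consumes read and reference
            lo, hi = max(ref, start), min(ref + n - 1, end)
            if lo <= hi:
                if new_pos is None:
                    new_pos = lo
                a = q + (lo - ref)
                b = a + (hi - lo + 1)
                kept.append((hi - lo + 1, op, seq[a:b], qual[a:b]))
            ref += n
            q += n
        elif op == "I":                       # consumes read only
            # Keep an insertion only if it sits between two interval positions.
            if new_pos is not None and ref <= end:
                kept.append((n, op, seq[q:q + n], qual[q:q + n]))
            q += n
        elif op in "DN":                      # consumes reference only
            lo, hi = max(ref, start), min(ref + n - 1, end)
            if new_pos is not None and lo <= hi:
                kept.append((hi - lo + 1, op, "", ""))
            ref += n
        elif op == "S":                       # soft clip: skip the clipped bases
            q += n
        # H and P consume neither read nor reference and are dropped.
    # A cropped alignment should not end with an insertion, deletion or skip.
    while kept and kept[-1][1] not in "M=X":
        kept.pop()
    if not kept:
        return None
    new_cigar = "".join(f"{n}{op}" for n, op, _, _ in kept)
    new_seq = "".join(piece for _, _, piece, _ in kept)
    new_qual = "".join(piece for _, _, _, piece in kept)
    return new_pos, new_cigar, new_seq, new_qual
```

For example, `crop_read(25, "10M", "ACGTACGTAC", "IIIIIIIIII", 30, 32)` keeps only the three bases aligned to positions 30 to 32. The fields could come from `samtools view` output (columns 4, 6, 10 and 11).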


Yes, you got it, genomax2. I want to have several (switchable) windows of reads from different samples so that I can compare them using different score calculations: statistics and entropy (Shannon entropy score). If the read pieces are not all the same size, my results will be skewed.
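
To make the equal-length requirement concrete, here is a small Python sketch of a per-column Shannon entropy over a set of cropped reads. The function names are hypothetical, and it assumes every read has already been cut to the same window, with each distinct character (including any gap symbol) counted as its own symbol.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Shannon entropy, in bits, of the symbols in one alignment column."""
    counts = Counter(symbols)
    total = sum(counts.values())
    # sum of p * log2(1/p) over the observed symbols
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def column_entropies(rows):
    """Entropy of every column of equal-length read strings."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows must have the same length")
    # zip(*rows) walks the rows column by column
    return [shannon_entropy(col) for col in zip(*rows)]
```

A column in which every read carries the same base scores 0 bits, and a column split evenly between two bases scores 1 bit. The length check raises an error if the pieces differ in size, which is the situation described above.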

7.3 years ago

Would it be possible to solve this problem with an awk script? I was thinking of doing an alignment, converting it to FASTA (with a gap or 'x' for non-aligned bases), and then trimming with awk or fastx_trimmer. Maybe there is something easier? How do I get a gap or 'x' for non-aligned bases before and after each read?
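
One possible way to get such gap-padded rows, sketched in Python rather than awk: lay each read out on reference coordinates using its start position and CIGAR, and fill every window position the read does not cover with a gap character. The function name `padded_window` is hypothetical. The sketch assumes 1-based inclusive coordinates and simply omits inserted and soft-clipped bases, since they have no reference column.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def padded_window(pos, cigar, seq, start, end, gap="-"):
    """Return the read's bases on reference positions start..end.

    Positions the read does not cover (before it, after it, or inside a
    deletion) are filled with the gap character, so every returned string
    has length end - start + 1.
    """
    row = [gap] * (end - start + 1)
    ref = pos   # reference coordinate of the next reference-consuming base
    q = 0       # index of the next unread base in seq
    for n_str, op in CIGAR_RE.findall(cigar):
        n = int(n_str)
        if op in "M=X":
            for i in range(n):
                if start <= ref + i <= end:
                    row[ref + i - start] = seq[q + i]
            ref += n
            q += n
        elif op in "IS":
            q += n      # inserted or soft-clipped bases: no reference column
        elif op in "DN":
            ref += n    # deleted or skipped reference bases stay as gaps
        # H and P consume neither read nor reference.
    return "".join(row)
```

For example, `padded_window(28, "5M", "ACGTA", 30, 36)` returns `"GTA----"`. Every row has the same width, so the rows can be written out as FASTA records or compared column by column directly.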
