Question

Filtering aligned sequences



3.3 years ago

priscilladraj ▴ 10

Hi, I used ONT minimap to align reads to combined reference sequences (several FASTA sequences combined in one file, each with its own ">" header line). This gave me one aligned SAM file. My problem is that I don't know how to split my aligned data by reference sequence. Details: I have 2 reference sequences in FASTA format (let's call them A and B) and one aligned data file in SAM format. I want to see how many sequences align to each reference DNA sequence (A and B).

Should I use the awk command?





I tried this:

awk '{if($3 == "A.fasta") { print $0 }}' combineA_aln.sam > A_aln_reads.file

wc -l A_aln_reads.file

to pull out A

awk '{if($3 == "B.fasta") { print $0 }}' combineB_aln.sam > B_aln_reads.file

wc -l B_aln_reads.file

to pull out B

but it's not working. What am I doing wrong? I am really new to the Linux command line.

written 3.3 years ago by priscilladraj ▴ 10



Can you add what is not working, or why you think it's not working?

3.3 years ago by lieven.sterck 15k
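A likely cause, offered as a hedged sketch rather than a confirmed diagnosis: in a SAM file, column 3 (RNAME) holds the reference name taken from the FASTA header line (the text after ">" up to the first whitespace), not the FASTA file name, so `$3 == "A.fasta"` never matches. In addition, `wc -l` counts every alignment line, so a read with secondary or supplementary alignments can be counted more than once. The function below is a minimal awk sketch that assumes tab-separated SAM input and counts each mapped read once per reference; the name `count_mapped_per_ref` is hypothetical. Running `grep '^@SQ' your.sam` should show the exact reference names (the `SN:` fields) to compare against.

```shell
#!/usr/bin/env bash
# count_mapped_per_ref SAM_FILE...  (hypothetical helper name)
# Prints "reference<TAB>count" for primary, mapped alignments, one line per reference.
count_mapped_per_ref() {
  awk -F'\t' '
    /^@/ { next }                      # skip header lines (@HD, @SQ, @PG, ...)
    $3 == "*" { next }                 # no reference assigned
    int($2 / 4) % 2 == 1 { next }      # FLAG 0x4: read unmapped
    int($2 / 256) % 2 == 1 { next }    # FLAG 0x100: secondary alignment
    int($2 / 2048) % 2 == 1 { next }   # FLAG 0x800: supplementary alignment
    { n[$3]++ }                        # column 3 (RNAME) = FASTA header name
    END { for (r in n) print r "\t" n[r] }
  ' "$@" | sort
}
```

Usage: `count_mapped_per_ref combineA_aln.sam` prints one line per reference name found in column 3.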

score 1 · Answer 1 · 2021-01-07



3.3 years ago

colindaven 6.4k

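A more standard way to do this would be to:

sort the SAM file into a BAM file and create a BAI index (samtools sort, then samtools index; indexing needs a coordinate-sorted BAM)
run samtools idxstats x.bam
alternatively, run samtools stats x.bam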
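The steps above could be wrapped in a small shell function. This is a sketch that assumes samtools is on your PATH; the function name `count_reads_per_ref` and the `.sorted.bam` output name are hypothetical. `samtools sort` accepts SAM input and writes a coordinate-sorted BAM, which `samtools index` needs before `idxstats` can use the index.

```shell
#!/usr/bin/env bash
# count_reads_per_ref SAM_FILE  (hypothetical helper name)
# Sorts SAM -> BAM, indexes it, then prints "reference<TAB>mapped" per reference.
count_reads_per_ref() {
  local sam="$1"
  local bam="${sam%.sam}.sorted.bam"   # assumed output name
  samtools sort -o "$bam" "$sam" || return 1
  samtools index "$bam" || return 1    # writes "$bam.bai"
  # idxstats columns: name, length, mapped, unmapped.
  # The final "*" line holds unmapped reads with no reference, so skip it.
  samtools idxstats "$bam" | awk -F'\t' '$1 != "*" { print $1 "\t" $3 }'
}
```

Usage: `count_reads_per_ref combineA_aln.sam`.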

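samtools idxstats prints one tab-separated line per reference sequence: the reference name, its length, the number of mapped read segments and the number of unmapped read segments, plus a final "*" line for unmapped reads with no reference assigned. This gives you the number of reads aligned to each reference as a text file, which is very useful if you have many references. samtools stats, by contrast, reports summary statistics for the whole file.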
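Because idxstats counts alignment records, a read with secondary or supplementary alignments may be counted more than once. If you want each read counted once per reference, one option is `samtools view -c` with `-F 0x904` (exclude unmapped 0x4, secondary 0x100 and supplementary 0x800 records) and the reference name as the region. This needs the sorted, indexed BAM from the steps above. The function name `primary_counts` is hypothetical, and this is a sketch rather than the only way to do it.

```shell
#!/usr/bin/env bash
# primary_counts BAM REF...  (hypothetical helper name)
# Prints "reference<TAB>count" of primary, mapped alignments for each named reference.
primary_counts() {
  local bam="$1"; shift
  local ref
  for ref in "$@"; do
    # -c: print a count instead of records; -F 0x904: drop unmapped/secondary/supplementary
    # A bare reference name as region requires a BAI index next to the BAM.
    printf '%s\t%s\n' "$ref" "$(samtools view -c -F 0x904 "$bam" "$ref")"
  done
}
```

Usage: `primary_counts x.sorted.bam A B`.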
