Filtering aligned sequences
4 months ago

Hi, I used ONT minimap to align my data against combined reference sequences (several FASTA sequences concatenated into one file, each with its own ">" header line). This gave me a single aligned SAM file. My question is how to filter the aligned data by reference sequence. Details: I have 2 reference sequences in FASTA format (let's call them A and B) and one set of alignments in SAM format. I want to see how many sequences align to each reference DNA sequence (A and B).

Should I use the awk command?

alignment • 163 views

I tried this:

awk '{if($3 == "A.fasta") { print $0 }}' combineA_aln.sam > A_aln_reads.file

wc -l A_aln_reads.file

to pull out A

awk '{if($3 == "B.fasta") { print $0 }}' combineB_aln.sam > B_aln_reads.file

wc -l B_aln_reads.file

to pull out B

but it's not working. What is wrong? I am really new to the Linux command line.


Can you add what is not working, or why you think it's not working?

4 months ago
colindaven ★ 2.8k

A more standard way to do this would be:

  1. convert the SAM to a coordinate-sorted BAM and create a BAI index
  2. run samtools idxstats x.bam
  3. alternative: run samtools stats x.bam

This would give you the stats, including the number of reads aligned to each reference, as plain text. Very useful if you have many references.
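
The steps above could look roughly like the sketch below. The function names and file names are placeholders I made up for illustration, so adjust them to your data. Note that column 3 of a SAM record holds the reference sequence name taken from the FASTA header (the text after ">" up to the first space), not the name of the FASTA file, which may be why a comparison against "A.fasta" finds nothing. The awk function is an optional cross-check that works directly on the SAM file.

```shell
#!/usr/bin/env bash
# Sketch only: function and file names are hypothetical placeholders.

# Route 1: samtools. Sort the SAM into a BAM, index it, print per-reference counts.
# idxstats columns: reference name, reference length, mapped count, unmapped count.
count_with_samtools() {
    local sam="$1"
    local bam="${sam%.sam}.sorted.bam"
    samtools sort -o "$bam" "$sam"   # coordinate-sort and write BAM (needed for indexing)
    samtools index "$bam"            # writes the index next to the BAM
    samtools idxstats "$bam"
}

# Route 2: plain awk on the SAM file.
# Skips header lines (starting with "@") and unmapped records (reference "*"),
# then counts alignment records per reference name in column 3.
count_with_awk() {
    local sam="$1"
    awk '!/^@/ && $3 != "*" { n[$3]++ } END { for (r in n) print r "\t" n[r] }' "$sam" | sort
}
```

After sourcing the script, you would call for example `count_with_samtools combined_aln.sam` or `count_with_awk combined_aln.sam` (the file name is again just an example). Both routes count alignment records, so a read with secondary or supplementary alignments can be counted more than once.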
