Question: filtering aligned sequences
19 days ago by
priscilladraj10 wrote:

Hi, I used ONT minimap to align reads against a combined reference (several FASTA sequences concatenated into one file, each with its own ">" header line). This gave me a single aligned SAM file. My question: I don't know how to separate the aligned data by reference sequence. Details: I have 2 reference sequences in FASTA format (let's call them A and B) and one set of aligned data in SAM format. I want to see how many sequences align to each reference DNA sequence (A and B).

Should I use the awk command?


I tried this

awk '{if($3 == "A.fasta") { print $0 }}' combineA_aln.sam > A_aln_reads.file

wc -l A_aln_reads.file

to pull out A

awk '{if($3 == "B.fasta") { print $0 }}' combineB_aln.sam > B_aln_reads.file

wc -l B_aln_reads.file

to pull out B

but it's not working. Can anyone tell me what is wrong? I am really new to the Linux command line.

written 19 days ago by priscilladraj10

Can you add what is not working, or why you think it's not working?

written 19 days ago by lieven.sterck9.5k
19 days ago by
colindaven2.6k
Hannover Medical School
colindaven2.6k wrote:

A more standard way to do this would be:

  1. Convert the SAM file to a coordinate-sorted BAM file and create its BAI index
  2. Run samtools idxstats x.bam
  3. Alternatively, run samtools stats x.bam
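
Steps 1 and 2 might look roughly like the sketch below. It assumes samtools 1.x is installed and on your PATH. The function name count_reads_per_reference and the file names are placeholders for illustration, not anything from the original post.

```shell
# Sketch: per-reference read counts with samtools (assumes samtools 1.x).
# count_reads_per_reference is a hypothetical helper name.
count_reads_per_reference() {
    sam="$1"                       # input SAM file from the aligner
    bam="${sam%.sam}.sorted.bam"   # output name derived from the input name

    # Indexing needs a coordinate-sorted BAM; sort also converts SAM to BAM.
    samtools sort -o "$bam" "$sam" &&
    samtools index "$bam" &&
    # Output columns: reference name, reference length,
    # mapped read-segments, unmapped read-segments.
    samtools idxstats "$bam"
}

# Example call (placeholder file name):
# count_reads_per_reference combined_aln.sam
```

The third column of the idxstats output is the count you are after, one line per reference.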

idxstats prints a plain-text table with one line per reference, including the number of reads aligned to it. This is very useful if you have many references.
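
If you just want a quick check directly on the SAM file, an awk approach can also work. The sketch below rests on a few assumptions. Column 3 of a SAM record (RNAME) holds the sequence name taken from the FASTA header line (the text after ">"), not the FASTA file name, so comparing it to "A.fasta" will probably match nothing unless the header really says that. Header lines starting with "@" need to be skipped. Secondary and supplementary alignments are excluded here so that each read is counted once. The function name count_primary_per_reference is hypothetical.

```shell
# Sketch: count primary alignments per reference straight from a SAM file.
# count_primary_per_reference is a hypothetical helper name.
count_primary_per_reference() {
    awk -F '\t' '
        /^@/ { next }                      # skip SAM header lines
        $3 == "*" { next }                 # no reference name: unmapped read
        int($2 / 4) % 2 == 1 { next }      # flag 0x4: read is unmapped
        int($2 / 256) % 2 == 1 { next }    # flag 0x100: secondary alignment
        int($2 / 2048) % 2 == 1 { next }   # flag 0x800: supplementary alignment
        { n[$3]++ }                        # tally by reference name (RNAME)
        END { for (r in n) print r "\t" n[r] }
    ' "$1" | sort
}

# Example call (placeholder file name):
# count_primary_per_reference combined_aln.sam
```

To see which reference names your file actually uses, look at the SN: field of the @SQ header lines, for example with grep '^@SQ' on the SAM file.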
