Question: Extract uniquely mapped reads from bam file
0
gravatar for User000
2.6 years ago by
User000270
User000270 wrote:

Hello,

I have a genome with homologous pairs of chromosome. I am using hisat2 to map RNA-seq reads to genome. I want to do variant calling with GATK, but before I want to extract the reads that mapped uniquely from BAM file(not SAM). Could you suggest how to do that? From other threads seems that flag NH:i:1 means mapped once, but how can I grep it from bam file? and is it a right way?

HISEQ1:128:C0HUMACXX:6:2210:15441:67674 163 chr1A   1101543 60  59M =   1104438 626 CAATCTTCTCGTGGTTTATTGGCGAATGGAGTTTCAGAAGATGCTCAAAATGGTTTCCA @BCFFDDEFHDDFGEEEDHEHCBHFD<3D3C*?D:B?:B?B*98D?4?BB=CG)=FFHA AS:i:0  ZS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:59 YS:i:0  YT:Z:CP NH:i:1
HISEQ1:130:C0G55ACXX:1:1203:2254:30173  163 chr1A   1101543 60  101M    =   1101599 155 CAATCTTCTCGTGGTTTATTGGCGAATGGAGTTTCAGAAGATGCTCAAAATGGTTTCCATGATGCTGATCGAACTGTCCGCAGAGGTGAGGGGCCATCTAA   CCCFFFFFHHHHHJGIJJJJJJJJJJJJJGIJIJJIJIJJJJJJJJJJJJJJJIIJJJJJJJJJJJJHHHHHFFFFEEEDDDDDDD@BDDDDDDDDDDDDD   AS:i:0  ZS:i:-15    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:101    YS:i:-2 YT:Z:CP NH:i:1
rna-seq • 3.5k views
ADD COMMENTlink modified 2.6 years ago by Devon Ryan90k • written 2.6 years ago by User000270
5
gravatar for Devon Ryan
2.6 years ago by
Devon Ryan90k
Freiburg, Germany
Devon Ryan90k wrote:

samtools view -b -q 5 -o output.bam alignments.bam

Having said that, there's no reason to bother doing this. GATK is smart enough to understand that low MAPQ alignments should be ignored.

ADD COMMENTlink modified 2.6 years ago • written 2.6 years ago by Devon Ryan90k

thank you, may I pose the question in another way also:;since i have these homology i want to optimize parameters to map reads uniquely to the different subgenomes..so I either optimize parameters of mapping with hisat2 or filter the bam file...any suggestion is much appreciated.

ADD REPLYlink written 2.6 years ago by User000270

There are many allele-specific alignment and processing packages already made, don't reinvent the wheel.

ADD REPLYlink written 2.6 years ago by Devon Ryan90k

can you suggest me how to grep "NH:i:1" reads from bam? thanks

ADD REPLYlink written 2.6 years ago by User000270
1

That's much slower and will produce the same results. Don't waste your time.

ADD REPLYlink written 2.6 years ago by Devon Ryan90k

another question is: why with the command line above my new bam file is 4 times bigger than the previous one?

ADD REPLYlink written 2.6 years ago by User000270
1

Because I didn't include the -b option, mea culpa.

ADD REPLYlink written 2.6 years ago by Devon Ryan90k

I see, thank you very much! Why did you use -q 5? also, do you think it is ok to filter unique reads after doing mark dups step?

ADD REPLYlink modified 2.5 years ago • written 2.6 years ago by User000270

I don't understand why mapping quality >= 5 means uniquely mapped :(

ADD REPLYlink modified 18 months ago • written 18 months ago by chefarov120

It's an undocumented "feature" of hisat2/topat (and STAR, in case you ever need to use that) that MAPQ values of 0-3 are used for multimappers and a higher value (255 or 50) is used for non-multimappers.

ADD REPLYlink written 18 months ago by Devon Ryan90k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 819 users visited in the last hour