Question: Get positions in BAM with reads supporting at least 2 different nucleotides
0
gravatar for Roman Hillje
13 months ago by
Roman Hillje40
Milan, Italy
Roman Hillje40 wrote:

Hi guys,

I'm currently looking for a tool or script that takes a BAM file (from an RNAseq experiment) and outputs a list of all the positions that have reads supporting at least 2 different nucleotids (WT vs variant or whatever you want to call it). Ideally, it also reports the coverage so I can filter it later. Does anybody have an idea how to do this? I don't really care about the base call quality at the positions at this point.

Cheers, Roman

rna-seq • 402 views
ADD COMMENTlink modified 13 months ago by Pierre Lindenbaum124k • written 13 months ago by Roman Hillje40

Could you give an example of the desired output?

ADD REPLYlink written 13 months ago by ATpoint26k

So you want every location where a single read differs from the consensus? The vast majority of the loci you identify will not be real differences, but will be random errors in the reads.

ADD REPLYlink written 13 months ago by swbarnes27.0k
0
gravatar for Pierre Lindenbaum
13 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum124k wrote:

using samfixcigar to convert the cigar string to identify the mismatches. http://lindenb.github.io/jvarkit/SamFixCigar.html

and the samjdk to filter the reads containing a least two cigar elements 'X' or 'I' or 'D'. http://lindenb.github.io/jvarkit/SamJdk.html

java -jar dist/samfixcigar.jar -R src/test/resources/rotavirus_rf.fa src/test/resources/S1.bam |\
java -jar dist/samjdk.jar -e 'return !record.getReadUnmappedFlag() && record.getCigar().getCigarElements().stream().map(C->C.getOperator().name()).filter(OP->OP.equals("X") || OP.equals("I") || OP.equals("D")).count()>1;'
@HD VN:1.5  SO:coordinate
@SQ SN:RF01 LN:3302
@SQ SN:RF02 LN:2687
@SQ SN:RF03 LN:2592
@SQ SN:RF04 LN:2362
@SQ SN:RF05 LN:1579
@SQ SN:RF06 LN:1356
@SQ SN:RF07 LN:1074
@SQ SN:RF08 LN:1059
@SQ SN:RF09 LN:1062
@SQ SN:RF10 LN:751
@SQ SN:RF11 LN:666
@RG ID:S1   SM:S1   LB:L1   CN:Nantes
@CO Processed with samfixcigar compilation: 2018-09-28:14-09-50 githash: 1919e185cb0c676aa912bd7c2b4cf9d1113f3927 htsjdk: 2.15.0
RF01_27_590_3:0:0_1:0:0_68  163 RF01    27  60  4=1X14=1X12=1X37=   =   521 564 GTATCATCTAATCTTGTCATAATATTTATCATATATATATAACTCACAATCCGCAGTTCAAATTCCAATA  2222222222222222222222222222222222222222222222222222222222222222222222  RG:Z:S1 NM:i:3  AS:i:55 XS:i:0
RF01_110_504_2:0:0_1:0:0_5d 163 RF01    110 60  13=1X19=1X36=   =   435 395 ATAGTGAATTAGATAATAGATGTATTGAATTTCCTTCTAAATGCTTAGAAAACTCAAAGAATGGACTATC  2222222222222222222222222222222222222222222222222222222222222222222222RG:Z:S1   NM:i:2  AS:i:60 XS:i:0
RF01_158_624_3:0:0_1:0:0_61 99  RF01    158 60  33=1X21=1X12=1X1=   =   555 467 AAAACTCAAAGAATGGACTATCATTGAAAAAGCACTTTGTTGAATATAGCGATGTAATGGAGAATGCCCC  2222222222222222222222222222222222222222222222222222222222222222222222  RG:Z:S1 NM:i:3  AS:i:58 XS:i:0
RF01_205_734_3:0:0_2:0:0_79 99  RF01    205 60  14=1X27=1X22=1X4=   =   665 530 AGCGATGTTATGGATAATGCCACACTGTTGTCAATATTATCGAACTCTTATGATAAATATAACGCAGTTG  2222222222222222222222222222222222222222222222222222222222222222222222  RG:Z:S1 NM:i:3  AS:i:55 XS:i:0
RF01_295_749_2:0:0_4:0:0_9a 163 RF01    295 60  40=1X20=1X8=    =   680 455 GCAAAAGGTAAGCCGCTAGAAGCAGATTTGACAGTGAATGCGTTGGATTATGAAAATAACACGATAACAT  2222222222222222222222222222222222222222222222222222222222222222222222RG:Z:S1   NM:i:2  AS:i:60 XS:i:0
(...)

. Ideally, it also reports the coverage so I can filter it later.

use any tool giving the coverage.... like GATK DepthOfCoverage

ADD COMMENTlink modified 13 months ago • written 13 months ago by Pierre Lindenbaum124k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1249 users visited in the last hour