how to extract allele specific reads from bamfiles?
1
4
Entering edit mode
5.9 years ago
904612498 ▴ 40

Hi everybody, how to extract allele specific reads from bamfiles? Thanks in advance.

SNP • 3.5k views
ADD COMMENT
3
Entering edit mode

it sounds like a xy problem http://xyproblem.info/ what do you want to do at the end ?

ADD REPLY
0
Entering edit mode

I want to distinguish containing SNP reads from female parent or male parent.

ADD REPLY
0
Entering edit mode

Please use ADD REPLY to answer to earlier comments, as such this thread remains logically structured and easy to follow.

ADD REPLY
0
Entering edit mode

for example,in plants hybirds´╝îI want to distinguish reads from female parent or male parent with SNP

ADD REPLY
0
Entering edit mode

what would be your input ? how do you define a deletion / insertion ? what happens if one read contains a variant carrying two mutations from both parents ?

ADD REPLY
0
Entering edit mode

There must be a more elegant way but I have this piece of code lying around for the mitochondrial genome. I used it to split reads from a BAM file into those supporting a Neanderthal specific base versus present-day human contaminant one:

https://github.com/grenaud/schmutzi/blob/master/splitEndoVsCont/poshap2splitbam.cpp

If no one answers and you can hack it, feel free to use it. The input .pos format is in the same directory.

ADD REPLY
0
Entering edit mode

I have known the SNP information about male and female,for example,12 reads containing male SNP and 8 from female of 20 reads in hybirds according to SNP.

ADD REPLY
0
Entering edit mode

Essentially you want the allelic ratio/count per SNP? How do you know whether the allele is from male or female?

Say a position is a G/T SNP, how do you determine G is from male and T from female or vice versa?

ADD REPLY
2
Entering edit mode
5.9 years ago

that was fun enough to quickly write something : https://github.com/lindenb/jvarkit/wiki/Biostar214299 The program reads your list of mutation and tries to affect the read to a SAM-ReadGroup.

$ cat positions.tsv
rotavirus       267     C       SAMPLE1
rotavirus       267     G       SAMPLE2


$ java -jar dist/biostar214299.jar -p positions.tsv input.bam
@HD     VN:1.5  SO:coordinate
@SQ     SN:rotavirus    LN:1074
@RG     ID:UNAFFECTED   SM:UNAFFECTED   LB:UNAFFECTED
@RG     ID:UNMAPPED     SM:UNMAPPED     LB:UNMAPPED
@RG     ID:SAMPLE1      SM:SAMPLE1      LB:SAMPLE1
@RG     ID:SAMPLE2      SM:SAMPLE2      LB:SAMPLE2
@RG     ID:AMBIGOUS     SM:AMBIGOUS     LB:AMBIGOUS
(...)
rotavirus_237_744_6:0:0_3:0:0_29c       163     rotavirus       237     60      70M     =       675     508     ATCCGGCGTTAAATGGAAAGTTTCGGTGATCTATTAGAAATAGAAATTGGATGACTGATTCAAAAACGGT  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++      MD:Z:3A19A1C1C1G31T8    RG:Z:SAMPLE1    NM:i:6  AS:i:41 XS:i:0
rotavirus_234_692_6:0:1_4:0:0_3ac       163     rotavirus       237     60      6S30M5I1M5D28M  =       623     456     TTGGTAATCAGGCGTTAAATGGAAAGTTTAGCTCAGGACAACGAAATAGAAATTGGATGACTGATTCTAA  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++      MD:Z:31^TATTA28 RG:Z:SAMPLE2    NM:i:10 AS:i:37 XS:i:0
rotavirus_237_777_6:0:0_7:0:0_216       99      rotavirus       237     60      70M     =       708     541     ATCAGGGGTTAAATTGAAAGTTTAGCTCAGCTCTTAGACATAGAAATTGGATGACTGATTGTACAACGGT  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++      MD:Z:6C7G17A5A21C2A6    RG:Z:SAMPLE1    NM:i:6  AS:i:40 XS:i:0
rotavirus_237_699_3:0:0_8:0:0_22f       163     rotavirus       237     60      70M     =       650     463     ATGAGGCGTTAAATGGAAAGTTTATCTCAGCTATTAGAAATAGCAATTGGATGACTGATTCTAAAACGGT  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++      MD:Z:2C21G18A26 RG:Z:SAMPLE1    NM:i:3  AS:i:57 XS:i:0
(...)
rotavirus_311_846_10:0:0_11:0:0_3d7     141     *       0       0       *       *       0       0       AACTTAGATGAAGACGATCAAAACCTTAGAATGACTTTATGTTCTAAATGGCTCGACCCAAAGATGAGAG  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++      RG:Z:UNMAPPED   AS:i:0  XS:i:0
rotavirus_85_600_7:0:0_9:0:0_3e0        77      *       0       0       *       *       0       0       AGCTGCAGTTGTTTCTGCTCCTTCAACATTAGAATTACTGGGTATTGAATATGATTCCAATGAAGTCTAT  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++      RG:Z:UNMAPPED   AS:i:0  XS:i:0
rotavirus_85_600_7:0:0_9:0:0_3e0        141     *       0       0       *       *       0       0       TATTTCTCCTTAAGCCTGTGTTTTATTGCATCAAATCTTTTTTCAAACTGCTCATAACGAGATTTCCACT  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++      RG:Z:UNMAPPED   AS:i:0  XS:i:0

you can later 'split' the output using samtools view and the option -r

         -r STR   only include reads in read group STR [null]
ADD COMMENT
0
Entering edit mode

This is really helpful. It would be great if you could opt out of giving the allele and it would just mark non-reference cases.

ADD REPLY

Login before adding your answer.

Traffic: 1279 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6