Question: Paired-end flag for singleton
docdot (Germany) wrote, 4 months ago:

Dear all,

I'm having trouble identifying singletons in paired-end Hi-C sequencing data. My Hi-C library was sequenced as 2x75 bp (150 bp total) paired-end reads on an Illumina flowcell. I ran HiC-Pro (https://github.com/nservant/HiC-Pro) on the .fastq files and got the following results:

Total_pairs_processed   3377696 100.0
Unmapped_pairs  227709  6.742
Low_qual_pairs  0       0.0
Unique_paired_alignments        716549  21.214
Multiple_pairs_alignments       686717  20.331
Pairs_with_singleton    1746721 51.713
Low_qual_singleton      0       0.0
Unique_singleton_alignments     0       0.0
Multiple_singleton_alignments   0       0.0
Reported_pairs  716549  21.214

I would like to learn more about the 51.713% Pairs_with_singleton. To do this, I'm trying to extract these singleton reads, but I can't find the proper SAM/BAM flag to retrieve them.

1- Does anyone know the proper sam/bam flag to retrieve singletons?

Apart from that, I decided to map my fastq file with bowtie2 independently of HiC-Pro using the following command:

bowtie2 -N 1 -x ~/Desktop/Genomes_ref/bowtie2/hg19 -1 mysample_S1_L001_R1_001.fastq -2 mysample_S1_L001_R2_001.fastq -S mysample.sam

When I then tried to retrieve the singleton information, I got different flag values for the same read pair:

~/Desktop/test$ samtools view -f 9 mysample.sam | head
M02015:342:000000000-BPD5F:1:1101:9901:1145     89      chr16   150502  42      76M     =       150502  0       CACAGGCTGCAGAGAGTGGGCGCTGTTACCCGTTCACATAAACTTTCTAACCATGCACACAGATCAGAAAACACCC        CGGGEEC<ECFAF@F:GEGEGCGGGGGGGEF9EGGGGE9FDFGGGGGGGGGFGGGGFGDGGFFCFGGGGGECCCCC    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:76 YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:9279:1148     89      chrUn_gl000225  71224   1       26M     =       71224   0       CAAGAGATGTAACTATTCTCCAGGCT      EECE<ACFGFGGFFE6C-G@ECC?CC AS:i:-5  XS:i:-5 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:2G23       YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:10882:1150    77      *       0       0       *       *       0       0       AGTCCTGATCCCCAAATCTGATCCCCAAATCTGATCAGTCAGAGGAAAGTGGGCCACACGGGAAGAGAGGTTCTC CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG     YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:10882:1150    141     *       0       0       *       *       0       0       NGACAGAGACAGATCCCATCCCNNNNNNNACTGGCCTTCAAACNNNNNANATTTTAAAGCCTGAAAANNAAGCTAC        #8BCCGGGGGGGGFFFFGFGGG#######::DFGGFFGGGGG?#####:#:9CCFECFGFGGCCFFF##::CFFGG    YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:13285:1152    73      chr9    140284100       42      76M     =       140284100       0       GAGAGGGACAGAGAGGGACAGTGAGACCAGCAAGGAGCTGGGACGCTGGGAGCCAGGTGGATGCATGCAGAGAGGG        CCCCCEGGGGGGGGGGECGGGFGGGGGGGGGFFEGGGGFFGECFCCGEGGGGGGGGG@<DEGG<EGG@CF9<6FFE    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:76 YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:11747:1152    77      *       0       0       *       *       0       0       AAAAAATTGGGCCAGGCATGGTAGCTCATGCCTATAATCCCAGCACTTTGGGAGGCCAAGAGGGGAGGAACAGATT        CC<CCFGG9<,6CF@@8F@C@FGGG<F<FGGGFGFCF<6,CE,EFC<<FGGG,@@@<E<E<AFCECEF:,C,C,CF    YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:11747:1152    141     *       0       0       *       *       0       0       NTCATCGAATGGACTCGAAAGGNNNNNNNTAATGGACTTGAATNNNNNGNTCCCCAAATCTGATCCCNNAATCTG #-ACCGFAFFF8<C,CD<86@@#######,,:@FFG,CEFGG9#####:#,:CFFGEF<D@9@C@F@##9:CDFC     YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:9920:1152     89      chr9    40639288        1       76M     =       40639288        0       CCTGCCAGCAGATGAGCTTCAAAGTGCCTTAAGGAAGCACTTTGACCAGAAGGTAGATAACTCTTATTATAGAAGA        GEGGGGGGCGGGCGFGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGFGGGGGGGGGGGGGGFGGGGGGGGGCCCCC    AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:76 YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:10574:1152    89      chr3    162948628       30      76M     =       162948628       0       GACAAAAACAAGCAATGGGGAAATAATTCCCTATTTAATAAATGGTGTTGGGAAAACTGGCTAGCCATATGCAGAA        <C7GGGGGGGGFE9GGE<C<EDCGCGGGGGGGGGGFFCGGGGGFFFGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC    AS:i:0  XS:i:-5 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:76 YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:15507:1153    77      *       0       0       *       *       0       0       AATCCCAGCACTTTGGCAGGCCGAGGTGGGCGGATCCCCAAATCTGATCCCCAAATCTGATCCCCAAATCTGATCC        CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG    YT:Z:UP
~/Desktop/test$ samtools view -f 5 mysample.sam | head
M02015:342:000000000-BPD5F:1:1101:9901:1145     133     chr16   150502  0       *       =       150502  0       NTCCAGCTCTGTATTTAGAGTCNNNNNNNGTTGGGGAGATTGGNNNNNANTTGGGGATCAGATTTGGNNATCTTGT        #8ACCFF<FGGEFGGGGCC9FC#######::CFFDGDGGGGGG#####:#696<<@7@FF,,,FEDF##::CDC9E    YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:9279:1148     133     chrUn_gl000225  71224   0       *       =       71224   0       NATCAGTGCATAGATAACTCACNNNNNNNCCTGTAAGCAGAGCNNNNNCNAGAGTTACATAACCCCGNNAATCAGT        #8-B-CFFG@,,;,;FEGGDG8#######,:CC6,,<CF@F@F#####:#:,,99,,CFE,C886BC##99:C<AC    YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:10882:1150    77      *       0       0       *       *       0       0       AGTCCTGATCCCCAAATCTGATCCCCAAATCTGATCAGTCAGAGGAAAGTGGGCCACACGGGAAGAGAGGTTCTC CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG     YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:10882:1150    141     *       0       0       *       *       0       0       NGACAGAGACAGATCCCATCCCNNNNNNNACTGGCCTTCAAACNNNNNANATTTTAAAGCCTGAAAANNAAGCTAC        #8BCCGGGGGGGGFFFFGFGGG#######::DFGGFFGGGGG?#####:#:9CCFECFGFGGCCFFF##::CFFGG    YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:13285:1152    133     chr9    140284100       0       *       =       140284100       0       *       *       YT:Z:UP YF:Z:LN
M02015:342:000000000-BPD5F:1:1101:11747:1152    77      *       0       0       *       *       0       0       AAAAAATTGGGCCAGGCATGGTAGCTCATGCCTATAATCCCAGCACTTTGGGAGGCCAAGAGGGGAGGAACAGATT        CC<CCFGG9<,6CF@@8F@C@FGGG<F<FGGGFGFCF<6,CE,EFC<<FGGG,@@@<E<E<AFCECEF:,C,C,CF    YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:11747:1152    141     *       0       0       *       *       0       0       NTCATCGAATGGACTCGAAAGGNNNNNNNTAATGGACTTGAATNNNNNGNTCCCCAAATCTGATCCCNNAATCTG #-ACCGFAFFF8<C,CD<86@@#######,,:@FFG,CEFGG9#####:#,:CFFGEF<D@9@C@F@##9:CDFC     YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:9920:1152     133     chr9    40639288        0       *       =       40639288        0       NACCTG  #8BCCG  YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:10574:1152    133     chr3    162948628       0       *       =       162948628       0       NAAACCTCTAGGATCCCCAAATNNNNNNNCCAAATATGATCCTNNNNNANCCTGACAAAAACAAGCANNGGGGAA #86A@<FGGGF9@AEGGCGGCG#######,:C@FC,C<,CFFG#####:#:,@FFGGFCCFE<F@FG##::7@F:     YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:15507:1153    77      *       0       0       *       *       0       0       AATCCCAGCACTTTGGCAGGCCGAGGTGGGCGGATCCCCAAATCTGATCCCCAAATCTGATCCCCAAATCTGATCC        CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG    YT:Z:UP
~/Desktop/test$ samtools view -f 5 -F 9 mysample.sam | head
~/Desktop/test$

For example, the read M02015:342:000000000-BPD5F:1:1101:9901:1145 has flag 89 when I use -f 9, but the same read has flag 133 when I use -f 5.

2- Does anyone know why the flag changes?

Thank you in advance for your time, Raphael


The flag does not change. Each mate has its own flag. 89 means (1=paired | 8=mate unmapped | 16=read reverse strand | 64=first in pair) and 133 means (1=paired | 4=unmapped | 128=second in pair).
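To make the decoding above easy to repeat for other flag values, here is a minimal Python sketch. The bit names follow the SAM specification; the helper name `decode_flag` and the dictionary `SAM_FLAG_BITS` are illustrative and not part of samtools or any library.

```python
# SAM flag bits as defined in the SAM specification.
SAM_FLAG_BITS = {
    1: "paired",
    2: "proper pair",
    4: "read unmapped",
    8: "mate unmapped",
    16: "read reverse strand",
    32: "mate reverse strand",
    64: "first in pair",
    128: "second in pair",
    256: "secondary alignment",
    512: "fails quality checks",
    1024: "duplicate",
    2048: "supplementary alignment",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM flag, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# Flag values taken from the samtools output in the question.
for flag in (89, 133, 77, 141):
    print(flag, decode_flag(flag))
```

Flags 77 and 141 in the question's output decode to pairs in which both mates are unmapped, so those records are not singletons.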

So, to find singletons that are aligned, use flag 8; to find the unaligned mate, use flag 4, as @prasundutta87 suggests.
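The sketch below mimics how `samtools view -f` (all listed bits must be set) and `-F` (none of the listed bits may be set) select records, using the flag values that appear in the question. The helper `passes` is hypothetical and only models the bit test, not samtools itself. It suggests two things: `-f 5 -F 9` can never match, because bit 1 is both required and excluded, and `-f 8` alone also returns pairs in which both mates are unmapped (77/141), so combining it with `-F 4` is probably closer to "aligned singleton".

```python
def passes(flag, require=0, exclude=0):
    """Keep a record only if every bit in `require` is set and no bit in
    `exclude` is set, as with `samtools view -f require -F exclude`."""
    return (flag & require) == require and (flag & exclude) == 0


# Flag values seen in the question's samtools output.
flags = [89, 133, 77, 141, 73]

print([f for f in flags if passes(f, require=9)])              # like -f 9
print([f for f in flags if passes(f, require=5)])              # like -f 5
print([f for f in flags if passes(f, require=5, exclude=9)])   # like -f 5 -F 9
print([f for f in flags if passes(f, require=8, exclude=4)])   # aligned read, unaligned mate
print([f for f in flags if passes(f, require=4, exclude=8)])   # unaligned read, aligned mate
```

Under these assumptions, the corresponding command for aligned singletons would be `samtools view -f 8 -F 4 mysample.sam`, and `samtools view -f 4 -F 8 mysample.sam` would return their unaligned mates.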

Reply written 4 months ago by cschu181
prasundutta87 wrote, 4 months ago:

I believe that when a read is a singleton, it is treated as 'unpaired' even though it comes from paired-end sequencing. So you can check for reads that do not have flag 1 set, which is the flag for 'paired-end' reads.
