Question: Regarding Split reads and discordant reads
DL (India) wrote, 11 months ago:

Hello Everyone,

Can anyone tell me the difference between split reads and discordant reads? From my alignment file, I separated the split alignments and the discordant alignments, but I cannot figure out what exactly the results are showing. Here is one alignment from each file:

DISCORDANT alignments:

H1:1:H5GCTBCXY:1:1107:1197:2139 97 Chr00 27202666 0 7S139M5S Chr03 36179032 0 ATGTCGGTTTTACAAAATCTGTCGTCTGATATGGACTTATGAATTAG TTCAGATGGCGCTTAAGGACAAAACCGTTGTGTGATGCATAAAAAAATGAGCGGCAAGTTTCCCACCATGTAAGTGGTGCCAAACCCCATCTCAAACCCCAAAT DDDDDHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIIIII IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIIIHIIHHIIIFHHIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIHIHFHIIGIHIIIIIH NM:i:6 MD:Z:20A25A4T23C2T7T52 AS:i:109 XS:i:108

H1:1:H5GCTBCXY:1:1107:1197:2139 145 Chr03 36179032 17 12S28M4D111M Chr00 27202666 0 TTGAAACCTTAGTCCTCTCTTCTCTCTCCGACAACCAAACTTCTGCC TCATAGGATTGATGGTGAATTCATCCCATCACTCTCGCTTCAAAACAGACCAGAGAGAGCCCTAGATTTGAAACCCTCTCTTCTCCAATTCTTCGAGTCCTCTT HDC0HHHIHH?IIIIGIIHHHIHHDHCIHGGHHFHHHFFHEHHHHHHCF<HIHHI IIIIHIIIIIIIIIIIIHHEEIHHHIHIIIIIIHGEIIIIIIIIIIIGIIIIIIIHIIHIIIIHGGC=HIIIIIIIHIHHCHIHIHIIIIIDDDDD NM:i:11 MD:Z:8C5C13^CAGT43A13A0G0C15G35 AS:i:94 XS:i:84 XA:Z:Chr05,+46698740,111M4D28M12S,13;Chr06,-30674967,12S28M4D111M,13;Chr07,+28088165,42M1I68M4D28M12S,13;Chr04,-19106513,15S25M4D111M,13;Chr07,+49817886,42M1I68M4D28M12S,14;

SPLIT:

B1:1:H5GCTBCXY:1:1107:1334:2188 161 Chr05 26225099 8 82M69S Chr03 1950005 0 AAGAAACTTGATCGAAATAACGAAGACATGGCATTGGAAGATGACAAGGCTTGAAGATGACGC AACAAGGCCCAAGAACTCTTCTGGATTAATTTGGATTTCGGCAAATGAGAGAAAGCCCATTGGATTTTGGGTTGTTGGACGTAATGGA B@@DBHIIIIIHHHIHHIIIIIIIIIIIIIGHIIIIIIIIIIIIIIIIHHHHHIIIIHEHIEHIICHIIII HHHHEHHE?C1<1DGEIHHFCHIHIHEHIIDEHHGIGHHHIGGFHIII@EEEHGEHHH?HHIHI0ECHHEHFHHHHIGHC NM:i:2 MD:Z:11C8T61 AS:i:72 XS:i:67 SA:Z:Chr06,27321942,+,67S11M1I66M6S,0,2; XA:Z:Chr06,+25451428,82M69S,3;Chr04,-24335170,69S82M,3;Chr03,+10998176,81M1I19M50S,7;Chr05,+51873112,82M69S,4;Chr03,+11018865,81M1I19M50S,8;

B1:1:H5GCTBCXY:1:1107:1334:2188 2209 Chr06 27321942 0 67H11M1I66M6H Chr03 1950005 0 AGGCCCAAGAACTCTTCTGGATTAATTTGGATTTCGGCAAATGAGAGAAAGCCCA TTGGATTTTGGGTTGTTGGACGT IIIIHHHHEHHE?C1<1DGEIHHFCHIHIHEHIIDEHHGIGHHHIGGFHIII@EEEHGEHHH?HHIHI0ECHHEHFHH NM:i:2 MD:Z:48G28 AS:i:65 XS:i:65 SA:Z:Chr05,26225099,+,82M69S,8,2; XA:Z:Chr00,+26655585,67S11M1I66M6S,2;Chr07,+37957453,67S11M1I66M6S,3;

If anyone knows about this, please explain it to me.

Thanks & Regards

Written 11 months ago by DL; modified 11 months ago by Samarth Kulshrestha
Samarth Kulshrestha (India) wrote, 11 months ago:

For paired-end data generated by high-throughput sequencers, read pairs fall into two categories:

1) Concordant reads (properly aligned reads)
2) Discordant reads (improperly aligned reads; important for identifying genome alteration events)

Concordant reads: the span is within the range of the expected fragment size, and the orientation of the read pair is consistent with the reference.

Discordant reads: the span is unexpected, or the orientation of the read pair is inconsistent.

Example:

Concordant reads: R1-----> expected mapping distance and orientation <-----R2

Discordant reads fall into different categories:

A) Based on mapping distance

  • R1-----> Unexpected mapping distance <-----R2

B) Based on read orientation (the expected orientation for paired-end data is R1 forward and R2 reverse, i.e. FR; discordant pairs instead show orientations such as FF or RR)

  • R1-----> R2-----> [FF orientation]

  • R1<----- R2<----- [RR orientation]
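The categories above can be sketched as a small classifier. This is only a minimal sketch, not how any particular aligner or SV caller decides: the function name `classify_pair`, the `max_span` and `read_len` defaults, and the labels are assumptions made up for illustration. It also flags mates on different chromosomes and outward-facing (RF) pairs, which go beyond the two categories listed, and it approximates the span from the start positions and a fixed read length instead of using the CIGAR.

```python
def classify_pair(chrom1, pos1, reverse1, chrom2, pos2, reverse2,
                  max_span=1000, read_len=150):
    """Label a read pair as concordant or as one kind of discordant.

    pos1/pos2 are leftmost mapping positions; reverse1/reverse2 are True
    when that mate aligns to the reverse strand. max_span and read_len
    are illustrative defaults, not values from any real library.
    """
    if chrom1 != chrom2:
        return "discordant: mates on different chromosomes"
    if reverse1 == reverse2:
        return ("discordant: RR orientation" if reverse1
                else "discordant: FF orientation")
    # One mate is forward and the other is reverse: find which is which.
    fwd_pos, rev_pos = (pos1, pos2) if reverse2 else (pos2, pos1)
    if rev_pos < fwd_pos:
        return "discordant: RF (outward-facing) orientation"
    # Approximate span: start of forward mate to end of reverse mate.
    span = rev_pos + read_len - fwd_pos
    if span > max_span:
        return "discordant: span larger than expected"
    return "concordant"


print(classify_pair("Chr01", 1000, False, "Chr01", 1300, True))
print(classify_pair("Chr01", 1000, False, "Chr01", 50000, True))
print(classify_pair("Chr01", 1000, False, "Chr01", 1300, False))
```

With these assumed thresholds, the three calls print a concordant pair, a pair with a larger-than-expected span, and an FF pair, in that order.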

Split reads: one portion of an NGS read maps to one location, and the other portion of the same read maps to a different location in the genome. When both portions of the read are of equal length, this is called a balanced split.
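In a SAM file, a split read typically shows up as a primary record with a large soft clip (S) plus a supplementary record (flag 0x800) for the clipped portion, and the two are linked by the SA tag. Here is a minimal sketch for reading those pieces using only the standard library. The helper names are made up for illustration; the SA layout (rname,pos,strand,CIGAR,mapQ,NM;) is the one given in the SAM optional-fields specification.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_ops(cigar):
    """Split a CIGAR string into (length, operation) tuples."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def clipped_bases(cigar):
    """Return (left_clip, right_clip): bases clipped (S or H) at each end."""
    ops = cigar_ops(cigar)
    left = ops[0][0] if ops and ops[0][1] in "SH" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] in "SH" else 0
    return left, right


def aligned_query_bases(cigar):
    """Count read bases that take part in the alignment (M, I, =, X)."""
    return sum(n for n, op in cigar_ops(cigar) if op in "MI=X")


def parse_sa(sa_value):
    """Parse the value of an SA:Z: tag into a list of dicts."""
    segments = []
    for entry in sa_value.strip().split(";"):
        if not entry:
            continue
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        segments.append({"rname": rname, "pos": int(pos), "strand": strand,
                         "cigar": cigar, "mapq": int(mapq), "nm": int(nm)})
    return segments


# Values taken from the SPLIT example in the question.
print(clipped_bases("82M69S"))                # (0, 69)
print(aligned_query_bases("82M69S"))          # 82
print(aligned_query_bases("67S11M1I66M6S"))   # 78
print(parse_sa("Chr06,27321942,+,67S11M1I66M6S,0,2;"))
```

For that read, 82 bases align at Chr05 and 78 bases align at Chr06, so the two portions are close in length but not exactly balanced.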

Written 11 months ago by Samarth Kulshrestha; modified 11 months ago

Thank you, Samarth, for explaining. I still have some confusion that I hope you can clear up. In my mapping data, discordant and split reads together make up more than 56 percent of the mapped reads, and I want to identify CNVs in the genome. I am a little confused about deletions: I have read that deletions are characterized by an insert size longer than the expected insert size. The insert size of my library is 1.5 kb, so I extracted all the read pairs with an insert size greater than this. But now I am confused about pairs where one mate maps to one chromosome and the other mate maps to a different chromosome.

If you look at the discordant alignment I posted above, I do not understand whether this type of read pair contributes to deletions or to other types of CNVs.

Please tell me if you know something about this.

Thanks in advance

Reply written 11 months ago by DL

"insert size of my library dataset is 1.5 kb"

What kind of library prep and sequencing platform were used for the experiment? 1.5 kb is pretty big. Also, if you want to detect structural variants, such as deletions, I highly recommend using established tools: VarScan2, FreeBayes, VarDict and the like for SNVs, or LUMPY, Pindel, BreakDancer and the like for larger variants. Do not start with homebrew solutions such as extracting reads of a certain insert size and then applying custom filters.

Reply written 11 months ago by ATpoint; modified 11 months ago

Thank you for your useful suggestion. I am going to use this software to identify CNVs, but before that I want to clear up my understanding of which type of read contributes to which type of event. I still do not have a clear picture, but I will keep trying.

Reply written 11 months ago by DL

Deepika, the alignments you posted appear to be multi-mapped (the read maps to multiple locations in the genome). You can check this by looking at the XA tag, which lists alternative hits.

In your first DISCORDANT read, check the XA tag (pasted below):

XA:Z:Chr05,+46698740,111M4D28M12S,13;Chr06,-30674967,12S28M4D111M,13;Chr07,+28088165,42M1I68M4D28M12S
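To see how many alternative placements a read has, the XA value can be split into its entries. This is a small sketch that assumes the alignments come from BWA, whose manual describes XA entries as chr,pos,CIGAR,NM with the strand given as the sign of the position; `parse_xa` is a made-up helper name.

```python
def parse_xa(xa_value):
    """Parse the value of an XA:Z: tag into (chrom, strand, pos, cigar, nm) tuples."""
    hits = []
    for entry in xa_value.strip().split(";"):
        if not entry:
            continue  # the value ends with ';', which leaves an empty last piece
        chrom, signed_pos, cigar, nm = entry.split(",")
        hits.append((chrom, signed_pos[0], int(signed_pos[1:]), cigar, int(nm)))
    return hits


# Two complete entries from the XA tag of the discordant read in the question.
xa = "Chr06,-30674967,12S28M4D111M,13;Chr07,+28088165,42M1I68M4D28M12S,13;"
for hit in parse_xa(xa):
    print(hit)
print(len(parse_xa(xa)), "alternative hits")
```

Each alternative hit here has an edit distance (NM) close to that of the reported alignment, which is why the aligner cannot place the read with confidence.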

And in this read: H1:1:H5GCTBCXY:1:1107:1197:2139 97 Chr00 27202666 0 7S139M5S Chr03 36179032 0

One mate maps to Chr00 and the other maps to Chr03. A pair like this would point to a chromosomal translocation event, not a deletion event. However, the MAPQ of this alignment is 0, so its placement is ambiguous, and you should exclude such reads from your CNV/SV study.
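The same reading of that record can be sketched in code: decode the FLAG bits, compare RNAME with RNEXT, and check MAPQ. The bit meanings follow the SAM specification; the helper names and the `min_mapq=20` cutoff are assumptions for illustration, not a recommended threshold.

```python
# FLAG bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "fails QC",
    0x400: "duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def describe_alignment(sam_line, min_mapq=20):
    """Summarize the first SAM columns; min_mapq is an illustrative cutoff."""
    fields = sam_line.split()
    flag, rname, mapq, rnext = int(fields[1]), fields[2], int(fields[4]), fields[6]
    return {
        "flags": decode_flag(flag),
        # RNEXT is '=' for the same reference and '*' when unavailable.
        "interchromosomal": rnext not in ("=", "*") and rnext != rname,
        "passes_mapq": mapq >= min_mapq,
    }


record = "H1:1:H5GCTBCXY:1:1107:1197:2139 97 Chr00 27202666 0 7S139M5S Chr03 36179032 0"
print(describe_alignment(record))
print(decode_flag(2209))
```

For the record above, the flags decode to a paired, first-in-pair read whose mate is on the reverse strand; the pair is inter-chromosomal, and it fails the assumed MAPQ cutoff. Flag 2209 from the SPLIT example includes the supplementary-alignment bit.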

To build your understanding, you can manually check a few discordant reads, but for a genome-level study you should opt for an established SV caller, as suggested by @ATpoint.

Hope this helps

Thanks

Reply written 11 months ago by Samarth Kulshrestha; modified 11 months ago

Thank you, Samarth. I wanted to clear up my concepts, but I still have a lot of confusion. I am now going to use the software and hope that gives me a clearer picture.

Thanks

Reply written 11 months ago by DL

Deepika, just follow any structural variation (SV) detection method paper; it will give you a clear idea of how to look for SV signatures using discordant reads. Choose your SV caller carefully.

Best.

Reply written 11 months ago by Samarth Kulshrestha

Thank you so much, Samarth. Can you tell me what a breakpoint is when searching for CNVs? I read the paper, but it is not clear to me. I hope you do not mind; this is my last question, and then I will not bother you any more.

Thanks

Reply written 11 months ago by DL