Identifying inserts (tranposable elements) using SAM Flags
1
0
Entering edit mode
6.1 years ago
William • 0

Hi,

I am interested in finding the sites of inserts (transposable elements) in a genome of interest. The genome have been sequenced using paired end sequencing, and thereafter an alignment was performed to an index created with the wildtype genome and the insert sequence.

    samtools flagstat alignment.sorted.bam

2415270 + 0 in total (QC-passed reads + QC-failed reads)

0 + 0 duplicates

2188174 + 0 mapped (90.60%:-nan%)

2415270 + 0 paired in sequencing

1207635 + 0 read1

1207635 + 0 read2

2187112 + 0 properly paired (90.55%:-nan%)

2187284 + 0 with itself and mate mapped

890 + 0 singletons (0.04%:-nan%)

158 + 0 with mate mapped to a different chr

12 + 0 with mate mapped to a different chr (mapQ>=5)

How should I go about finding the number and sites of inserts in this genome? I think I should focus on the reads with mates mapped to a different chromosome. Is it possible to filter by SAMFLAGS to obtain these read? If so, what are the SAMFLAGS I should use?

Thanks in advance for your help!

alignment sequencing next-gen • 1.8k views
ADD COMMENT
0
Entering edit mode

? I think I should focus on the reads with mates mapped to a different chromosome.

Identifying discordantly mapped reads

ADD REPLY
0
Entering edit mode

Hi,

Thank you for your reply!

I have tried as you suggested the following

samtools view -c -F 1294 alignment.sorted.bam

samtools view -c -F 3854 alignment.sorted.bam

However, in both cases I obtained the number 172, which does not tally with my flagstat output.

158 + 0 with mate mapped to a different chr

12 + 0 with mate mapped to a different chr (mapQ>=5)
ADD REPLY
0
Entering edit mode
6.1 years ago
d-cameron ★ 2.9k

I strongly recommend against doing this yourself unless you know existing tools do not work for your data/organism. Given the repetitive nature of TEs, many of your read alignments are going be wrong anyway.

I would start by using an specialised tools such as RetroSeq. If specialised tools don't work for you I would try a generic SV caller such as GRIDSS or manta and doing post-processing to determine which of the breakends/insertion calls are unplaced TEs.

ADD COMMENT
0
Entering edit mode

Hi,

Thank you for your reply. I am interested to view the "12 + 0 with mate mapped to a different chr (mapQ>=5)" using samtools.

I am thinking of using the command but i am unsure of the SAM flags that I should use. Do you have any idea on this?

samtools view -F or samtools view -f
ADD REPLY

Login before adding your answer.

Traffic: 2711 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6