Question: Alignment in IGV: extracting specific reads.
0
gravatar for KVC_bioinfo
12 months ago by
KVC_bioinfo350
Boston
KVC_bioinfo350 wrote:

Hello All,

I have aligned the query sequence with the human genome. I am interested in looking at exons of a specific gene. When I look at the IGV I can see several reads that are mapping to the gene of interest across all the exons. I am confused how can I get the read length of each read which is aligning to exon 1, exon 2 and so on.

I tried to extract the reads giving coordinates of exons using samtools. However, when i converted the resulting bam file to fasta and tried to calculate the read length this gives me the original read length and not the read length of read matching to the region I specified.

I used the following command:

For extracting the reads from original bam file for the region of interest:

samtools view -h in.bam "chr3:region1-region2" > exon.bam

Converting bam to fasta :

samtools bam2fq exon.bam | seqtk seq -A > exon.fasta

for counting the lengths of reads from fasta file:

awk '/^>/ {if (seqlen) print seqlen;print;seqlen=0;next} {seqlen+=length($0)}END{print seqlen}' exon.fasta

Am I doing anything wrong here or missing anything totally? Thank you.

igv samtools • 870 views
ADD COMMENTlink written 12 months ago by KVC_bioinfo350
1

However, when i converted the resulting bam file to fasta and tried to calculate the read length this gives me the original read length and not the read length of read matching to the region I specified.

Some reads are going to extend outside these boundaries. I am not sure what exactly you are trying to do? What does "tried to calculate the read length" mean?

ADD REPLYlink modified 12 months ago • written 12 months ago by genomax59k

When I tried to calculate read length of the reads which I got from ::

samtools view -h in.bam "chr3:region1-region2" > exon.bam (chr3:region1-region2 is one of the exons)

This gave me read lengths of original reads and the reads that align with the region I specified.

I am trying to get the length of reads that align with exon 1, exon 2 and so on.

ADD REPLYlink modified 12 months ago • written 12 months ago by KVC_bioinfo350
1

Still not clear. Individual reads are all going to be more or less the same length (subject to trimmed lengths). Are you trying to see what is the length of soft clipping during alignments?

Note: Are these Nanopore/PacBio reads? Again you may have left that important bit out.

So you want to find the part of read that aligns to Exon 1 and then Exon 2 etc?

ADD REPLYlink modified 12 months ago • written 12 months ago by genomax59k

These are nanopore reads. I am trying to see what is the length of the reads that mapped to region I specified.

ADD REPLYlink written 12 months ago by KVC_bioinfo350
3

If you are only interested in length of the part that is aligned then you could look at the CIGAR's and flags/tags in your alignments.

ADD REPLYlink modified 12 months ago • written 12 months ago by genomax59k

Thank you. How can I identify from cigar? The cigar remains the same for the read across all the exons.

ADD REPLYlink written 12 months ago by KVC_bioinfo350
1

If you have the CIGAR fields in front of you, the read length is column 6.

ADD REPLYlink written 12 months ago by Hussain Ather890

Can you post a read that maps to multiple exons from your SAM file?

ADD REPLYlink written 12 months ago by genomax59k
1

Use this tool to visualize the alignment: sam2pairwise: a tool to visualize SAM entries as pairwise alignments

ADD REPLYlink written 12 months ago by genomax59k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1562 users visited in the last hour