Question

Sequences information from BAM file

0

Entering edit mode

7.4 years ago

Lolla ▴ 10

Hello, I'm working with paired end rna-seq data so two fastq files produced 1 BAM file. I know col 10 represent the seq and col 11 have its ascii code for quality. If the original seq length 100bp when I extracted some seq from BAM file I noticed some reads of length LESS than the original length. For example in one location one seq has cigar (40M, which means 40 bps matched) my question is: what happened to the rest of the seq (60bps)?? Another point, in this case we have paired end data which means two seq one from each end but BAM file has only one. This one included in BAM file to which end belongs??

Many thanks..

RNA-Seq next-gen alignment • 2.9k views

ADD COMMENT • link updated 7.1 years ago by siqizhao2018 • 0 • written 7.4 years ago by Lolla ▴ 10

0

Entering edit mode

Did you map locally or end-to-end? Which aligner did you use?

ADD REPLY • link 7.4 years ago by Matteo Schiavinato ★ 3.7k

0

Entering edit mode

end-to-end using STAR

ADD REPLY • link 7.4 years ago by Lolla ▴ 10

0

Entering edit mode

Are you 100% sure? The CIGAR you reported is compatible with a local mapping. What is your command, can you post it here?

For what concerns the absence of the paired mate from the BAM: if you used an option to leave out unmapped scores, perhaps that's the reason.

To know which end it does correspond to, you have to see the bitwise flag. Reads on the reverse strand will have the 16 bit.

What is the bitwise flag for the paired read that doesn't have the mate in the file? Did you check it on https://broadinstitute.github.io/picard/explain-flags.html ?

ADD REPLY • link 7.4 years ago by Matteo Schiavinato ★ 3.7k

1

Entering edit mode

Thanks Macspider; I confirmed trimmed happened prior the alignment, so this is solved now..

ADD REPLY • link 7.4 years ago by Lolla ▴ 10

0

Entering edit mode

Then you can mark the thread as closed, so it doesn't show up in the "open threads" section. ;) Good luck!

ADD REPLY • link 7.4 years ago by Matteo Schiavinato ★ 3.7k

0

Entering edit mode

Thank you, I'm still confused about some fields in BAM file..

ADD REPLY • link 7.4 years ago by Lolla ▴ 10

0

Entering edit mode

Which ones? We can maybe help.

ADD REPLY • link 7.4 years ago by Matteo Schiavinato ★ 3.7k

0

Entering edit mode

Have you checked page 5 of SAM Spec document?

ADD REPLY • link 7.4 years ago by GenoMax 152k

0

Entering edit mode

I have posted my question here Count the fragments from BAM file Thanks in advance..

ADD REPLY • link 7.4 years ago by Lolla ▴ 10

0

Entering edit mode

7.4 years ago

swbarnes2 15k

There are basically two reasons that your reads would be trimmed like that...for quality reasons, or because the read read all the way through to adapter. If it's adapter, you definitely want that trimmed away. If it's for quality reasons..you probably want it trimmed away. You'll have to ask whoever did the alignment if they did a quality trim before aligning.

ADD COMMENT • link 7.4 years ago by swbarnes2 15k

0

Entering edit mode

7.4 years ago

GenoMax 152k

Another point, in this case we have paired end data which means two seq one from each end but BAM file has only one. This one included in BAM file to which end belongs??

Some aligners trim read headers after first whitespace when writing them to alignments, which removes Illumina identifiers signifying read number. Sounds like that applies in your case. So there is no easy way to figure out what end of the fragment the read came from.

ADD COMMENT • link 7.4 years ago by GenoMax 152k

0

Entering edit mode

7.1 years ago

siqizhao2018 • 0

hi,I want to split paired end read as length distribution like (0-50,50-150, >150),How can I do it? when I using samtools extract reads(samtools view -h Epi2_input_rmdup.bam | awk 'length($10) > 100 && length($10) < 200|| $1 ~ /^@/' | samtools view -bS - > 100_200qc_rm/50_Epi2_input_rmdup.bam),I found all reads length no more than 100bp, but when I draw paired reads fragment size stastics using deeptools bamPEFragmentSize,it show length 0-500bp.

ADD COMMENT • link 7.1 years ago by siqizhao2018 • 0

0

Entering edit mode

it's a new question: https://www.biostars.org/p/new/post/

ADD REPLY • link 7.1 years ago by Pierre Lindenbaum 166k

score 0 · Accepted Answer · 2018-01-31

0

Entering edit mode

7.4 years ago

Pierre Lindenbaum 166k

it could be some HARD clipping in your bam ... Difference between Hard Clip and Soft Clip in Samtools CIGAR string or you reads could have been trimmed by a tool like cutadapt

ADD COMMENT • link 7.4 years ago by Pierre Lindenbaum 166k