Question: why reads align to other coordinates?
0
gravatar for star
8 months ago by
star190
Netherlands
star190 wrote:

I have downloaded a specified region as Reference Genome from UCSC (because only this region is important for me) and aligned my reads against it using Bowtie2.

But I do not know why in my .bam file there are other regions with start position of 150864, 539750 and ... .

The Ref Genome (specified coordinate) that downloaded from UCSC.

hg19_dna range=chrX:146949598-147161084 5'pad=0 3'pad=0 strand=+ repeatMasking=none

2 first line of bam file (output of bowtie2)

D00823:89:HMHHMBCX2:1:2111:4641:60710   99  hg19_dna    150864  40  48M =   150946  130 CCTGCCTCAGCCTCCTGAGTAGCTGGGACTATAGGTGCCCGCCACCAT    DDHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    AS:i:-12    XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:2  MD:Z:31C15C0    YS:i:-5 YT:Z:CP
D00823:89:HMHHMBCX2:2:1116:19382:32083  99  hg19_dna    539750  42  48M =   540015  313 CCTATGGTAGCTCTGGCCAGCGGAACTTCCTGCATCCATGGAGCAATG    DDIIHIHIIHIIIIFEHIIIIIIIHIIIIIIHHHIIIHIIIIGIIII?    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:48 YS:i:0  YT:Z:CP
ADD COMMENTlink written 8 months ago by star190
2

You've downloaded the chrX reference from position 146949598 to 147161084. This is your reference sequence. Bowtie2 does not know that you cropped the reference sequence, the reference start at 1 for the software. If you want the position on the chrX you have to take the full chrX sequence or to add 146949598 to every hit

ADD REPLYlink modified 8 months ago • written 8 months ago by Bastien Hervé4.4k

Thanks @ Bastien for your reply.

you mean the first position in my data is 146949598 + 150864 = 147100462

ADD REPLYlink written 8 months ago by star190
2
D00823:89:HMHHMBCX2:1:2111:4641:60710   99  hg19_dna    147100462  40  48M =   147100544  130 CCTGCCTCAGCCTCCTGAGTAGCTGGGACTATAGGTGCCCGCCACCAT    DDHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    AS:i:-12    XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:2  MD:Z:31C15C0    YS:i:-5 YT:Z:CP
D00823:89:HMHHMBCX2:2:1116:19382:32083  99  hg19_dna    147489348  42  48M =   147489613  313 CCTATGGTAGCTCTGGCCAGCGGAACTTCCTGCATCCATGGAGCAATG    DDIIHIHIIHIIIIFEHIIIIIIIHIIIIIIHHHIIIHIIIIGIIII?    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:48 YS:i:0  YT:Z:CP
ADD REPLYlink modified 8 months ago • written 8 months ago by Bastien Hervé4.4k

but why my last start position when I add 146949598 to it is bigger than my Ref Genome range?

my last start position is 2390138 + 146949598 = 149339736

D00823:89:HMHHMBCX2:1:1201:1895:19391   147 hg19_dna    2390138 42  48M =   2389844 -342    CTGTTTTGTCGACACTGAAAACCTGTGTATGAAGTCACCGTCATCAAA    HIIIIHIIIIIIGIHIIIGIIIIIGIIIIIIIIIIIHIIIHIIIIIDD    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:48 YS:i:0  YT:Z:CP
ADD REPLYlink written 8 months ago by star190

Could you please give me the length of your reference sequence ?

ADD REPLYlink written 8 months ago by Bastien Hervé4.4k

it is 4231 lines. (if I did right: cat ref.fa | wc -l)

ADD REPLYlink modified 8 months ago • written 8 months ago by star190
1

I was expecting the number of nucleotide in your reference sequence... Anyway, let's assume you have a fasta format with 60 character per line, minus the header 4230 * 60 = 253 800 bases

You've downloaded a sequence that is 211 486 bases long

How did you end up with a hit around 2 000 000 ??

Could you please try to download the full chrX and restart your alignment, this will avoid the misunderstanding

ADD REPLYlink modified 8 months ago • written 8 months ago by Bastien Hervé4.4k

Thanks. I have done it and the head and tail of mt output are as below:

Head:

D00823:89:HMHHMBCX2:1:2114:9942:3900    147 X   2479520 42  48M =   2479476 -92 TGACAGGTGTGAGCCACCATGACTGGCCTGCAATGACACATTTGTATT    IIIIIIIIIIHIIIIIIIIIIIIIIIIIFFGIIIIIHIIGIIIIIIDD    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:48 YS:i:-6 YT:Z:CP
D00823:89:HMHHMBCX2:2:2111:10601:89000  163 X   3042754 42  48M =   3042949 243 TTTCTTTCCTTAACCTTAGGGTCCGTTTTAGTTGCTAAAGGGGCATTT    DDIIIIIIIIIIIIIIIHIIIFH1<FHIHHIIIHIIIIIIIIHHHHHI    AS:i:-3 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:23T24  YS:i:0  YT:Z:CP

tail:

D00823:89:HMHHMBCX2:2:2105:10080:21276  147 X   155118974   42  48M =   155118837   -185    CTGCTCTTAAGTCTGTTGTTAAGTTTCATGTTATAGTGCCTCGTTAGA    IIIIIIIIIIIHHIIIIIIIHIIIIIHIIIIIIIIIHIHHHHIIIIDD    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:48 YS:i:0  YT:Z:CP
D00823:89:HMHHMBCX2:1:2104:1703:56241   99  X   155125479   42  48M =   155125618   187 CCACATAGCAGGAGGTGAGTGGCCTATGAGCATTACCACCTGAGTTCT    CDHHIIIIIGHIIIIHIIIIIGIIGIIIIIIIHHHIGIIHHIGHIIIH    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:48 YS:i:0  YT:Z:CP

Now, would you please let me know how can I crop my specific regions?

ADD REPLYlink modified 8 months ago • written 8 months ago by star190
1
samtools view input.bam "ChrX:3000000-3500000" > output.bam
ADD REPLYlink modified 8 months ago • written 8 months ago by Bastien Hervé4.4k

Thanks.

i run it but I got errore :

samtools view uniqu_fmr.bam "ChrX:146949598-147161084" > output.bam
[main_samview] random alignment retrieval only works for indexed BAM or CRAM files.

Then I run it: samtools index uniqu_fmr.bam

but I got this error:

samtools index: "uniqu_fmr.bam" is in a format that cannot be usefully indexed
ADD REPLYlink written 8 months ago by star190
4
gravatar for Bastien Hervé
8 months ago by
Bastien Hervé4.4k
Limoges, CBRS, France
Bastien Hervé4.4k wrote:

If you have a sam file

samtools sort uniqu_fmr.sam > uniqu_fmr.sorted.bam
samtools index uniqu_fmr.sorted.bam
samtools view uniqu_fmr.sorted.bam "ChrX:146949598-147161084" > uniqu_fmr_ChrX_146949598_147161084.bam

If you have a bam file change the first line to :

samtools sort -o uniqu_fmr.sorted.bam uniqu_fmr.bam
ADD COMMENTlink modified 8 months ago • written 8 months ago by Bastien Hervé4.4k

Thanks @ Bastein, it works!

ADD REPLYlink written 8 months ago by star190

Please follow-up on your threads. If an answer was helpful, upvote it, if it solved the issue, mark it as accepted.

enter image description here

ADD REPLYlink written 7 months ago by ATpoint23k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 972 users visited in the last hour