Question: Alignment of the partial genome sequencing
0
gravatar for micro32uvas
21 months ago by
micro32uvas10
Pakistan
micro32uvas10 wrote:

Hello,

I have a pretty basic question. I hope to get help in this matter.

We did this whole genome targeted sequencing earlier. Now we need only a partial chr1 sequence aligned with the reference genome, instead of doing the alignment and mapping all over again.

I took chr1 of the ref genome as a complete reference and alignment the mate pair fastq files. but I am stuck at this point again. Can anyon help.

Its chicken genome.

Aini

sequencing alignment next-gen • 725 views
ADD COMMENTlink modified 21 months ago by Antonio R. Franco4.1k • written 21 months ago by micro32uvas10

Do you want to align only 1 chr1 sequence to the reference genome? Is that your question ?

ADD REPLYlink written 21 months ago by Antonio R. Franco4.1k

yes, infact, I have a specific location to align that is associated with my trait of interest e.g. 1:67,246,039- 67,350,164. And I have raw mate pair fastq files.

ADD REPLYlink modified 21 months ago • written 21 months ago by micro32uvas10

I have created a bam file using bowtie2. Ref file is chr1 of the reference genome. with 2 mate pair fastq files.

here is the same result:

@HD VN:1.0 SO:unsorted @SQ SN:NC_006088.4 LN:196202544 @PG ID:bowtie2 PN:bowtie2 VN:2.1.0 SRR455103.1 83 NC_006088.4 9011 7 100M = 8970 -141 AAAGATACCCAAAAGAACCGACTCACCCCTCCCGGCGGCAGCGCGCAGGCAGAGCCGAGCTGAGCAATGTGTCCCTCCCCGTTCGTAGCGCCGCGCCGCC #######################################################?A>A8?BAAA@8GBG>DD>DD<>GGHHHHHHHHGHHHDGHDDHGH AS:i:-6 XS:i:-6 XN:i:0 XM:i:3 XO:i:0 XG:i:0 NM:i:3 MD:Z:17T17T0A63 YS:i:-10 YT:Z:CP SRR455103.1 163 NC_006088.4 8970 7 100M = 9011 141 GCAATGGCAAAGCCTTTATTTAAGGAATAGGCACACGAGGGAAAGATACCCAAAAGAATCGACTCACCCCTCCCGGCAGCAGCGCGCAGGCAGAGCCGAG IIHA?BBBGGGDGGGGD?GDD;@GGGGGIDIE?IIIIIGDIHIHGI@GIGID>GBGGDDDGGGDGGGDGFIIFHDEDFGE88>=A?A?HBBH>?A##### AS:i:-10 XS:i:-34 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:24A51T23 YS:i:-6 YT:Z:CP

But i cannot extract the coordinates as you suggested: Antonio R. Franco

samtools view accepted_hits.bam chr1:67246039-67350164 > mylectures.sam

I dont understand, what to do at this point

ADD REPLYlink written 21 months ago by micro32uvas10

You are doing right in reding these instructions http://www.htslib.org/doc/samtools.html. samttools view had some requirements, like the presence of the header. If you want, share your bam file with me through dropbox, mega or the like, and I give it a look. My address is arfranco (@) uco.es

ADD REPLYlink written 21 months ago by Antonio R. Franco4.1k
1
gravatar for Antonio R. Franco
21 months ago by
Spain. Universidad de Córdoba
Antonio R. Franco4.1k wrote:

You have several choices

  1. You can map all of your lectures to the whole reference genome, and thus, interrogate with samtools which of your lectures map to this region. samtools view accepted_hits.bam chr1:67246039-67350164 > mylectures.sam This order assumes that your bam mapped file is named accepted_hits.bam, and that the name of the chromosome in the reference genome is chr1. You can also see this in a graphic genome browser such as IGV, IGB or the like

  2. You can also extract this region only and map using it as a reference

ADD COMMENTlink modified 21 months ago • written 21 months ago by Antonio R. Franco4.1k

I got your file..

Your bam file is right. It contains the header. I see you used bowtie2 with samtools view -H SRR733103.bam

with samtools view SRR733103.bam | cut -f3 | uniq -c

I can see that you have 3777516 mapped reads to the reference sequence named NC_006088.4 (is not chr1, you need to use the actual name you used for the reference sequence)

Then you need to index your BAM with samtools index SRR733103.bam

Once you have your BAM file indexed you can run samtools view -h NC_006088.4:67246039-67350164 > selected.sam

In selected.sam you have all the mapped sequences to NC_006088 included in the range of the indicated coordinates, and will include the header into the new sam file. You don't actually need to use a bam file for that as selected.sam file is small

Hope this helps

ADD REPLYlink modified 21 months ago • written 21 months ago by Antonio R. Franco4.1k

It worked perfectly. However, now that i have this partial desired mapped sequence, how can i look for the potential snps?? I can call SNPs but what about annotation. I am not clear about what possible should be done afterwards.

I know its little out of this initial questions' scope, but it is just the next step afterwards. I hope i get an elaborate response as before :)

ADD REPLYlink written 21 months ago by micro32uvas10

You can look for SNP under different approaches

  1. Look for samtools mpileup

  2. Load the sam file along the reference genome in a visual genomic browser like IGV o IGB. The same can be done with samtools tview but in not such as elegant way

  3. Look for the way of getting a VCF (variant file) using samtools

ADD REPLYlink modified 21 months ago • written 21 months ago by Antonio R. Franco4.1k
0
gravatar for Antonio R. Franco
21 months ago by
Spain. Universidad de Córdoba
Antonio R. Franco4.1k wrote:

with samtools view selected.sam | wc -l you count the number of mapped reads to this range,. There are 285 mapped reads

ADD COMMENTlink written 21 months ago by Antonio R. Franco4.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1483 users visited in the last hour