Dealing with next-gen SSR Microsatellite data - help please!
4.3 years ago

Hi all, we have a bioinformatics challenge and would love any help this community can offer.

We have data from a target-enrichment experiment that was designed to capture certain microsatellite motifs. The three enriched libraries were sequenced in a rapid run on an Illumina HiSeq 2500 (paired-end mode), and our data are in the standard Illumina FASTQ format.

The three libraries come from three different sources: the first was prepared from fresh fish tissue, the second from mammal tissue, and the third from fecal samples of the same mammal species. For the fecal samples, we need to retain only the sequences that belong to the mammal (i.e. exclude prey and microbiome reads). We have a reference genome for the mammal, but not for the fish. The data have already been demultiplexed (so for the fish we have 40 individuals, each with its own .fq file containing all of its reads).

We are now facing the challenge of how to process these data. Although we are familiar with most basic bioinformatic tools and analyses, we do not have advanced programming skills. We need a way to:

1. find the microsatellites within the reads and determine their lengths, and
2. (for the fecal library) identify unique flanking sequences that correspond to our mammal, so that reads from other species in the fecal libraries can be excluded.

Would anyone have a suggestion on what approach(es) we could use? We have already attempted, unsuccessfully, to tackle this with SSR_pipeline.

Thank you in advance for any help you can offer. It is very much appreciated!

Daniel & Vania
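To make the first part of the question concrete, here is a minimal Python sketch of what "finding microsatellites and their lengths within reads" could look like. It is only an illustration under stated assumptions, not a tested pipeline: the function names `find_ssrs` and `iter_fastq_sequences` are hypothetical, the motif size (2 to 6 bp) and minimum repeat count (4) are arbitrary choices, only perfect repeats are detected, and the FASTQ reader assumes plain, uncompressed four-line records. It does not address the filtering of non-mammal reads in the fecal library.

```python
import re

# Assumption: a microsatellite is a 2-6 bp unit repeated at least 4 times
# back-to-back with no interruptions (perfect repeats only).
# The lazy quantifier makes the regex try the shortest unit first.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{3,}")


def find_ssrs(seq, flank=20):
    """Return perfect SSRs in one read, with motif, repeat count and flanks."""
    s = seq.upper()
    hits = []
    for m in SSR_PATTERN.finditer(s):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        start, end = m.span()
        hits.append({
            "motif": unit,
            "repeats": (end - start) // len(unit),
            "start": start,
            "end": end,
            # Flanks are truncated if the repeat sits near a read end.
            "left_flank": s[max(0, start - flank):start],
            "right_flank": s[end:end + flank],
        })
    return hits


def iter_fastq_sequences(path):
    """Yield the sequence line of each record in a plain 4-line FASTQ file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()


if __name__ == "__main__":
    # Toy read: 6 bp flank, (AC)x8, 6 bp flank.
    read = "TTGACG" + "AC" * 8 + "GGATCC"
    for hit in find_ssrs(read):
        print(hit["motif"], hit["repeats"], hit["left_flank"], hit["right_flank"])
```

For a real file, each sequence from `iter_fastq_sequences("some_individual.fq")` (a placeholder file name) could be passed to `find_ssrs`. The flanking sequences are kept in the output because they are what would let reads from the same locus be grouped, or compared against the mammal reference.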

next-gen SSR Microsatellites alignment • 576 views