analysis abi solid data
4 months ago
Maryam • 0

Hi all

The data I am analyzing come from the SOLiD platform and are paired-end. The problem is that the forward and reverse reads have different lengths. My procedure was to align the csfasta files with bowtie against the colorspace genome reference, but I aligned the forward and reverse files separately. Is this method correct?

The relevant text from the article describing these data:

"The samples were sequenced using the 50625 paired-end protocol, generating 75 nt+35 nt (Paired-End)+5 nt (Barcode) sequences. Quality data were measured using software SETS parameters (SOLiD Experimental Tracking System). For both reads, forwards and reverse, the seed was the first 25 nucleotides with a maximum of 2 mismatches."

1. Do I have to align only the first 25 nucleotides, as in the article?

2. The 5 barcode nucleotides: when I ran FastQC, I did not see them in the results. Do I still need to remove the first 5 nucleotides of each read?

My workflow is as follows:

csfasta/.qual files -------> alignment with bowtie (colorspace index, hg19; F/R files separately) ----------> SAM to FASTQ with the 123fastq software ------> FastQC -------> Trimmomatic ----------> alignment with HISAT2 (GRCh38; F and R files together) -----------> htseq-count

Is this correct?

alignment bowtie solid • 1.1k views

Save yourself the trouble and find an alternative dataset if you can. Colorspace data is going to be a hassle to deal with. Most current programs have stopped supporting it.


I have to do this analysis because there is no other dataset for the disease under study. I'm very confused and do not know what to do. Please guide me.


Please show all the commands you used. And be aware that if these are correct and the data still align poorly, you are not doing yourself a favor by starting a project based on very suboptimal (=crap) data.


I read in an article that the alignment rate of SOLiD data is low (about 40%). Do you think this is true?

SRR1175538_F3.csfasta, SRR1175538_F3_QV.qual = **forward**
SRR1175538_F5-RNA.csfasta, SRR1175538_F5-RNA_QV.qual = **reverse**


I then aligned with bowtie against the hg19 colorspace index:

bowtie -p 2 -S -t --Q1 ./SRR1175538_F3_QV.qual  --Q2 ./SRR1175538_F5-RNA_QV.qual -C -f ./bowtie-index/hg19_ColorSpace_Bowtie/hg19_c  -1 ./SRR1175538_F3.csfasta  -2  ./SRR1175538_F5-RNA.csfasta ./SRR1175538.sam


The result of the above command was a SAM file of 0 kB and a 0% alignment rate.

So I then aligned the forward and reverse files separately with:

bowtie -p 2 -S -t -Q ./SRR1175538_F3_QV.qual -C -f ./bowtie-index/hg19_ColorSpace_Bowtie/hg19_c ./SRR1175538_F3.csfasta ./SRR1175538_f5.sam


Then I converted the SAM files separately to FASTQ files with the 123fastq software, ran quality control with FastQC, and trimmed the FASTQ files together with Trimmomatic:

java -jar ./Trimmomatic-0.39/trimmomatic-0.39.jar PE -threads 3 SRR1175538_1.fastq SRR1175538_2.fastq SRR1175538_1p.fastq SRR1175538_1u.fastq SRR1175538_2p.fastq SRR1175538_2u.fastq SLIDINGWINDOW:20:25


I did not continue the analysis because I was not sure of the correctness of the steps and the results obtained.

4 months ago
ATpoint 61k

If you want to make proper use of the paired-end data then no, you will need to align them in paired-end mode. See the bowtie manual for details. Since R1 and R2 are not independent, a separate alignment makes little sense.


When I aligned the F and R files together, the alignment rate was zero. But when I aligned them separately, the alignment rates were 42% and 61%.
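One thing that may be worth ruling out when paired-mode alignment reports 0% is that the two mate files are out of sync, since a paired aligner generally expects the n-th read of file 1 and the n-th read of file 2 to come from the same fragment. The sketch below is only a diagnostic idea, not a confirmed explanation of this result. It assumes csfasta header lines look like `>1_2_3_F3` and `>1_2_3_F5-RNA` (a shared bead ID followed by a mate tag); files downloaded from a public archive may be named differently. The function names are hypothetical.

```python
from typing import Iterable, List, Optional


def bead_ids(lines: Iterable[str], suffix: str) -> List[str]:
    """Collect read IDs from csfasta header lines, with the mate tag removed.

    Assumption: headers start with '>' and end with a mate tag such as
    '_F3' or '_F5-RNA'. Comment lines ('#') and sequence lines are skipped.
    """
    ids = []
    for line in lines:
        line = line.strip()
        if not line.startswith(">"):
            continue
        name = line[1:]
        if name.endswith(suffix):
            name = name[: -len(suffix)]
        ids.append(name)
    return ids


def first_mismatch(ids_a: List[str], ids_b: List[str]) -> Optional[int]:
    """Return the index of the first read where the two files disagree.

    Returns None when both lists have the same IDs in the same order.
    """
    for i, (a, b) in enumerate(zip(ids_a, ids_b)):
        if a != b:
            return i
    if len(ids_a) != len(ids_b):
        # One file simply has extra reads at the end.
        return min(len(ids_a), len(ids_b))
    return None
```

With real files this could be called as `first_mismatch(bead_ids(open("SRR1175538_F3.csfasta"), "_F3"), bead_ids(open("SRR1175538_F5-RNA.csfasta"), "_F5-RNA"))`. A result other than `None` would suggest the files need to be re-paired before a paired-end run.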


The R2 reads are terrible - as you can see by the length. If you insist on using the R2 reads, set the alignment parameters to be much more flexible, e.g. a much smaller seed size.

I would probably just go with R1 - how much signal is really present in R2? Also, the best aligner by far is NovoalignCS, which should be available for 1 month for free. I'd go with this. I think bowtie1, which can't align indels, is the only bowtie which supports colorspace. Avoid bowtie1.

An older tophat version might support color space as well, if still available (may use bowtie1 though).


Hi colindaven

Thank you very much for your help

Given what the article says, do you think I should align only the first 25 nucleotides of the forward read?

Best


No, use the full 75bp of the R1 read. Ignore the R2. When you get to the BAM stage you can check the quality of both using FastQC. It will be terrible for the R2 - I've had plenty of experience with that; at most you can use about 15bp of the R2.

The 25bp is just the seed - read up on algorithms and seeds, extensions and scores if interested. But use the full 75bp of R1 to get an accurate alignment.
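To illustrate the difference between the seed and the full read, here is a minimal sketch, assuming a plain mismatch count in nucleotide space (real colorspace aligners compare colors and use their own scoring, so this is a simplification and not how bowtie is implemented). The seed only decides whether a candidate position is worth considering; the whole read is still compared afterwards. The function names and default values are hypothetical, chosen to mirror the "first 25 nucleotides, maximum 2 mismatches" wording from the article.

```python
def count_mismatches(read: str, reference: str) -> int:
    """Count positions where the read and the reference window differ."""
    return sum(1 for r, g in zip(read, reference) if r != g)


def seed_accepts(read: str, reference: str,
                 seed_len: int = 25, max_seed_mismatches: int = 2) -> bool:
    """Return True if the first seed_len bases have few enough mismatches.

    Only the seed is inspected here; mismatches beyond it do not affect
    whether the candidate position passes this filter.
    """
    seed_mm = count_mismatches(read[:seed_len], reference[:seed_len])
    return seed_mm <= max_seed_mismatches


# A 75 bp read whose seed passes is then scored over all 75 positions,
# which is why the full R1 read gives a more reliable placement.
reference_window = "A" * 75
read = "AAAC" + "A" * 26 + "C" + "A" * 44  # mismatches at positions 3 and 30
if seed_accepts(read, reference_window):
    total = count_mismatches(read, reference_window)
    print("seed passed; mismatches over the full read:", total)
```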

Last note - don't try to do de novo assembly, it will not work with SOLiD data. One of the many reasons it is not on the market any more.


Thanks a lot, colindaven

The colorspace reference genome index available to me for the raw data is hg19.

Do you think I should use hg19 or GRCh38 to continue the analysis and the alignment of the FASTQ files (following the procedure described above)?


My question is: can different reference genomes be used for the two alignment steps?


I continued the analysis in R, but I realized that the counts for all genes are zero. Where is the problem?


Out-of-context questions are impossible to answer.


Check in IGV that reads are aligned within the genes. Then make sure the chromosome names in the annotation file / database in R are the same as the chromosome names in your alignments. E.g., is it 1 or chr1?
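The chromosome-name check can be sketched roughly as follows, assuming the alignment is available as SAM text with `@SQ` header lines and the annotation is a tab-separated GTF whose first column is the sequence name. The function names are hypothetical. If the two sets share no names (for example `chr1` versus `1`), a counting tool cannot match reads to genes, which would be one possible reason for all-zero counts.

```python
from typing import Iterable, Set


def sam_reference_names(header_lines: Iterable[str]) -> Set[str]:
    """Collect reference sequence names from the SN: field of @SQ lines."""
    names = set()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def gtf_seqnames(gtf_lines: Iterable[str]) -> Set[str]:
    """Collect sequence names from the first column of a GTF file."""
    names = set()
    for line in gtf_lines:
        # Skip comment/header lines and empty lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def shared_names(sam_names: Set[str], gtf_names: Set[str]) -> Set[str]:
    """Names present in both the alignment header and the annotation."""
    return sam_names & gtf_names
```

An empty result from `shared_names` points to a naming mismatch between the reference used for alignment and the annotation used for counting.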