Hello, I am completely new to sequencing and programming (and I am blonde), so please bear with me.
I have already seen that there are some questions about this, but I could not really understand or deduce what I have to do now. My task is to recreate some figures of a paper with RStudio, and I chose the single-cell RNA-seq results from this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8077299/ I have already downloaded all SRA files with the SRA Toolkit and am now converting them to FASTQ files with the --split-3 option. I know I have to check them with FastQC afterwards. However, each sample has two SRA runs (for example this sample: https://www.ncbi.nlm.nih.gov/sra/SRX8998846[accn] ). Why are there two SRA files, which will ultimately result in four FASTQ files for one sample? I have read somewhere "technical duplicates", but the two files also differ hugely in size (6.9 GB and 18.3 GB), and when I click on the runs to get more information I get lost.
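A minimal sketch of the conversion and QC step, written in Python for illustration. It only builds the command lines and does not run them. It assumes fasterq-dump from the SRA Toolkit (fastq-dump also accepts --split-3) and that --split-3 writes the paired mates of a run to <RUN>_1.fastq and <RUN>_2.fastq in the output folder. The function name build_commands and the "fastq" output folder are hypothetical; the two accessions are runs mentioned later in this thread.

```python
import shlex
from pathlib import Path


def build_commands(runs, outdir="fastq"):
    """Return fasterq-dump and FastQC command lines for a list of SRR accessions.

    Assumption: with --split-3, paired reads of <RUN> end up in
    <RUN>_1.fastq and <RUN>_2.fastq (unpaired reads, if any, in <RUN>.fastq).
    """
    out = Path(outdir)
    commands = []
    for run in runs:
        # Convert one SRA run into FASTQ files inside the output folder.
        commands.append(["fasterq-dump", "--split-3", run, "-O", str(out)])
        # Run FastQC on both mates and write its reports to the same folder.
        mates = [str(out / f"{run}_{i}.fastq") for i in (1, 2)]
        commands.append(["fastqc", "-o", str(out), *mates])
    return commands


if __name__ == "__main__":
    for cmd in build_commands(["SRR12508049", "SRR12508050"]):
        print(shlex.join(cmd))
```

Printing the commands first lets you check them before running anything, for example by pasting them into a terminal or passing each list to subprocess.run.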
Can someone please explain to me why there are two SRA files and how I should proceed with them?
Thank you. Then I will read the publication again. If the second SRR run only contains the cell barcode + UMI, how should I proceed, roughly speaking? Was it something along the lines of aligning them and trimming the ends?
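Roughly speaking, and assuming the data are 10x-style droplet scRNA-seq: the barcode read is usually not aligned or trimmed. Its first bases are cut into the cell barcode and the UMI, these are attached to the cDNA read of the same read pair, and only the cDNA read is aligned. Tools such as Cell Ranger or STARsolo do all of this for you. The sketch below only illustrates the splitting step. The 16 bp barcode + 12 bp UMI layout is an assumption (10x Chromium v3 chemistry), so check the methods section of the paper for the actual read structure; read_fastq and split_barcode_umi are hypothetical helper names.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (name, sequence, quality) from an iterable of FASTQ lines (4 lines per record)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return
        header, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield header[1:], seq, qual  # drop the leading '@'


def split_barcode_umi(seq, barcode_len=16, umi_len=12):
    """Split a barcode read into (cell barcode, UMI).

    Default lengths are an assumption (10x Chromium v3: 16 bp + 12 bp).
    """
    if len(seq) < barcode_len + umi_len:
        raise ValueError(f"read too short: {len(seq)} bp")
    return seq[:barcode_len], seq[barcode_len:barcode_len + umi_len]


if __name__ == "__main__":
    # Hypothetical barcode read: 16 bp barcode + 12 bp UMI + 2 extra bases.
    example = ["@read1\n", "AAAACCCCGGGGTTTT" + "ACGTACGTACGT" + "TT\n", "+\n", "F" * 30 + "\n"]
    for name, seq, _qual in read_fastq(example):
        barcode, umi = split_barcode_umi(seq)
        print(name, barcode, umi)
```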
Yeah, actually there are three SRR runs for one sample: two with the same SAMN and SRX numbers (which is where I was confused) and one with different SAMN/SRX numbers. That much I was able to recognize, but everything beyond that is still like magic to me.
I was referring to the two SRA runs for the SRX number you linked above. Since both use the same number of cycles, I am not sure which one is TAP-seq (I am not familiar with that technique). You can link the SRR number you are referring to if you want us to take a look.
This link gives a general overview of all samples: https://www.ncbi.nlm.nih.gov/Traces/study/?query_key=1&WebEnv=MCID_60d31b728740610d17105811&o=acc_s%3Aa There you can see that each sample is listed three times, with completely different sizes in bytes and bases.
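To see which runs belong together without clicking through every page, the run metadata can be grouped by experiment accession. A minimal sketch, assuming the table was exported from the SRA Run Selector with columns named Run, Experiment, BioSample and Bytes (check the header of your file); the delimiter parameter covers both comma- and tab-separated exports. The function name and the accessions in the example are hypothetical.

```python
import csv
import io
from collections import defaultdict


def runs_by_experiment(table_text, delimiter=","):
    """Group SRR runs by their SRX experiment accession.

    Assumes 'Run', 'Experiment', 'BioSample' and 'Bytes' columns,
    as in a metadata table exported from the SRA Run Selector.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(table_text), delimiter=delimiter):
        groups[row["Experiment"]].append((row["Run"], row["BioSample"], int(row["Bytes"])))
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical table: two runs share one experiment, a third has its own.
    table = (
        "Run,Experiment,BioSample,Bytes\n"
        "SRR0000001,SRX0000001,SAMN0000001,100\n"
        "SRR0000002,SRX0000001,SAMN0000001,300\n"
        "SRR0000003,SRX0000002,SAMN0000002,50\n"
    )
    for experiment, runs in runs_by_experiment(table).items():
        print(experiment, runs)
```

Runs that share an SRX (and SAMN) accession then appear in the same list, next to their sizes in bytes.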
The third run of each triplet contains the TAP-seq data; this much they have written (https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR12508051). But they do not say what exactly the difference between the first two is (https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR12508049 and https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR12508050).
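One way to compare the first two runs yourself is to look at the read lengths in their FASTQ files: a file of short reads of fixed length would fit a barcode + UMI read, while longer reads would fit cDNA. A minimal sketch that counts read lengths among the first records of a FASTQ file; the function name and the 10,000-read default are arbitrary choices, and since both runs reportedly use the same number of cycles, this may not separate them and cannot tell you why the submitters split the data into two runs.

```python
from collections import Counter
from itertools import islice


def read_length_counts(lines, max_reads=10000):
    """Count read lengths among the first max_reads records of a FASTQ (4 lines per record)."""
    counts = Counter()
    it = iter(lines)
    for _ in range(max_reads):
        record = list(islice(it, 4))
        if len(record) < 4:
            break
        counts[len(record[1].rstrip("\n"))] += 1  # line 2 of a record is the sequence
    return counts


# Usage on a real file (file name assumes the --split-3 naming <RUN>_1.fastq):
# with open("SRR12508049_1.fastq") as fh:
#     print(read_length_counts(fh).most_common(3))
```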
I think you should email the submitters and ask. It is confusing, since the top two entries seem to have no distinguishing metadata other than their different run numbers.
It seems I will have to do this.
I really appreciate your help. Thank you very much :)
Please post their clarification here if you get it from them. It would be interesting to know.