Hello everyone,
I am a second year graduate student and I am very new to this filed and I am posting my detailed query below.
So, what we are trying to do is to take only small RNA from E. coli and sequence them, Our focus is only on tRNAs. There are two sets of samples, one treated with a chemical reagent and the other is not. The idea is that the reagent reacts with only a specific tRNA modification of Ecoli tRNA and that is consequently reflected in the sequencing.
So, the untreated sample shows all the 86 tRNA sequences of E. coli in our NGS data without any mismatches (Please note that all the tRNA modification information is lost).
And the treated sample shows mismatch at the modification position ( i,e the reagent- tRNA modication adduct is behaving like a different base that subsequently is seen as a mismtach )
I have aligned my data to the whole genome using STAR. Later, I have used bcftools mpileup in conjunction with the bcftools call to generate the SNP profiles in untreated and treated samples.
Firstly, is this acceptable ? Also, the tRNA modification I am interested is present in atleast 50 tRNAs as observed by visualization in the IGV (i,e I am expecting atleast 50 snp variants of 86 genes I am interested in)
when I did the SNP analysis with bcftools call, I could only see 30 odd genes. I have tried changing depth but the result didn't change much. my command looks something like this:
bcftools call -o 3Aligned_bcftools_d100000.tRNAs_all.vcf -Ov -v -c -R ../../Genomefiles/Ecoli_tRNAs_get_features.bed 3Aligned_bcftools_d100000.all.vcf.gz
-v is for variants only
-R bed file with tRNAs
-c --consensus-caller
I used the default value for -p, --pval-threshold <float> 0.5
Can anyone please help me regarding this ? is there a simple program that does this because I am not really bothered about ploidy or any such things. I am merely interested in finding how many of those 86 genes display mismatches in the treated sample vs normal.
Any help is greatly appreciated. I have read about gatk but I thought it's a bit sophisticated method for such an analysis.
Thank you !
Regarding the alignment to the genome, you should take into consideration that some of the tRNAs will be very similar to each other and may cross-map to each other. You should consider using different parameters than the default ones to force a non-redundant mapping.
Hey, thank you for your comment. I just realized that the whole post is just my misunderstanding.
I understand what you are saying. Possibly because of the multi mapping, by visualizing alignments in IGV, I misunderstood that there is expression of similar tRNA genes. Therefore I was expecting more number of mismatches.
I did the analysis using -p, --pval-threshold <float> 0.99 It gave me SNPs with qualities ranging from 0.01 to 150. However, I ignored the duplicates. Do you have any idea on how to interpret these qualities ?
Thank you