StarOceanHouse
Hey guys, I need help with this problem. I wanna see if I did it right. Here goes.
So I figured the probability of CTAG occurring in any order is 1/4!, which is 1/24. Then I multiply that by 31.5/100 * 31.5/100 * 68.5/100 * 68.5/100 to get 0.001939952, which is the probability of CTAG occurring in that exact order. Multiplying that by the number of nucleotides in the whole sequence gives me 8557.
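To check the attempt, here is a minimal Python sketch that reproduces each step of that arithmetic, so every factor can be inspected on its own. The variable names are hypothetical; the numbers are the ones used in the attempt.

```python
# Reproduces the arithmetic from the attempt, step by step.
# Base frequencies as used in the attempt: 0.315 for A and T, 0.685 for C and G.
from math import factorial

N = 4_411_000  # length of the random sequence, from the problem statement

order_factor = 1 / factorial(4)               # the 1/4! = 1/24 factor
base_product = 0.315 * 0.315 * 0.685 * 0.685  # product of the four base frequencies

p_attempt = order_factor * base_product       # the 0.001939952 figure
count_attempt = p_attempt * N                 # the 8557 figure

print(f"1/4!          = {order_factor:.6f}")
print(f"base product  = {base_product:.9f}")
print(f"probability   = {p_attempt:.9f}")
print(f"expected hits = {count_attempt:.0f}")
```

Printing `base_product` separately makes it easy to see how much of the final number comes from the base frequencies and how much from the 1/24 factor.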
1. (15 points) The overall composition of the M. tuberculosis H37Rv genome is A = T = 31.5%, C = G = 68.5%. Suppose you have a random sequence containing 4,411,000 nucleotides, but with these proportions of nucleotides. What would be the expected number of times the sequence CTAG would occur in the whole sequence?
Discrepancies between observed and expected tetranucleotide counts highlight features that may have interesting biochemical explanations, such as unusual flexibility or mismatch repair. CTAG is of interest in this respect: its occurrence is rare in many prokaryotic genomes, and it may cause kinks under conditions of supercoiling. In this context, it may serve a specific structural purpose, for instance as a binding site for repressor proteins.
This is a probability problem, which can be solved by applying a method known as Whittle’s equation. There are also programs available, such as codonW, which performs correspondence analysis of codon usage. For this problem, however, we will attempt to make a valid estimate as follows. Notice that CTAG is palindromic (its reverse complement is also CTAG), so wherever it occurs on one strand, it also occurs at the same position on the other; counting occurrences on a single strand is therefore enough, since counting both strands would count every site twice. So, the joint probability of CTAG is the product of the individual base probabilities: (0.315)
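The estimate described above can be sketched in Python under a zero-order model, where each position is drawn independently with fixed base frequencies. This is a minimal sketch, not Whittle's method, and the helper names (`reverse_complement`, `word_probability`, `expected_count`) are hypothetical. It counts 4-letter windows on one strand only, following the palindrome argument. Because the product of the base frequencies is already the probability of CTAG in that one specific order, no 1/4! factor appears. The problem's figures are used as stated (0.315 for A and T, 0.685 for C and G); since those four values sum to 2 rather than 1, the sketch also shows the alternative reading A + T = 31.5%, C + G = 68.5%, which halves each per-base value. Which reading the problem intends is an assumption.

```python
# Expected count of a DNA word in a random sequence, assuming each position is
# drawn independently with fixed base frequencies (a zero-order model).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(word: str) -> str:
    """Return the reverse complement of a DNA word."""
    return "".join(COMPLEMENT[b] for b in reversed(word))


def word_probability(word: str, freqs: dict[str, float]) -> float:
    """Probability of the word in this exact order: product of base frequencies."""
    p = 1.0
    for base in word:
        p *= freqs[base]
    return p


def expected_count(word: str, freqs: dict[str, float], n: int) -> float:
    """Expected occurrences on one strand: number of windows times P(word)."""
    windows = n - len(word) + 1  # a length-n sequence has n - k + 1 windows of length k
    return windows * word_probability(word, freqs)


N = 4_411_000

# Frequencies taken at face value, as stated in the problem (they sum to 2, not 1).
literal = {"A": 0.315, "T": 0.315, "C": 0.685, "G": 0.685}
# Alternative reading (assumption): A + T = 31.5% and C + G = 68.5%.
halved = {b: f / 2 for b, f in literal.items()}

# CTAG equals its own reverse complement, so one strand is counted.
assert reverse_complement("CTAG") == "CTAG"

for name, freqs in [("literal", literal), ("halved", halved)]:
    print(name, round(expected_count("CTAG", freqs, N)))
```

With these inputs the two readings give about 205,371 and 12,836 expected occurrences respectively. Using n instead of n - 3 windows changes either result by well under one occurrence, so multiplying by the sequence length directly is a fine approximation.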