Increase the sample length in the examples to raise the chance of finding meaningful sequences.
mRNA is read in codons, each a triplet of nucleotides, so there are $4^3 = 64$ possible codons. Of these, 1 (AUG) encodes start and 3 (UAA, UAG, UGA) encode stop.
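A quick check of these counts, assuming the standard genetic code (AUG as start; UAA, UAG and UGA as stop):

```python
from itertools import product

# Standard genetic code: AUG is the start codon; UAA, UAG and UGA are stop codons.
START_CODONS = {"AUG"}
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Every codon is a triplet over the four RNA bases, giving 4**3 = 64 codons.
ALL_CODONS = ["".join(triplet) for triplet in product("ACGU", repeat=3)]

print(len(ALL_CODONS))                                # 64
print(sum(c in START_CODONS for c in ALL_CODONS))     # 1
print(sum(c in STOP_CODONS for c in ALL_CODONS))      # 3
print(sum(c not in STOP_CODONS for c in ALL_CODONS))  # 61 codons that are not stop
```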
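The probability that a random codon sequence, read in a fixed frame from a given position, encodes a protein of length n (a start codon, then n codons that are not stop codons, then a stop codon) is
$$P_n = P_0 \cdot \left(\frac{61}{64}\right)^n \cdot P_t \quad \text{with} \quad P_0 = \frac{1}{64} \text{ and } P_t = \frac{3}{64}$$
where $P_0$ is the probability of the start codon and $P_t$ is the probability of a stop codon. The factor $\frac{61}{64}$ is the probability that a codon is not a stop codon: AUG inside the reading frame simply encodes methionine, so only the 3 stop codons are excluded.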
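A minimal sketch of $P_n$ using exact fractions, assuming every codon is equally likely. The function name `orf_probability` and its `interior` parameter (how many of the 64 codons may appear between start and stop; 61 counts every non-stop codon) are hypothetical, chosen for illustration.

```python
from fractions import Fraction


def orf_probability(n: int, interior: int = 61) -> Fraction:
    """Probability that a start codon, then n interior codons, then a stop
    codon appear at a given position of a uniformly random codon sequence."""
    p_start = Fraction(1, 64)  # P_0: the single start codon AUG
    p_stop = Fraction(3, 64)   # P_t: any of the 3 stop codons
    return p_start * Fraction(interior, 64) ** n * p_stop


for n in (0, 10, 100, 300):
    print(n, float(orf_probability(n)))
```

Summing $P_n$ over all $n \ge 0$ gives $P_0 \cdot P_t / (1 - \frac{61}{64}) = P_0$, so in an unbounded random sequence every start codon is eventually followed by a stop codon, on average after $61/3 \approx 20$ interior codons.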
Given these statistics, can we expect to find a valid protein in a sequence of 1000 codons?
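One way to explore this question is a rough sketch that assumes a uniformly random sequence read in a single frame (real mRNA is neither). It counts open reading frames (ORFs): an AUG whose first in-frame stop codon follows after at least `min_len` interior codons, with 61 of the 64 codons allowed in between. `expected_orfs` sums $P_n$ over every start position, collapsing the geometric series with $P_t = 1 - \frac{61}{64}$, and `simulate` checks that estimate by counting ORFs in random sequences. The names `expected_orfs`, `count_orfs`, `simulate` and the `min_len` threshold are hypothetical.

```python
import random
from itertools import product

START = "AUG"
STOPS = {"UAA", "UAG", "UGA"}
CODONS = ["".join(t) for t in product("ACGU", repeat=3)]

P0 = 1 / 64   # probability of the start codon
R = 61 / 64   # probability that a codon is not a stop codon (P_t = 1 - R)


def expected_orfs(length: int, min_len: int) -> float:
    """Expected number of ORFs with at least `min_len` interior codons in a
    uniformly random sequence of `length` codons (single reading frame)."""
    total = 0.0
    for start in range(length):
        # The stop codon must fit: start + n + 1 <= length - 1.
        max_len = length - start - 2
        if max_len < min_len:
            continue
        # sum_{n=min_len}^{max_len} P0 * R**n * (1 - R) telescopes to:
        total += P0 * (R ** min_len - R ** (max_len + 1))
    return total


def count_orfs(codons: list[str], min_len: int) -> int:
    """Count start codons whose first downstream stop leaves >= min_len codons."""
    count = 0
    for i, codon in enumerate(codons):
        if codon != START:
            continue
        # Scan forward to the first in-frame stop codon after this start.
        for j in range(i + 1, len(codons)):
            if codons[j] in STOPS:
                if j - i - 1 >= min_len:
                    count += 1
                break
    return count


def simulate(length: int, min_len: int, trials: int, seed: int = 0) -> float:
    """Average ORF count over `trials` uniformly random codon sequences."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += count_orfs(rng.choices(CODONS, k=length), min_len)
    return total / trials


if __name__ == "__main__":
    for min_len in (0, 50, 100):
        print(min_len, expected_orfs(1000, min_len), simulate(1000, min_len, trials=1000))
```

Under this model a 1000-codon frame holds about $1000/64 \approx 15.6$ start codons, and most are closed by a stop within a few dozen codons, so short ORFs are common. Long ones are not: $(61/64)^{100} \approx 0.008$, so an ORF with 100 or more interior codons is unlikely to appear by chance in a single frame.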