|
|||
The Cellular Computer I've written in this space before about Turing's stopping problem, one of the most important theoretical results in computer science. Turing invented a degenerate form of computer which was simultaneously simple enough to be susceptible to mathematical analysis while being powerful enough to be useful. Indeed, it can be shown that a Turing Machine (as it's known) is capable of simulating a P4, or a Cray, or indeed all modern digital computers sold in quantity. Turing proposed an interesting question: Take a Turing Machine and program it so that it has a "stop" state. Will it stop? Actually, what he asked was this: Can a Turing Machine look at the tape contents, state definitions and starting conditions of some other Turing Machine and determine if it will stop? Turing proved that it could not. In some cases the solution required an infinite number of operations. Since the stopping problem defines limits on what computers of that architecture are capable of accomplishing, it is important to understand it. If you can show that solving a given problem would also mean that you were solving the stopping problem, then the given problem can't be solved with computers of that architecture, because we already know that the stopping problem is impossible. An example of that is dead code identification. Sometimes programs get large, and they go through multiple generations of rewriting and modification, and eventually someone will look at a piece of it and say "I think that this code is never executed any longer." There are cases where this can be determined conclusively (e.g. if it's a function and there is no invocation of it, it clearly never runs) but sometimes it's not easy to determine. Suppose that the function does have one or more invocations, hidden inside conditionals. Then we have to determine whether that conditional will ever become true, which means figuring out whether other code will contribute to the state required. It's obviously a tough problem in some cases. Wouldn't it be nice if we could turn a computer program loose on it and have it tell us conclusively whether a piece of code is dead? Yeah. It sure would. Unfortunately, it's impossible because it is isomorphic to the stopping problem. Suppose that the piece of code you're trying to analyze contains the only halt instruction in the entire code base. If the code executes, the computer will halt. Will the computer halt? Bummer. It would have been so useful. But that's life. You may have read that they recently analyzed the entire human genome. It is a spectacular achievement, but I'm afraid that it was the easy part. The way a programmer would put it, they just managed to do a memory dump of the code. But analyzing it is a much harder problem, especially since they didn't dump the variables or stack or register contents -- in other words, a precise chemical description of all the proteins present in the cytoplasm along with quantities and current energy state of each.. One particular difficulty is that it's already known that upwards of 90% of the genome is dead code. So which parts are important and which are irrelevant or only of historical interest? Good question -- let's solve it with a computer, shall we? Sorry, no can do. You've run into the stopping problem again. The genetic code is isomorphic to a computer program for a multiprocessing computer. It doesn't have a single execution stream, but it has elements of a state machine. Each cell in our bodies which has a nucleus contains all the genetic information to describe every enzyme in our bodies, but not every cell activates all of the genes it carries. What happens is that some proteins in a cell turn on other sequences of genes, which create other proteins some of which activate still other sections of genes, and onward it goes. Over the life of a cell it may respond to an unusual situation by activating a gene sequence to create a protein it needs temporarily, then deactivate it again after the crisis happens. Other genes which are active in other cells of other types may not be activated by this particular cell. (Muscle cells surely don't need all the enzymes present in liver cells, for instance.) Out of a population of a million cells of a given type, only one of them may activate a given sequence once during its life. To determine whether a given gene will ever be activated, it may be possible to take a short cut. If, for instance, it can be proved that it doesn't contain a "start" sequence, a critical set of three codons which are always present at the beginning of a gene, then clearly it doesn't encode an enzyme. (It turns out to be TAC, which encodes for Methionine. All proteins contain exactly one MET and always at one end.) But suppose that it has a start, and also a perfectly reasonable activation sequence. The "activation sequence" is sort of like a subroutine header; without it the "code" can't be called. But with it there's no guarantee that it will be, just as the fact that there is a function call to a given function doesn't prove that the call will execute in actuality. This is precisely dead-code identification. Determining whether a given stretch of the genome encodes something useful consists of identifying all the parts of the genome which are useless (i.e. "dead") and then anything which is left must be useful. And we already know that modern digital computers cannot identify dead code because that's the same as solving the stopping problem, which we also know is impossible. That's not to say the problem is intractable, just that it can't be solved with the computers we currently have. What they can do, hypothetically, is to simulate five microseconds (or five years, depending on how powerful your computers are) in the life of the cell, and identify which genes are turned on and used during that interval. (Of course, chaos theory may also prevent this from being a perfect simulation, but that's a different discussion.) If the other problem is solvable mechanically, it will take a different kind of computer and a breakthrough in theory. It's not theoretically impossible for a non-Turing-Machine to solve the stopping problem, but right now no-one knows how it is done. In the mean time, if you read "90% of the genome is junk" or "There are only 30,000 genes in a human" then keep in mind that these are guesses; full analysis hasn't been done to prove it and may never be done. It's certain that most of the genome is junk, but we may never know exactly how much of it is useful. This page has been viewed 1813 times since 20010726. |