Translation Initiation Site Prediction
Biological Background
Proteins are the main structural and functional molecules of an organism’s cells. Another family of molecules, the nucleic acids, carries the genetic information. The most common nucleic acids are deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). Proteins and nucleic acids are called macromolecules because of their length. Both are linear polymers of smaller molecules (monomers). The term sequence refers to the order of the monomers that make up a macromolecule. A sequence can be represented as a string of symbols, one for each monomer. There are twenty protein monomers, called amino acids, and five nucleic acid monomers, called nucleotides (Table 1).

DNA is the genetic material of almost every living organism. RNA has many functions inside a cell and plays an important role in protein synthesis. Moreover, RNA is the genetic material of some viruses. A sequence of nucleotides has two ends, called the 5´ end and the 3´ end, and is directed from the 5´ end to the 3´ end (5´ → 3´).

Table 1. The five nucleotides are characterized by the nitrogenous base they contain. Three of them are present in both nucleic acids, one only in DNA and one only in RNA.
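As a small illustration of treating a sequence as a string of symbols, the sketch below checks a string against the two nucleic acid alphabets. The one-letter symbols A, C, G, T and U are the conventional abbreviations of the bases and are assumed here to correspond to the entries of Table 1; the names `DNA_ALPHABET`, `RNA_ALPHABET` and `is_valid_sequence` are hypothetical.

```python
# One-letter symbols for the nitrogenous bases (conventional abbreviations,
# assumed here to correspond to the entries of Table 1).
DNA_ALPHABET = frozenset("ACGT")  # thymine (T) occurs only in DNA
RNA_ALPHABET = frozenset("ACGU")  # uracil (U) occurs only in RNA


def is_valid_sequence(sequence: str, alphabet: frozenset) -> bool:
    """Return True if every symbol of the sequence belongs to the alphabet."""
    return all(symbol in alphabet for symbol in sequence.upper())


# Symbols shared by both nucleic acids, and the full set of five nucleotides.
SHARED_SYMBOLS = DNA_ALPHABET & RNA_ALPHABET
ALL_SYMBOLS = DNA_ALPHABET | RNA_ALPHABET
```

By convention the string is written from the 5´ end on the left to the 3´ end on the right, so the direction of the sequence is implicit in the order of the characters.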
The Central Dogma of Molecular Biology
Figure 1. The central dogma of molecular biology

Translation

Translation usually initiates at the AUG codon nearest to the 5´ end of the mRNA molecule. However, this does not happen in all cases. Some escape mechanisms allow translation to initiate at a later AUG codon that is still near the 5´ end. One of them is leaky scanning, in which the first AUG is bypassed because its context is inappropriate. Another is reinitiation, in which translation initiates at an AUG codon before the correct initiation site and ends when it reaches a stop codon; translation then reinitiates when the true AUG codon is found. Sometimes direct internal initiation occurs: the ribosome attaches directly near the true AUG codon without any scanning. These mechanisms make it harder to recognize the translation initiation site (TIS) in a given genomic sequence.

There are three different ways to read a given sequence in a given direction. Each of them is referred to as a reading frame. The first reading frame starts at position 1, the second at position 2 and the third at position 3. The reading frame that is translated into a protein is called the Open Reading Frame (ORF). A codon that lies in the same reading frame as another codon is referred to as an “in-frame codon”. The coding region of an ORF is bounded by the initiation codon and the first in-frame stop codon. The coding region is surrounded by non-coding regions called the 5´ and 3´ untranslated regions (UTRs).

The direction of translation is 5´ → 3´. The region of a nucleotide sequence from a reference point towards the 5´ end is called upstream; correspondingly, the region from a reference point towards the 3´ end is called downstream. For example, the initiation codon is upstream of the stop codon, and the stop codon is downstream of the initiation codon. In TIS classification problems the reference point is an AUG codon.
The above are illustrated in Figure 2.
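The definitions of reading frame, coding region, upstream and downstream can also be sketched in a few lines of Python. The sketch lists every AUG codon of an mRNA string as a candidate TIS, reports its reading frame, and extracts the region bounded by that AUG and the first in-frame stop codon. The stop codons UAA, UAG and UGA are those of the standard genetic code and are an assumption, since the text does not list them; all function names are hypothetical. The sketch does not model codon context, leaky scanning, reinitiation or internal initiation.

```python
from typing import List, Optional

START_CODON = "AUG"
# Stop codons of the standard genetic code (an assumption; not listed in the text).
STOP_CODONS = {"UAA", "UAG", "UGA"}


def candidate_sites(mrna: str) -> List[int]:
    """Return the 0-based position of every AUG codon, scanning 5' to 3'."""
    return [i for i in range(len(mrna) - 2) if mrna[i:i + 3] == START_CODON]


def reading_frame(position: int) -> int:
    """Return the reading frame (1, 2 or 3) of a codon at a 0-based position."""
    return position % 3 + 1


def coding_region(mrna: str, start: int) -> Optional[str]:
    """Return the region from the AUG at `start` through the first in-frame stop codon.

    Both bounding codons are included. Returns None when no in-frame
    stop codon follows the AUG.
    """
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return None


def upstream(mrna: str, position: int) -> str:
    """Return the region from the codon at `position` towards the 5' end."""
    return mrna[:position]


def downstream(mrna: str, position: int) -> str:
    """Return the region from the codon at `position` towards the 3' end."""
    return mrna[position + 3:]
```

For the made-up sequence `GCAUGGCCAUGGGAUAAGC`, `candidate_sites` finds AUG codons at positions 2 and 8 (0-based). Both lie in reading frame 3, so the second AUG is an in-frame codon of the first, and both coding regions end at the same stop codon. Deciding which candidate is the true TIS is the classification problem itself and is outside the scope of this sketch.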
Figure 2. The initiation of translation. The ribosome scans the mRNA until it reads an AUG codon. If the AUG codon has an appropriate context, translation probably initiates at that site.

References
1. Kozak, M.: Initiation of Translation in Prokaryotes and Eukaryotes, Gene, 234(2), 187-208, 1999.