The block cipher Rijndael is designed to use only simple whole-byte operations. Also, it provides extra flexibility over that required of an AES candidate, in that both the key size and the block size may be chosen to be any of 128, 192, or 256 bits. (During an early stage of the AES process, a draft version of the requirements would have required each algorithm to have three versions, with both the key and block sizes equal to each of 128, 192, and 256 bits. This was later changed to make the three required versions have those three key sizes, but only a block size of 128 bits, which is more easily accomodated by many types of block cipher design.)
The original description of Rijndael is available at: http://www.esat.kuleuven.ac.be/~rijmen/rijndael/.
However, the variations of Rijndael which act on larger block sizes apparently will not be included in the actual standard, on the basis that the cryptanalytic study of Rijndael during the standards process primarily focused on the version with the 128-bit block size.
Rijndael is a relatively simple cipher in many respects.
Rijndael has a variable number of rounds. Not counting an extra round performed at the end of encipherment with one step omitted, the number of rounds in Rijndael is:
To encipher a block of data in Rijndael, you first perform an Add Round Key step (XORing a subkey with the block) by itself, the regular rounds noted above, and as already noted, the final round with the Mix Column step, as described below, omitted.
Each regular round involves four steps. First is the Byte Sub step, where each byte of the block is replaced by its substitute in an S-box.
The specification for Rijndael only provided an explanation of how the S-box was calculated: the first step was to replace each byte with its reciprocal in the same GF(2^8) as used below in the Mix Column step, except that 0, which has no reciprocal, is replaced by itself (since it isn't anything's reciprocal either, it is the only value not used, so that makes sense) then a bitwise modulo-two matrix multiply was used, and finally the hexadecimal number 63 is XORed with the result. (Not C6, x7 is the MSB, if the diagram in the specification appears confusing.)
The S-box is:
99 124 119 123 242 107 111 197 48 1 103 43 254 215 171 118 202 130 201 125 250 89 71 240 173 212 162 175 156 164 114 192 183 253 147 38 54 63 247 204 52 165 229 241 113 216 49 21 4 199 35 195 24 150 5 154 7 18 128 226 235 39 178 117 9 131 44 26 27 110 90 160 82 59 214 179 41 227 47 132 83 209 0 237 32 252 177 91 106 203 190 57 74 76 88 207 208 239 170 251 67 77 51 133 69 249 2 127 80 60 159 168 81 163 64 143 146 157 56 245 188 182 218 33 16 255 243 210 205 12 19 236 95 151 68 23 196 167 126 61 100 93 25 115 96 129 79 220 34 42 144 136 70 238 184 20 222 94 11 219 224 50 58 10 73 6 36 92 194 211 172 98 145 149 228 121 231 200 55 109 141 213 78 169 108 86 244 234 101 122 174 8 186 120 37 46 28 166 180 198 232 221 116 31 75 189 139 138 112 62 181 102 72 3 246 14 97 53 87 185 134 193 29 158 225 248 152 17 105 217 142 148 155 30 135 233 206 85 40 223 140 161 137 13 191 230 66 104 65 153 45 15 176 84 187 22
Note that 63 in hexadecimal is 3 plus 6*16, or 36+60 or 96, and that is 99, as begins the table.
Next is the Shift Row step. Considering the block to be made up of bytes 1 to 16, these bytes are arranged in a rectangle, and shifted as follows:
from to 1 5 9 13 1 5 9 13 2 6 10 14 6 10 14 2 3 7 11 15 11 15 3 7 4 8 12 16 16 4 8 12
Blocks that are 192 and 256 bits long are shifted like this:
from to 1 5 9 13 17 21 1 5 9 13 17 21 2 6 10 14 18 22 6 10 14 18 22 2 3 7 11 15 19 23 11 15 19 23 3 7 4 8 12 16 20 24 16 20 24 4 8 12 and from to 1 5 9 13 17 21 25 29 1 5 9 13 17 21 25 29 2 6 10 14 18 22 26 30 6 10 14 18 22 26 30 2 3 7 11 15 19 23 27 31 15 19 23 27 31 3 7 11 4 8 12 16 20 24 28 32 20 24 28 32 4 8 12 16
Note that in the 256 bit case, the rows are shifted 1, 3, and 4 places to the left, instead of 1, 2, and 3 places as for the other two block sizes.
Next comes the Mix Column step. Matrix multiplication is performed: each column, in the arrangement we have seen above, is multiplied by the matrix:
2 3 1 1 1 2 3 1 1 1 2 3 3 1 1 2
However, this multiplication is done over GF(2^8). This means that the bytes being multiplied are treated as polynomials rather than numbers. Thus, a byte "muliplied" by 3 is that byte XORed with that byte shifted one bit left.
If the result has more than 8 bits, the extra bits are not simply discarded: instead, they're cancelled out by XORing the binary 9-bit string 100011011 with the result (shifted right if necessary). This string stands for the generating polynomial of the particular version of GF(2^8) used; a similar technique is used in cyclic redundancy checks.
The final step is Add Round Key. This simply XORs in the subkey for the current round.
Although a three-dimensional color diagram of the round has already appeared at the beginning of the page, the Rijndael round can also be illustrated in a more conventional manner:

The extra final round omits the Mix Column step, but is otherwise the same as a regular round.
Thus, the sequence of steps in Rijndael is:
ARK BSB SR MC ARK BSB SR MC ARK ... BSB SR MC ARK BSB SR ARK
Because it begins and ends with an ARK (Add Round Key) step, there is no wasted unkeyed step at the beginning or end. The sequence of operations is important for facilitating decipherment, as well.
Although the sequence is not symmetrical, the order of some of the steps in Rijndael could be changed without affecting the cipher. The Byte Sub step could just as easily be done after the Shift Row step as before it.
That would change
A BSMA BSMA ... BSMA BSA
to
A SBMA SBMA ... SBMA SBA
If, on the other hand, we reversed the original sequence, we would get
ASB AMSB ... AMSB AMSB A
Although these look similar, except for the different positions of the spaces, which just mark what are termed "rounds" in the algorithm, wherever the sequence "MA" appears in the modified, but equivalent, algorithm, the sequence "AM" appears in the reverse sequence, required for decryption.
Of course, the steps also need to be reversed as well: ARK is an XOR, so that is its own inverse, but the matrix in the Mix Column step needs to be replaced with its inverse, and the S-box in the Byte Sub step needs to be replaced with its inverse for decryption as well.
What about the switch between "MA" and "AM", though? Do we need to change the order of operations for decryption?
No; with matrix multiplication, the distributive property also applies, so the round keys involved in such a reversal merely need to be multiplied by the (inverse) Mix Column matrix, and then they can be XORed in at the same time (of course, in reverse order) as was used for encryption. (XOR corresponds to addition in GF(2^8), which is where the multiplications are performed.)
The matrix for the inverse Mix Column step is:
14 11 13 9 9 14 11 13 13 9 14 11 11 13 9 14
as can quickly be verified by making use of its symmetry, and once the answer is known:
1110 1011 1101 1001 01 00 00 00 1001 1110 1011 1101 00 01 00 00 1101 1001 1110 1011 00 00 01 00 1011 1101 1001 1110 00 00 00 01 111 101 110 100 01 01 00 00 110 100 111 101 00 00 01 01 1100 1000 1110 1010 00 00 10 10 1011 1101 1000 1110 01 01 10 10 0 0 1 0 01 01 10 11
For keys 128 and 192 bits in length, the subkey material, which consists of all the round keys in order, consists of the original key, followed by stretches, each the length of the original key, consisting of four-byte words such that each word is the XOR of the preceding four-byte word and either the corresponding word in the previous stretch or a function of it. For the first word in a stretch, the word is first rotated one byte to the left, and then its bytes are transformed using the S-box from the Byte Sub step, and then a round-dependent constant is XORed to its first byte.
The round constants are:
1 2 4 8 16 32 64 128 27 54 108 216 171 77 154 47 94 188 99 198 151 53 106 212 179 125 250 239 197 145 57 114 228 211 189 97...
which are, in binary:
00000001 00000010 00000100 00001000 00010000 00100000 01000000 10000000 00011011 00110110 01101100 11011000 10101011 01001101 10011010 00101111 01011110 10111100 01100011 11000110 10010111 00110101 01101010 11010100 10110011 01111101 11111010 11101111 11000101 10010001 00111001 01110010 11100100 11010011 10111101 01100001...
the successive powers of 2 in the representation of GF(2^8) used. Successive powers of 3, unlike 2, include all the values from 1 to 255, and thus several implementations of Rijndael use tables of the powers of 3, and the inverse table giving the discrete logarithm in that field, to facilitate calculations, but the round constants are still the powers of 2. (I'm not entirely sure why one needs log and antilog tables when all one is doing is muliplying by 2 and by 3, but doubtless there is a nonobvious way to implement Rijndael with greater efficiency that makes use of them. The inverse of the matrix used in the Mix Column step has the values 9, 11, 13, and 14 as its coefficients, however, so this could simply serve to limit the number of tables needed for both enciphering and deciphering, replacing six tables by two.)
For keys 256 bits in length, in addition, the S-box from the Byte Sub step alone is applied to the word from the preceding stretch for the fifth word in a stretch in addition.
Although this page (and the preceding one) already have several diagrams of Rijndael, I've included yet another one, one even more simplified than the one appearing on the previous page, to make it easier to "see the forest for the trees", and see both that the Mix Column step indeed only mixes columns, and is the only place where that happens, and how the Shift Row step works with the Mix Column step to ensure that all parts of the block impinge on each other.
The standard techniques of differential and linear cryptanalysis can be adapted to be used against Rijndael. Because of the way matrix multiplication works, and because in GF(2^8), all the coefficients of the Mix Column matrix (as indeed all numbers from 1 to 255) have reciprocals, a specific attack, originally developed for use against its predecessor Square, called the "Square attack", can be used as well.
If one uses 256 blocks of chosen plaintext, where every byte but one is held constant, and that one is given all 256 possible values, then after one round of Rijndael, four bytes will go through all 256 possible values, and the rest of the bytes will remain constant. After a second round, sixteen bytes will each go through all 256 possible values, without a single duplicate, in the encipherment of the 256 blocks of chosen plaintext. (For a 128-bit block, this is every byte; for larger blocks, the rest of the bytes will remain constant.) This interesting property, although not trivial to exploit, can be used to impose certain conditions on the key when one additional round, before or after the two rounds involved, is present.
The possibility of this attack was first noted by the developers of Square and Rijndael themselves, and was noted in the paper that initially described Square.