THE POLITICAL ECONOMY OF MODULAR ARITHMETIC An account of the attempted factorisation of the RSA-numbers 3 OCTOBER 2004 I have been trying to get \$600000 for four months or so. Du Sautoy's book [1] must have had some influence on the decision to invest the time. The RSA website looks reasonable. They are supposed to be a company with some money, but anyway the interesting thing about the factorisation challenge is the way that e-commerce would feel some tremors. The money offered is real money. The public domain source code for d4maths is widely avaialable, and cached in many places, but of course someone else could produce factorisations running the software being used here. It is also possible that large teams operationg number field sieve will get through to these numbers, but they are still stuck with an #exp(#sqrt(#ln(n))) complexity which is definately sub exponential growth, but not much. It is still worse than any power of #log(n), so factorisation is still an NP problem, provided that this is the best method. The same sort of estimate comes up in the prime number theorem. pi(x)= integral (1/ ln x) + error(n). The error term found over one hundred years ago is the quantity n * #exp(-#sqrt #ln n). To this day no one has pushed exact estimations of pi(x), the number of primes less than x, to beyond x in the twenty to thirty digit range. There are ethical concerns in being involved in the security industry. When someone wants to keep something secret they aften want to deceive. Policing the communication channels has always been important in the growth of the state. There are many popular books on ciphers and code breaking not to mention reminiscences of those directly involved. The basic steps of public-private key encryption and key exchange were worked out in the late 1970s well before the growth of the internet. Electronic security has good sales pitch. The expert could argue that Germany had lost two world wars because of poor communications security. Next came the Cold War and all those stories of public school educated cypher clerks subourned into sexual adventures and treachery. Both Ludlum and Le Carre represent many of the organisations which invest heavily in electronic security gadgetry as ridddled with incompetance and beaurocracy as well as conspiracy and interservice rivalry. The salesman simply has to tell the army that the navy has ordered something better and the army will want the latest. The trouble about electronic security is that it is not fool proof. The mere fact that the industry is so big confirms that the world is run by fools. Michael Moore may call them stupid white men, but of course some of the underlying stupidity has been adapted by all races. The class group which Moore describes includes many with high electronic security budgets. They didn't have enough Arabic translators to read internet chatter about 9/11 until well after the event. In the UK there have been demonstrations outside an American listening station in Yorkshire. Places like the listening station, and GCHQ in west England are part of an eavesdropping infrastructure which is meant to defend us all from terrorism and hostile foreign powers. These defenders of our liberty can be compared with the crew of a sinking ship manning hand pumps to stop the wreck from sinking. They are suffering information overload, while given yet more targets to monitor. I once drove up to the Langley headquarters of the CIA although in a somewhat furtive fashion. I was not sure if I would be confronted by armed guards or something. I just took the turn off that said CIA and drove up to a turning place and went back to the freeway which follows the Potomac, going North out of the city, hoping not to be persued. Later on I read that someone drove up near there to fire shots at employees as they arrived to work. The 1990s saw the privatisation of the internet. Internic went from the national science foundation to a private company. Very quickly the world has adapted a center periphery topology with the USA at the center, but a strong periphery along the Pacific, and North Atlantic. China and India are already competing for clerical or manufacturing jobs in the center. By the time I visited Washington I thought the place a dump. I had seen the Excorcist with its visions of incompetant but expensive doctors and general pent up mayhem willing to be unleashed on the mind of an innocent young girl whose mind had probably been emptied by TV advertising. There seemed to be a large black underclass which might get jobs sweeping up in the Pentagon or some other institution, if they were lucky. Dulles airport had seemed like a seedy run down bus station on the outskirts of a Third World city. In the capitalist USA they don't go in for lavish subsidies to airports and airlines as in most of the rest of the world. Airports around the Pacific, in Europe, and in Saudi Arabia had seemed much better than Dulles. In 2004 the business of subsidising airlines has become less fashionable world wide, Dulles was rare on potted plants and convenient restaurants on simulated terraces. The disembarkation trucks seemed old fashioned. I went on and ended up staying the first night in the Ramada Hotel. This was the same hotel which had seen a Washington politician (Marion Barry))busted for sniffing cocaine and cnsorting with prostitutes. The hotel also had a bible in the drawer. Before I had seen the Koran or perhaps the sayings of Buddha, in the hotel bedrooms. Welcome to America, land of the Gideon Bible and the porn channel. It's no surprise that Sayyid Qutb was willing to mock the place. Before I arrived in America I only knew about it from books and films. Philip K Dick had written about places run by a ferocious military industrial complex willing to enslave more and more of the people in a relentlessly stupid persuit of unattainable objectives. American presidents came out of the laboratories of commercial technology companies that could shrink time and space, and satisfy certain levels of consumption. These commercial companies could distort reality in many ways including the use of drugs, electronic stimulation or ordinary use of the mass media. Other influences were the French connection and Clint Eastwood. The French Connection showed people living in poverty. Many drug dealing films such as Superfly showed poverty. Noam Chomsky wrote about poverty and deception. Washington was a center of deception. Lobbying firms would try to get the government to spend money on subsidised projects ranging from anti-malaria pills to x-ray lasers and space based defense systems. Running the world was part of this delusion. The technology is so good that people think they can run wars from video consoles in the leafy suburbs, but if the technology is really that effective they could outsource and run the wars from cheaper countries. Philip K Dick mentions running wars from video consoles, but the idea has also appeared in one of Dennis Wheatly's books. A group of satanists ensconced in Antarctica were trying to help Hitler and Imperial Japan, and their efforts were effected with the help of telepathic zombies sitting at consoles. The satanists were foiled because a British man and his wife got washed up in Antarctica by accident and they eventually got the upper hand over the satanists. The idea of popular data encryption is to allow people to spend money on the internet. Public-private keys may be created on demand. The first round of RSA style encryption was widely available by the early 1980s. The Rivest, Shamir, Adelmen (RSA) method was published in the late 1970s. The basic concept was trapdoor functions. It is very easy to fall through, but it is almost impossible to get out. You can very easily write your secret in a code that no one can read, and then put it in a public letter box to get read by the person who would either benefit from that secret, or just use it to authorise a payment. All of this may come as a surprise to those who regard the banking system of America as one of the most inefficient in the world. Long after the invention of the RSA system, interstate checking accounts were very rare. Many banks were very local affairs with often ramshackle financial foundations. Many banking functions were performed by credit card companies and these handled most transactions. American commerce has lead to the demands for secure communications to validate payments. Key distribution had always been a disadvantage in earlier forms of encryption but trapdoor functions got round this difficulty. The first trapdoor functions were brought from the theory of optimisation. This line of thought was quickly abandoned when the RSA method came along. A newer version of encryption is the Elliptic Curve method. Both of these methods give trapdoor functions which are designated jumps in a box. You can jump somewhere, but you cannot get back to where you were last move without a magic key. In practice the RSA algorithm takes significantly more time for decryption than encryption. Decrypting the RSA algorithm is equivalent to solving the discrete logarithm problem (DLP). Back in the 1970s people were working towards a Data Encryption Standard (DES). This only gave 64-bit security and there was already a debate as to whether this was good enough. The government had already put encryption software on its list for export controls. IBM was involved. Control of electronic communications was important and costly. Closely linked with the discrete logarithm problem (DLP) is the science of pseudo random number generators. D.E. Knuth devotes half a book to the topic in his treatise on computer programming [5]. Pseudo random number generators are quite good for commercial encryption. In spy stories the secret agent encodes the message as numbers, and then adds a number from a code book to each digit before transmission. The recipient, who has access to the code book is able to recover the original message by subtraction. The length of the encryption key is as long as the message. The code books can easily be replaced by pseudo random number generators. These are alorithms which churn out streams of random looking numbers. Most early random number generators used modular multiplacation: X[n+1] = X[n] * A (#mod M) where A and M are suitable chosen constants. The modern commercial application of the one time pad is television. Before transmission of the program each scan line is rotated by a certain amount, depending on the particular scan line. The receiver is able to subtract the offset because the user installed an electronic card which knows the decoding algorithm. 5 OCTOBER 2004 Modern cryptography texts always talk of Bob and Alice. This is because the letters a and b can be used in algebra. RSA works like this. Alice keeps a triple (E,D,N) identified with a suffice as belonging to Alice. Bob has a similar triple (E,D,N). Bob wants to send a secret message x to Alice, using Alice's public key (EA,NA). Bob computes y = x^EA (mod NA). Alice decrypts by computing y^DA (mod NA) and gets back x, the plaintext. Here the symbol x^n (mod NA) stands for modular exponentiation; that is to say, x is raised to the n-th power modulo NA, where NA is Alice's public key. Normally these keys are hundreds of digits long. Computations are tedious. Some of the technical details can be found as an internet document called rfc2313.txt. This document contains references going back to the 1970s. RSA laboriatories is somewhere in Massachussetts. The people who write this sort of stuff are trying to fashion industrial grade algorithms. Industrial grade prime numbers are not really discussed in rfc2313, although implementation details are there. The British claim that government sponsored researchers came up with pretty much the same thing back in the 1970s but were prevented from publishing by government secrecy regulations. The essential RSA algorithm was fairly hard to patent at the time because lawyers striking comical poses to try and patent large prime numbers could be compared with previous idiotic attempts to patent incorrect values of pi. Nowadays the patent lawyers are casting out for government and legislature support in patenting algorithms including many with a history of thousands of years. Software patents are likely hinder opportunities of developing new technology to all except those in the employ of large corporations [6]. The RSA algorithm was put in the public domain via the ACM and then the Internet. It found its way to industrial standardisation documents. The details were communicated via open channels. Rivest, Shamir and Adelman got people to listen to them because of some sort of trust factor. This trust factor was essentially verification that their work was correct by use of well known theorems going back to Fermat in the 1600s. Legendre and Euler had also contributed to the mathematics. 6 OCTOBER 2004 The RSAC prizes. name value digits digit sum RSA-640 \$20,000 N 193 806 RSA-704 \$30,000 N 212 1009 RSA-768 \$50,000 N 232 1018 RSA-896 \$75,000 N 270 1222 RSA-1024 \$100,000 N 309 1369 RSA-1536 \$150,000 N 463 2153 RSA-2048 \$200,000 N 617 2738 Total \$625,000 Crandell writes that the Avagadro number, about 10^24 represents a good count of all basic computer operations used in recorded human history. Going on from this he suggests a brute force attempt to factorize numbers larger than 10^50 is equivalent to replicating all computer operations carried out up till 2000. He notes that the Disney/Pixar animation, "Toy Story" was worth about 10^17 processor operations which is comparable with the 10^18 processor operations required for epic factorisations: numbers with around two hundred digits. 8 OCTOBER 2004 Crandall and Pomerance must go back to the library. Internet research shows that America's National Security Agency is better guarded than the CIA. The NSA also has it's own turnoff from the highway system, but there is a notice warning casual visitors not to take that exit. The initials NSA apparently mean Never Say Anything, and the NSA is supposed to be behind attempts to dumb down the Data Encryption Standard (DES) from 128-bits to 56-effective bits. The idea was that the NSA wanted the right to eavesdrop on commercial traffic and break industrial security codes by trying out every possibility with its banks of supercomputers. The NSA machinary was too slow to pick up the student references to the targets of the nine-eleven attacks. Apparently they had been more busy preventing the export of encryption expertise from the USA rather than checking out what people were saying in student slang. How different from the British. We expect our intelligence experts to finish off the Times crossword (composition now outsourced to South Asia) in two minutes, and then go on and pick out plots from all parts of the world which can threaten national security. Meanwhile the top managers will be told what to dig up on the internet about Saddam's weapons of mass destruction. What's more the British won't try and ban the export of cryptography software, because they will not show the existance of something by trying to ban it. later .... The view from my flat is different today. They have demolished the last of the Claywood Towers. Previously I had looted furniture and fittings from these derelict blocks. A breadknife, coffee table, wine glasses and chinese bamboo blinds rewarded my efforts. later ... There is much to do. The main factorisation methods of today are the Quadratic sieve, and the Number Field Sieve: I have yet implement either of these. The programs seem tricky to write, and sieving time is to be counted in months for 200 digit numbers. REPUNITS OR NUMBERS MADE UP OF ONES One of the most stupid looking methods of factorisation of the number N involves finding the greatest common divisor (GCD) of N with the numbers 1, 11, 111, 1111 etc. These numbers are called repunits and eventually every odd prime number will divide these things. N itself will be a factor of a repunit. If you could work out how to get the product of 2^k repunits mod N in polynomial time in k then the factorisation problem would be solved. You can certianly find the 2^k th repunit mod N in time proportional to k. The problem arises in accumulating the product. 9 OCTOBER 2004 Yesterday I visited lowtech media lab. Alex, a placement worker seemed interested in the idea of continuously running a program in an attempt to get money. I also looked at the weblogs and saw a couple of enquiries from the Rochester Institute of Technology (RIT) concerning the random squares method. During the night I started the matrix generation phase for this method. The GF(2) linear algebra is the next step. There are already a couple of 80x80 matrices for small 10-15 digit numbers. RANDOM SQUARES METHOD Fri Oct 15 10:15:44 BST 2004 This is simply stated. Given a number N to be factored collect a set of quadratic residues mod N which are themselves the product of small primes. Use the set of factorisations X[i]^2 = product q[i,j] to generate an equation X^2 = Y^2 mod N, and then test gcd(X+Y,N) and gcd(X-Y,N) for factors of N. The cases X=Y are called trivial factorisations and they are not so interesting. Sat Oct 16 09:34:39 BST 2004 Today I must return Silverman's Arithmetic of Elliptic Curves to the library. It is a day late. The book has obviously been kept in the damp in the past. It is on interlibrary loan from York University. York is on a flood plain, and over the last few years there have been spectacular floods in many neighbourhoods of York. Pundits say it may be a sign of global warming, while others say the floods are related to overstocking of animals on the Pennine watersheds. I have constructed an implementation of the random squares method, but the method of picking out smooth numbers is very slow. The program would take about 100 days continuous running to factorise a 20 digit number. 11 digit numbers can be done rather faster. LINEAR PROGRAMMING AND MAXIMUM COST MIX Mon Oct 18 11:48:33 BST 2004 During the 1970s I worked as a computer programmer in Sheffield. The employer was a Sheffield steel group which leased space in a building now functioning as the Bristol Hotel. The task was to write a system which would run locally as a substitute for a networked system which was already in use, back in 1975. A computer in America worked out how much special ingredients had to be added to molten steel in order to meet specification and minimise costs. The American computer was able to access melt specifications via a link to instrumentation on the shop floor. In two years working for the company I visited the melting shop about twice. The company already had a working program available, but this program was written for the wrong machines. Most of the working day was spent at the office and during lunchtimes there were drinking sessions at the pub over the road. Sometimes I would vary the routine and slip off the the Anglican Cathedral, to lounge about in the pews and pretend to be meditating. More recently I have found solace in the Catholic Church on Norfolk Row. It's possible to go in and buy candles, light them, and then place them on holders in front of an altar, and meditate for people's souls. It's like temples over the world: the Sacre Couer of Montmartre, some Shiite mosques in Iraq and Iran, and Hindu and Buddhist temples anywhere. The link between religion, candles, and ritual comes from before the days of the electric light. The least cost mix project was dubbed 'Maximum Cost Mix' by many in the office. I tried to make some of the old programs work, but the source code produced a six inch thick printout of assembly language, and the first part of the user dialogue was the intriguing question "file name". The programmer was a brilliant refugee who had fled Communist cppression in Czechoslovakia, and had just recently fled the steel group to work in public service for some marine dockyard authority in the North East. The old programs were written for ICL 1900 computers, and designed to run via MAXIMOP, the then favoured network operating system. MAXIMOP was an acronym for Maximum Online Programming. The ICL compatible programs were written in PLAN. The actual computing work was extremely tedious and boring. The theory was rather more interesting. Linear programming was developed in the 1940s as part one of the scientific management techniques later oversold as the recipe for winning the second world war. Personnel selection could allow the army to allocate the best men to the most suitable posts. A German speaking intellectual such as Henry Kissinger would get a job as chauffeur for the top brass while businessmen would be assigned to jobs suitable for personal enrichment in logistics and scientists might get jobs designing and building mighty weapons. America certainly was building the information infrastructure necessary for a Marxist-Leninist style command economy. Linear programming can be used to devise ways of keeping the right proportion of amino acids in a scientific diet. You have different ingredients with different costs and different ratios of amino acids. Certain thresholds should be exceeded for a balanced diet, but it is also possible that some quantities should not be exceeded. Pet cats and dogs benefit from this science. Humans in the richer countries tend to use irrational criteria or perhaps market forces just go for 'maximise junk, maximise profit' as a model. Suppose you have ingredients J, with cost c[j] for ingredient j. Each ingredient contains A[i,j] percent of amino acid I. The constraint values for each amino acid i are that the total amount b[i] must lie in some specified range. The problem can be written in matrix algebra terms. Maximise (c.x) given Ax <= b Here c.x is sum over J of c[j]x[j] while the expression Ax is the matrix vector product whose row sums are sum { A[i,j]x[j] ; j in J} Linear programming is tedious and tricky to put on a computer, although it is possible to work from scratch and develop a system in a few days now. The the main difficulty behind linear programming is matrix generation. The story of amino acids and balanced diet took several decades to work out. It is just the same as estimating marginal costs in business. Nevertheless, linear programming is still used in fields such as portfolio selection or product mixing. There is a large literature in the field. Integer linear programming with the constraints that some variables are integer is still a recurrent problem. The first trapdoor functions were envisaged as a solution to a class of knapsack problems which are integer linear programs with a single constraint: maximise c.x given a.x < b. Exactly the same considerations apply to some factorisation algorithms. Both Dixon's Random Squares algorithm, and the Continued Fraction Algorithm use techniques of linear programming. Matrix generation may take several years, while the set of equations is so large that calculations are daunting for all but the most powerful computers. The matrices involved have millions of rows and columns, and team efforts usually end up with the linear programming phase done one some monster with access controlled by advance booking. How does linear programming get involved in factorisation methods ? The answer goes back to Fermat and Gauss. Fermat determined that some numbers could easily be factored by writing 0=x^2-y^2 mod N. If different x and y which can be found which satisfy this equation then x+y and x-y will be small multiples of factors of N. If N is odd for example N=ab, then (a+b)^2-(a-b)^2 = 4ab = 4N, so x=a+b and y=a-b are non trivial solutions of the congruence. Fermat knew this, and he started testing values x^2-N to see if they are perfect squares. These values could start at just after #sqrt N and a factorisation might be quickly found for numbers with almost equal factors. Waiting to find a perfect square in the series x*x mod N can be tedious, and usually far slower than trial division. Gauss improved Fermat's method by suggesting that a solution to x^2=y^2 mod N could be found in parts. All the numbers x*x (mod N) could be factorised as they were generated and then some of the equations could be multiplied together to get a couple of numbers X and Y so that X^2=Y^2 mod N. Instead of factorising all numbers of the form x*x mod N, or quadrating residues as they are called, only those numbers made up of a small number of factors are stored. These numbers are usually prime numbers up to some limit B. This is called a factor base. Numbers made up only of factors less than B are called B-smooth numbers. Dixon's Random Squares algorithm requires the building of a table with the following columns: x, a=x*x mod N, and the exponents of the factors of a in the factor base. If some of the prime factors of a occur in even powers, then these can be divided out from x and a. You then get a list with items:- ``` M= 422941 b= 40 [ 12] X X*X binary-vector 62767 874 o......oo... 247763 1547 ...o.oo..... 249270 1767 .o.....o..o. 421348 3 .o.......... 6031 35 ..oo........ 33730 1610 o.oo....o... 70881 22 o...o....... 74605 2465 ..o...o..o.. 1 1 ............ 412908 1131 .o...o...o.. 74950 138 oo......o... 306144 195 .oo..o...... 216434 19 .......o.... 220946 15873 .o..oo.....o 195995 759 .o..o...o... squares 307 matrix 15 12 ``` Next the program has to seek relations between rows. The trivial factor x=1 y=1 is found in row eight, so the program goes and tries this four times by including most of the other rows and attempting to perform an easy to program matrix inversion. The transcript of the full factorisation looks like this: ``` nr= 15 nc= 12 dm= 0 dependency vector 0 0 0 0 0 0 0 0 1 0 0 0 select rows 8 Z= 422941 1 1 1 nr= 15 nc= 12 dm= 1 dependency vector 0 0 0 0 0 0 0 1 0 0 0 0 select rows 8 Z= 422941 1 1 1 nr= 15 nc= 12 dm= 2 dependency vector 0 0 0 0 0 0 1 0 0 0 0 0 select rows 8 Z= 422941 1 1 1 nr= 15 nc= 12 dm= 3 dependency vector 0 0 0 0 0 1 0 0 0 0 0 0 select rows 8 Z= 422941 1 1 1 nr= 15 nc= 12 dm= 4 ``` At this stage the computer has only identified x^2=y^2 with x=y=1. The matrix generation phase should filter out such trivial factorisations. Next the program shuffles the rows using a normal random shuffle algorithm. This time a relation is found almost immediately. ``` nr= 15 nc= 12 dm= 0 dependency vector 0 1 1 0 0 1 1 0 0 0 0 0 select rows 1 2 5 6 Z= 422941 122110 4830 577 ``` In fact the full factorisation of 422941 is 577 733. The whole algorithm seems thousands of times slower than trial division, but the experts say that it is better for larger numbers. The business of randomising the rows of the matrix is something which could be avoided by linear programming. The random squares method is so slow to find good quadratic residues that it would be worth integrating the business of matrix generation with the generation of dependencies as early as possible. After the random squares method comes the continued fraction method. Here some ingeneous tricks allow for the generation of many quadratic residues mod N with only half the digits of N. The ease of getting quadratic residues comes at some cost; sometimes the continued fraction sequence will start to repeat or more likely there will be many trivial factorisations. There is every difference between a good and a bad matrix. The website for nfsnet.org speaks of factorisation efforts on some numbers being abandoned, or restarted after the matrix generation phase failed to produce a useful set of relations between the rows (that is, a set of rws whose sum is zero). All matrix operations are done over the field GF(2), the field with two elements 0 and 1 and the arithmetic defined modulo 2. PAYROLL CONVERSION PROJECT Mon Oct 18 19:47:09 BST 2004 Henry called. Henry sells the Big Issue in town. He gets hassled by the local smack & crack dealers because his face is known. He is a victim more than a criminal, but he has spent time in jail. He was in jail during the 1991 Gulf War and he claims that the jailers called out for recruits to join the struggle against Saddam Hussein. Despite his good qualities Henry did not get early release that time. Henry saw me behaving really stupidly. I had just installed CFRAC and Random Squares on the laptop, but the programs did not work. During most of the time Henry was there he saw me looking at the screen and heard me complaining about compatibility problems between Linux and Windows. After Henry left the house I examined the problem, and saw that there are both 16 and 32-bit versions of the run time system for the factorisation programs. Since a month ago I have just been running the programs with the 16-bit version. The difference between 16 and 32 bits goes back to the 1950s. It's similar to the byte-sex problem which can cause data to be lost, or misinterpreted. All these problems relate to the representation of data in a computer, and its addressing. 16-bit computers store most numerical data as two bytes. These bytes are contiguous in computer memory, and store the data at locations a and a+1 where a is the location of an integer, or number. These numbers go from 0 to 2^16-1 with a special convention that numbers greater than 2^15 are considered negative. 32 bit computers use 4 bytes to store integers. The range is 0 to 2^32-1 with nearly half of these numbers being negative. Textual data is just stored as arrays of bytes. Modern computers tend to start storing data at locations whose addresses are powers of two. These powers of two may go up to values as high as 64. Address space in memory tends to stay below 2^32 but remember that a 66000 x 66000 matrix will be tricky to handle in this context. Things go wrong when assumptions are made about the location of particular bytes in arithmetic. A 16 bit number such as 258 is stored over two bytes: LO HI .-------. Intel | 2 | 1 | 258 = 2+2^8 .-------. HI LO .-------. Motorala | 1 | 2 | 258 = 2^8+2 .-------. Getting round this problem was one of the most messy parts of developing the internet. The business was worse with 32 bit machines. Intel kept its standard, so a 32-bit number would get stored something like N = HI,LO,HI,LO where the most significant bits of an integer may not lie close to its byte address. Byte sex problems occurred during the time I was trying to convert a payroll in Paris. The company was called SLIMOS, and the payroll conversion started in an office belonging to Credit Lyonnais near La Defonce in the western part of the city. During one morning going to work my journey was briefly interrupted by a death by suicidal jumping, or pushing from the crowded metro platform into the tracks of an oncoming train. I was to squeamish to look at the mangled body. The original payroll was written for a french machine, and the idea was to convert it to run on a PDP-11 running a clone of the RSX operating system. The salesmen had told the customer that the machine could run programs written in COBOL. I was hired to move about 50 or 60 COBOL programs over the hardware platforms and operating systems. The man who hired me was a former French paratrooper who had obviously been in Algeria at some time. France was full of ex-colonial people who had had to come back from the countries of the former empire. Of course that number included many Africans and Asians who had directly benefitted from French colonialism and they came along to enjoy the benefits of francophonism as long as possible. During the 1980s I thought the 13 arrondissment tower blocks full of Viet-namese very civilised places. Monsieur Gerain, the former paratrooper, had written true bloatware. The COBOL programs were so long and complicated that the PDP-16 compiler would often give up and vomit streams of spurious errors. Unfortunately the screen coordinates were stored as pairs of bytes because no-one thought a screen would have more than 256 x 256 characters. Indeed current reality is closer to 10 x 40 or perhaps 100 x 250 pixels on a mobile phone. The whole business of byte sex caused the cursor positioning routine to fail because COBOL compiled structures differently on different machines. In the end the project was abandoned, and I lingered a short while in the office before getting fired. The other programmers would sometimes call me Shakespeare because of the way I spoke. Monsieur Gerain was a barbouze, a man with a beard who had been in the French military in Algeria. The company for which he worked was part of a network of companies set up to serve the French military industrial complex. I remember once going to the offices of another company in the same economic niche. The place was in a suburb of Paris and on the way there I noticed many police and military standing around for some V.I.P. convoy. France seemed very militarised during the 1970s and 80s. The top officials seemed to command fleets of cars just like Saudi princes. The top assets of the state become personal fiefdoms of the rich and later infamous. A payroll program should be simple to implement. Later on in Thailand I saw students write payroll programs for laptop computers in days rather than months. These students could program the computers in BASIC. That was much easier than COBOL, and byte sex problems would generally not happen unless you wanted to use image dumps or write machine code video drivers. A payroll program could be done with any decent table handling language. The check on the growth of a payroll program needs to be achieved by a sensible negociation with the client. In practice most clients are treated as morons by professional software companies. They add bits and pieces to a program if the client wants more, even though the client may end up getting an unworkable product. XEDNI CALCULUS Mon Oct 18 22:04:58 BST 2004 Some Internet cryptography groups talk about XEDNI calculus. This is a term used to describe certain attempts to solve the discrete logarithm problem, but the general methods may be applicable to other problems including the recognition of certain classes of numbers. XEDNI is Index spelled backwards. The essential idea is to trade off time against space. Use huge arrays for hash tables and save time. It's quite likely that many computer algorithms can be speeded up a lot by storing common results and looking them up from tables. Modular inversion with respect to an integer is a good example of potential speedups. Other approaches would include just in time evaluation of algebraic expressions. The ordinary random squares algorithm is a case in point. New values of X and X*X mod N can be entered in a hash table so that repeated values are found quickly, and matrix algebra could be made uneccessary. CFRAC TEST Tue Oct 19 03:14:02 BST 2004 H<-MK_PQ 30 G<-CFRAC"-v -b 2000 ", H Construct the number N = 27446761927390629967746984179 as a product of two primes, then attempt factorisation. The continued fraction algorithm ran for twenty four hours before producing a 308 x 303 matrix. After about twenty minutes or so it found the following factorisation, as a difference of squares, mod N. The first six eliminations just gave the same trivial factorisations. Randomization torned up a dependency with 42 rows. These rows gave two different values x, y with x^2=y^2 mod N. x=14393427734175655991914869795, y=24596980067594104366400046908 gives a factor GCD(N, x+y) = 13513279008157. The x value comes from multiplying all values in the X column for the selected rows, while the y value was computed by multiplying out all non zero elements of the factor base from the selected rows and then taking a square root, and finally reducing this mod N. The square root extraction is totally uneccessary: it is possible to collect a prime factorisation of y^2, and then write y as a product of prime powers taken from the factor base. The whole process of solving linear equations by subtracting rows is called Gaussian elimination. Gauss made his way into the English language as a unit of measurement for magnetic flux, and later as a negative verb: to degauss. Gauss was a brilliant mathematician and a hard working scientist in astronomy and physics. One of his most famous discoveries was N=delta+delta+delta; any number is the sum of three triangular numbers. As a young man Gauss proved the Law of Quadratic Reciprocity, which concerns numbers which are squares modulo some other number. ATTACK BY ZOMBIES Tue Oct 19 20:41:13 BST 2004 Today I was assaulted by a couple of women. Arrived at Park Hill at about 18:00. Said hello to Jane Croft and her son as I passed along the balcony to my flat. I commented on the weather. As I turned the corner of the balcony I noticed two women waiting in an alcove. One was blond, but unattractive. The other was more well built with darker hair, but also unattractive. I went to my door. The blond woman propositioned me for a 'good time'. I said I was happy enough without her, unlocked the door, and pushed the bike into the lobby. The blond woman tried to follow me and when I resisted she said she was pregnent, and accused me of hitting her. We went onto the balcony in a sort of scuffle, and she called her friend who by this time had started to walk away down the balcony. The friend ran back down the balcony and they assaulted me. The blond one started demanding money. while they both tried to stop me going into my flat and closing the door. I started shouting "Help ! Help !". No one was around. Then I pretended to be dying, shouting "My heart, My heart !", -- but I also continued calling for assistance. After that I was rolling on the floor while they were trying to access my pockets. At some point the blond woman shouted that I had stolen money from her, and kept demanding money, so I gave her 50 pence. While I was on the floor being assaulted an Asian man passed, but did nothing. Eventually my shouting and struggling put the women off continuing the assault. At one point they tried to push me into the flat to continue the assault there, but that did not happen. They ran off when it appeared that others were approaching the balcony. At around 6:15 pm Jane Croft and her son arrived at the balcony. Jane is head of the tenants association, and we have briefly spoken to her at tenant meetings. She had been moving out furniture for her mother who is leaving the estate. She and her son had heard my shouts from the car park, and had come to see what was happening. After I picked myself I found my money and mobile phone and everything else was intact. After speaking with Jane I rang the police at 999 and after saying it was police emergency and police service I wanted as opposed to fire or medical emergency the police switchboard did a very competant job in logging the incident. Shortly after two armed police with flak jackets, and guns arrived, along with a big Alsation. During this time I was phoned up by Henry who I had been expecting, and mentioned the assault. During this time the police pointed to various people walking in the distance, and asked if I recognised them as the assailants. They also tried to get accurate descriptions of the assailants: accurate decriptions which I could not give. The first police to arrive had obviously been diverted from more serious business. I mentioned the bite on my hand, and asked if forensics could investigate the wound to get a DNA sample from the assailants saliva, but one of the policemen told me that such tests were too expensive for this sort of case. Eventually they told me to go in and wait, so I invited Mark and Henry down and I played backgammon with Henry until the next lot of police arrived. This time there were two men and a woman. We discussed things standing in the kitchen while Mark and Henry sat in the front room. One of the men wanted to downplay the incident and told me that it was the lowest level assault, and not really an arrestable offence. The fact that I was standing there, had called for assistance on my own mobile phone, and able to walk and talk showed that no harm had really been done. I had lost nothing. The fair haired policeman had made some issue as to why I had given the 50 pence to the women. In the end I said I had given it to insult them. He had written something down in his notebook and asked me to sign it. Here I sat down to read it, which he did not really want me to do, and then I started arguing about one of the words: 'confused' to be replaced with 'surprised', and also I insisted that they logged the biting in the incident. The dark haired policeman had been quite reassuring to me. He told me that the bite would not give me AIDS when I had mentioned the possibility. At the end stage the dark haired policeman let me look at the incident report sent out from the control room, and this indeed did mention blood. The police left, and later on I saw Henry and Mark out of the door. Henry spotted a 50 pence piece on the ground, so in the end I had lost nothing at all. The whole incident is a good example of the lack of respect for people which afflicts large areas in contemporary Europe. The lumpenproletariat are always around, and there are many networks of ignorance and stupidity where people learn abusive behaviour from each other. Of course the rot goes right to the very top. DEMOLITION JOB Fri Oct 22 11:40:12 BST 2004 This morning I tried putting the factorisation programs into cgi-bin so that they can be run from a web page. The routine takes a seed number and the number to be factored. Unfortunately the script works incredibly slowly when run via the apache server and the loopback address at 127.0.0.1. Small numbers are OK, but numbers that require time seem to cause the whole screen to freeze, and the disk drive seems to engage in a spectacular flurry of activity, almost as though trying to self destruct. It seems that the Apache server itself may need some extra things in the configuration file to give enough priority to the factorisation. The alternative would be to have factorisation running as java applets on the client machine. This is the best way really since it uses time on the client machine. Another method is trial division in a javascript. I am sure these things have already been done. In the meantime our great leaders are doing their own types of demolition jobs in Iraq, and condoning similar operations in Gaza. The spectre if 'International Terrorism' excites the flatterers and toadies who try to control access to the leaders therevy ensuring that they will hear only what they want to hear. Salam Pax, an Iraqi, describes a visit to Washington. He met Americans who knew his own country as well as Salam himself. These people gave him their time, although the leaders very seldom spoke to mere experts low down in the intelligence-business-government heirarchy. Iraq is a morass for the high expectations of the neo-cons. Software is similar. Many don't work government projects are given the go ahead despite technical infeasability because expectations are raised between the programmer's prototypes and the salesman's perfect solution. LAPTOP COMPUTER Sun Oct 24 13:04:57 BST 2004 Saifon has just got a Toshiba Laptop to help her in her business. I just received the first call. It's hard to explain some things over the phone. The problem is the upper case numeric keys. It's here that the UK keyboard differs most from the American. Dao put me onto another person who was trying to help her. I told him that the American keyboard was the most useful for electronic commerce and not to bother setting the thing up with a UK keyboard. This whole document is being typed with the use of American layout because most UK layouts put the programming symbols on keys that may be completely omitted on UK keyboards. The vertical bar (|) is particularly annoying. It is used as a pipe symbol in MS-DOS commands, and the bar represents 'logical or' in languages such as C. The 'tilda' (~) is also hard to find on UK keyboards. The man had been talking about reinstalling the operating system, but that sounds like massive overkill. The trouble is that many consequences follow from keyboard selection, and certainly with many flavours of linux the first question in the installation dialog is the selection of keyboard. The safest choice is always the American layout. It is hard to persuade people the truth of this when many have become hostile to America because of its invasion of Iraq and its continued support of the Zionists. People have to be taught that anti-American feelings can be counter-productive. The Americans invented personal computers and the internet, and most of these work best with the American keyboard layout. More difficulties are to come. Saifon has offered to give me money to come and help her. There are many networking issues. I want to see if I can get Erica Klarreich to help her. My mobile phone is weak on battery and top up credit. Long range communication is difficult. .... later I have been running comparisions between CFRAC and the Lenstra's Elliptic Curve method. The ECM just worked quite well on a 26 digit number, but CFRAC would also have done it. The Pollard Rho method would also generally work on these problems with an iteration count of between 180000 and a million. The ECM would be excellent if you could select the best elliptic curve at the beginning. The current ad hoc method is to select a random point P=(x,y) and a random value A. The elliptic curve y^2=x^3+Ax+B is chosen by computing B, the constant term as B= (Y*Y)-X*A+X*X mod N, for X,Y corresponding to the start value P. Next the program computes further values in the orbit of P, using the elliptic curve chord tangent process. The values [m]P are chosen carefully. The value m is a product of low prime numbers and prime powers. If the order of the elliptic curve is a smooth number, then a factorisation will be found in time dependent on the largest prime factor of the order of the curve. The method employed at the moment is naive. Modular inversions are done for each addition. The algorithm is quite slow. The advantage is that a lucky choice may be made first time. An algorithm such as CFRAC is deterministic, and if it fails, there is no real fall back position: it is necessary to use the more complicated quadratic sieve. At this stage I see the project should be abandoned. None of the clever methods can tackle thirty digit numbers at all, because the factor base becomes too large, or else smooth numbers become vanishingly uncommon. It is possible to program a blinking bar which shows the search for smooth numbers. A long bar shows that a number is not smooth while a vanishing bar shows a hit, or a smooth number. With large numbers and small factor bases the bar seems to jump backwards and forwards, but it never becomes very short. With larger factor bases the bar may well have smaller lengths, but the movement becomes very sluggish, and for the RSA numbers it takes minutes to do just a few operations. A lucky choice in ECM is the best on offer at the moment. This itself is a problem. Different people should try using different random seeds or different starting points. Anyone who wanted a lot of users to crack RSA would like to have some ways of dealing out personal starting values which were all different. In particular an 'out-of-the-box' installation might set everyone running the same script. BENCHMARKS Mon Oct 25 13:48:16 BST 2004 32-digit numbers. H<-MK_PQ 32 G<-YEC_FACTOR"-b 20000 -v ", H | trivial factorisation G<-YEC_FACTOR"-b 50000 -v ", H | trivial factorisation G<-YEC_FACTOR"-b 14000 -v ", H | factor G found at p ~ 2837. H 24997047150583041152436806085413 G 2537480016458323 F 9851130644753831 F*G 24997047150583041152436806085413 Last night I visited a friend at the other end of town. He has a fast computer with windows ME. I tested out CFRAC on this computer hoping to see if it could get to factor 32 bit numbers. The program took an our to run. In the meantime my friend got Al-Jazeera on the TV but because he wanted to listen to his music system I watched in silence. I was reduced to watching the images of well dressed Arabs enjoying a Ramadan party in some style with lavish tables and hosts of cooks and servants. Underneath scrolled latest headlines in Arabic with easily recognisable stories about the unrolling atrocities in Iraq and the posturings of our great leaders Bush and Karzai. With multi channel TV you can just as easily get bored in Arabic as English. In the panel above the Ramadan partying went ahead at full steam. It's a bit like Hello! and OK! invading the TV; so much like 1950s British TV. While all this was Going on CFRAC produced its stuff. M= 18040037218868180536114750512331 b= 2500 [ 367] 3953626515616998883657277575040 4479648267541 ...... 7814935169012739718808651489397 3223013085115545 .oo... 14780204662769888368422512357774 20627908493 ...... .................. 7808498673022541507208761507427 1838382837713985 .oo... 15054810785997722544892373305899 5710046366072406 oo.... 870084151699329010770992499130 562107293438030 o.o... squares 828495 matrix 372 367 H<-MK_PQ 32 G<-CFRAC"-v -b 2500 ", H H 18040037218868180536114750512331 E<-H%F<-G YDX _1 E 661523668787282869 F 27270433500799 E*F 18040037218868180536114750512331 The program used a factor base of 2500 and collected 372 rows. The correct factorisation emerged during the linear algebra. This is done in two steps. First the program tries to find a relationship amongst the first 367 rows, then deletes the first row and tries again until there are less then 367 rows. If no factor is found all the rows are shuffled, and the the first step is repeated. If no factor is found at the stage it is possible to keep running the linear algebra step on the CFRAC log file until a factor is found. Epic factorisations involve collecting up to ten million rows and then getting the linear algebra step to be run on supercomputers. The advantages of all these linear algebra style methods is that the progress of factorisation can be monitored, and an estimated completion date can be worked out from the number of rows found in a given time. My friend had MathCad on his computer. He wanted to show me some stuff he had been doing on linear transformations of sextic equations but after seeing pages and pages of algebraic formula rolled out in a couple of minutes by MathCad he asked me to validate his method of eliminating variables a,b,c,d,e,f,g,L,M,N,R,S,T by eye. He breaks equations into two, adds brackets and gets MathCad to solve each part in turn, but does not really explain how these things are meant to fit together. I tried to explain that he was just collecting expressions that defined algebraic varieties, and told him he should try and study something about rings and ideals. Any way this friend has great enthusiasm for what he is doing and it is hard to be discouraging. He did manage to get a rather complicated method of solving the quartic out of MathCad. I pointed out that I had seen much more concise methods worked out by beginners on sci.math. ACCESS CONTROL DATA BASE Tue Oct 26 15:24:10 BST 2004 Saifon wants help with her business. She needs to know how to get the computer to help her, rather than learning how to nurse the computer. This is difficult because computers are often so sensitive that it is impossible to get much out of them without going to expensive courses: courses which cost either time, money, humiliation or all of these. Learning from a friend can be the most huniliating. You are made to seem really stupid unless you know how to 'crack' the Microsoft registry and install stolen software or whatever. About 90% of 'friends' are unlikely to have heard of the Free Software Foundation. Parents who teach their children may still be effective. The children are not necessarily so keen to work business style programs. The children have a teacher who is interested in the subject, or the pupil, or both. Saifon may need an access control database. I keep one. When I get a phone call I sometimes look for the originating number in the phone book files, before answering. Most people who call at my flat are somewhere on the computer. There are numbers for the police and the housing authority there. At the moment I am re-examining the Elliptic Curve Factorisation Method (ECM). I have tried to put in the second stage where an algorithm computes points [Q][M]P where P is a point on an elliptic curve, M a highly composite number with factor base S and Q is a prime number not in S. The number M is just the product of all suitable primes and prime powers in S: namely the least common multiple of numbers in S. Even though the algorithm is slow it does not seem to slow down quite so much as CFRAC for large numbers. For the RSA challenge numbers the factor base is in millions, and recognising smooth numbers gets too tedious, but for ECM you just compute a single large smooth number. In the factorisation business it's always easier to construct large products, rather than break them up into factors. SALON SAIFON Thu Oct 28 03:42:03 BST 2004 CFRAC got installed on Saifon's Toshiba Sattelite Pro. I tried to teach Saifon and eventually got her to edit an HTML file with notepad and change the greeting, and then go back to the browser and see the changes. Windows XP is rather good at this because IE5 brings the page source into notepad which was my first lesson. There was an interesting group of people there. Saifon's secretary answered the phone enough for me to get Saifon to run her fingers around the keyboard. That itself is excessively complicated because the machine was originally intended for Spain and the owner's handbook is in Spanish. I spent the first part of the visit reading that, but with great difficulty. Because a laptop has less keys it's always hunt and find. There was an interesting group of people there. Monique, the new girl from Korea seems to be travelling round the world. She visited Thailand and Laos already. Sex may cost \$70 for half an hour but talking about Buddhism and Christianity may set the customer back even more if the time runs and runs ... until such a limit that the customer may end up even marrying the woman. Another girl was interested in the 'server' program which digs up all the images in the cache. The internet had only been connected for a few days, but there was already stuff from porn sites. The thai girl went and sat by the computer to look when I got the refresh rate right. I had forgotten to take the CD-ROM down on the train with me in the morning, but Saifon's secretary showed me how to connect the computer to the Internet. That woman is really smart. She also is already using the machine for e-mail, but I guess that her telephone answering must keep her to busy to be able to take time to teach Saifon. In fact there are plenty of hardware guys amongst the customers. It's quite likely that someone will even be able to put Linux on the machine, although not necessarily well. TETRIS GAME The 'factor base' algorithms remind me of Tetris, the computer game of the 1980s. Random shapes, usually quadrominos, fall down and the player must manipulate these by rotations and translations to fill in lines without leaving gaps. CFRAC and Quadratic sieve also fill in lines, and when you get enough lines you win. The trouble is always the excessive size of the factor base. The quadratic sieve looks tricky to implement, because sieving takes place over very large numbers. The formula x^2-n for x > [#sqrt N] will involve numbers with hundreds of digits, and so they are not easily stored. Quadratic sieve tries to speed up the recognition of smooth numbers. It is hard to see how to do this with small block sizes of just a few hundred thousand. CHINESE REMAINDER THEOREM The 'Chinese Remainder Theorem' is important in the theory of rings and ideals. As it happens any of the RSA numbers can be characterised by just a few dozen remainders modulo P[j] for P[j] in a set of prime numbers. RSA_NUM 123018668453011775513049495838496272077285356959533479219732245215172640 05072636575187452021997864693899564749427740638459251925573263034537315 4 82685079170261221429134616704292143116022212404792747377940806653514195 9 7459856902143413 The above number can be represented modulo the following primes 2 1 3 1 5 3 7 5 ....... intermediate terms omitted 547 401 557 338 563 368 569 7 571 570 Arithmetic can be done on numbers in the range 0 to product of primes by treating long numbers as vectors and doing arithmetic modulo P[j] for each component X[j]. SOCIAL CLEANSING Mon Nov 1 14:59:24 GMT 2004 The middle classes in the UK vote for social cleansing. The estate where I live is to be redesigned and all existing tenants have marching orders. There is no longer a system of waiting lists with points allocated to people based on need: the powers that be have embarked on an expensive and complicated system called 'property shop'. The old system could be brought back with agent programs called 'bid-bots' or bidding robots, but that would be hard to put over to the idiots that commission I.T. systems in this country. Some major pharmaceutical companies are now moving their expensive clinical trial work to Asia. They will get better statistics in India, home of Ramanujan. Some of the tenants have lived here most of their lives. They are involuntary immigrants to a future which seemed a nightmare back in the 1960s. Wholesale, surveillance, and police criminal record checks for even the most menial jobs, but there is still crime and inequality. One of the more sensible people at tenants meetings is a previous construction worker who has been site foreman on several large projects. He was able to explain the tree planted slopes nearby. Many tenants want these trees removed to give a clear sight zone for security cameras but the engineer explained that the trees serve as soil stabilisation. Of course none of the architect plans seem to take into account global warming and the possibility of increased frequency of really heavy rain and mudlslides. Sheffield has many hillsides which could start moving in real storms. Returning from the tenant meeting I went up the stairs. The lifts were unattractive because of the racist graffiti that had been daubed there that morning. About four levels up I came across a marmalade coloured cat that I had not seen before. The cat was too shy to be approached, but it did not run away when I sat down on the corner above and watched it. The American elections are tomorrow. Saturation media coverage makes them boring. Even Osama missed his chance. A definite endorsement of Ralph Nader would at least have had its tragi-comic value. IMHO Ralph Nader deserves full marks for perseverence. Noam Chomsky and Soros urge people to vote for Kerry, but Kerry is from practically the same machine as Bush. 'We Can Build You' [6] has much to say about the American system. For the last few days I have been running ECM (Elliptic Curve Method) on test numbers with two prime factors. The numbers have only 24 or 25 digits but at least I can test out strategies. It seems better to try several elliptic curves with small factor bases up to about 4000 or so, then 4000-40000 in phase two. Quite a lot of the numbers have given factorisations very early on with a lucky choice of the curve. ECM has a good recommendation: try and win \$200000 but if you fail you might win a million dollars by shedding light on the Birch Swinnerton-Dyer conjecture ! Eureka! While writing this entry I tried a 30 digit number. It just stopped shortly into phase two. H<-MK_PQ 30 G<-YEC_FACTOR"-b 7500 -v -cnt 5 ", H F<-H%E<-1dG H 893661884227811292215883301721 F 99552159916210609 E 8976820643369 F*E 893661884227811292215883301721 REFERENCES [1] Marcus du Sautoy The Music of the Primes Fourth Estate / Harper Collins ISBN 1-84115-579-9 [2] RFC2313 RSA Laboratories [3] jobzap.txt [4] rsq.df Source code of CFRAC, Random Squares. [5] D.E.Knuth The Art of Computer Programming. (c 1969 --) multiple volumes. Addison Wesley [6] See Frequently asked questions. Freedevelopers.net [7] We Can Build You. Philip K. Dick.