fivemack: (Default)
I have been known to factorise large numbers from time to time. I have a fairly ludicrous computer with 48 processors. Processors 6n .. 6n+5 share a memory bank.

When I do 'mpirun -n 24 [job]', I find that the speed of the job changes substantially, and often for the worse, every couple of hours. I suspect the scheduler is shuffling the jobs around the processors; even when I use taskset to restrict the 24 jobs to 24 processors, it's shuffling them within the taskset. Since memory is allocated in the bank associated with the processor that the job doing the malloc is running on at that moment, and thereafter never moved, this means I end up with jobs whose memory accesses all go to a different bank; this is slow.

My current best-bet is:

taskset -c 0-2,6-8,12-14,18-20,24-26,30-32,36-38,42-44 mpirun -n 24 msieve ...

Allow the job to start (in particular, to allocate the enormous arrays it needs)

for u in $(for v in $(pidof msieve); do echo $v; done | sort -n); do grep -H "heap" /proc/$u/numa_maps; done

to determine which bank the memory has been allocated on, and then manually write a set of taskset commands to get each job onto a core associated with that memory bank. At least the one time I've tried it, precisely three jobs ended up allocated to each memory bank, though the ordering was 723154106374602651025347.
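The manual step above can be sketched in Python. This is a minimal sketch, not the procedure actually used: it assumes that numa_maps reports page counts per node as N<node>=<pages> fields, that NUMA node n is the memory bank serving processors 6n .. 6n+5, and the sample line and the helper names are made up for illustration.

```python
import re

CORES_PER_BANK = 6  # from the post: processors 6n .. 6n+5 share a memory bank


def pages_per_node(numa_maps_line):
    """Return {node: pages} from the N<node>=<pages> fields of one numa_maps line."""
    return {int(node): int(pages)
            for node, pages in re.findall(r"\bN(\d+)=(\d+)", numa_maps_line)}


def cores_for_heap(numa_maps_line):
    """Pick the node holding most of the heap and the core range assumed to sit next to it."""
    counts = pages_per_node(numa_maps_line)
    node = max(counts, key=counts.get)
    first = node * CORES_PER_BANK
    return node, f"{first}-{first + CORES_PER_BANK - 1}"


# Hypothetical sample line, not taken from the real machine.
sample = "01c5a000 default heap anon=524288 dirty=524288 N3=524000 N5=288"
node, cores = cores_for_heap(sample)
print(f"heap mostly on node {node}; candidate command: taskset -cp {cores} <pid>")
```

The sketch only suggests a core range per job; choosing one distinct core per job inside that range is still left to the operator.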

This seems to work reasonably well, but I feel there must be a less crazy way to do it! Any advice?

PS: it turns out that the right answer is to use options in mpirun:

taskset -c 0-47:6,1-47:6,2-47:6 mpirun -n 24 --bind-to-core --report-bindings numactl -l ~/msieve-mpi/msieve/trunk/msieve -v -nc2 3,8

where the taskset clause restricts the job to running on a subset of processors

the mpirun options bind each job to a single processor

the numactl option forces the job to allocate its memory on the processor it's bound to
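To see which processors the taskset clause selects, the stride notation can be expanded by hand or with a few lines of Python. This is only a sketch of the first:last:stride list syntax that taskset accepts; the function name is made up, and the grouping by bank relies on the statement that processors 6n .. 6n+5 share a memory bank.

```python
def expand_cpu_list(spec):
    """Expand a taskset-style list such as '0-47:6,3,8-9' into sorted CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        cpu_range, _, stride = part.partition(":")
        first, _, last = cpu_range.partition("-")
        start = int(first)
        stop = int(last) if last else start
        step = int(stride) if stride else 1
        cpus.update(range(start, stop + 1, step))
    return sorted(cpus)


selected = expand_cpu_list("0-47:6,1-47:6,2-47:6")

# Group the selected processors by memory bank (bank n serves 6n .. 6n+5).
by_bank = {}
for cpu in selected:
    by_bank.setdefault(cpu // 6, []).append(cpu)

for bank, cpus in sorted(by_bank.items()):
    print(f"bank {bank}: {cpus}")
```

The expansion gives the same 24 processors as the earlier explicit list 0-2,6-8,...,42-44: three per bank across all eight banks.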
fivemack: (Default)
I have a couple of computers running Ubuntu 9.04, and two others running 8.04, all attached to a gigabit switch attached to an ethernet-to-wifi bridge bridged to a little Buffalo ADSL-router-box connected to the internet.

The little Buffalo ADSL-router-box has a DHCP server, which is set to hand out particular fixed IP addresses to the MAC addresses of my computers; I have /etc/hosts files on all the machines saying things like '172.26.200.43 cow'. For the 8.04 machines, this works fine.

For the 9.04 machines, the address assignment is ignored entirely. However, something (which I think is zeroconf) talks to the ADSL-router-box and causes it to set things up in its DNS, meaning that I can say 'ssh node2@cow.local' and get to the machine called cow, whose IP address is however not 172.26.200.43 (and indeed changes every 24 hours).

How do I go about turning this off, so that the computers keep the addresses that I have assigned for them in the DHCP server on the router-box rather than daily going through some complicated protocol to negotiate a wrong address that keeps changing?

Extracts from /var/log/syslog that might be relevant are at http://pastebin.com/T0rLdrc1
fivemack: (Default)
It turns out, and fortunately this cost me neither sleep nor hair, but only because I have taken some care to acquire an attitude of relative calmness and to fiddle with computers only when there are no urgent deadlines requiring those computers in the near future, that it is possible for a motherboard to be incompatible with a power supply.

Specifically, on some Gigabyte motherboards, among them the MA78GPM-DS2H which I have, the USB ports don't work if you use an Enermax Pro 82+ power supply of the kind I just bought. Which is inconvenient, since on that particular system both the keyboard and the disc with the OS on were attached by USB.

The wattage of the PSU is not the issue here; mine was the 625W version, which would be enough to melt a USB stick, let alone power it. Nor is it a matter of damaging the motherboard; when I put the old PSU back in, it started working again.

I now have everything set up sensibly. It took me all morning. Particularly annoying was the moment that I discovered that the video card I was trying to install was half an inch too long to fit in the case I wanted to install it in, and so I needed to swap two motherboards round (that is, dismantle to total bareness two cases filled with fiddly electronics held in by multitudes of small screws and connected by python-like bands of cables, and reassemble the other way round). At least it's not covered in grease, you don't need a hammer to loosen it, little of it is particularly sharp, and there is no risk that it will disintegrate while I'm relying on it to keep me from driving into a lorry at 70mph: I am not one of nature's garage mechanics.

A discovery that may be useful to other people: the little hexagonal stand-offs for attaching motherboards to cases are not standard either in diameter or in top or bottom thread pitch between case manufacturers. So if you have mixed the bags that came with two cases, you have to try at least two screws in each stand-off to see which fit, and if you need to fit extra stand-offs you will have to try lots to find one which fits. The stand-off most inaccessible on the motherboard will, of course, be the one in which you have to try most screws.
fivemack: (Default)
Intel has announced the instruction set for its new vector-supercomputer-disguised-as-a-graphics-card 'Larrabee'.

http://software.intel.com/en-us/articles/prototype-primitives-guide/ has a C++ implementation using the data types and intrinsic names which the real thing will use.

It has a full set of the instructions you would expect, including count-set-bits and find-first-set-bit; it has vector gather and scatter (finally!), it has the normal-for-Intel irritating omissions (add-with-carry for 32-bit numbers only?), and it has one or two really quite surprising instructions:


BITINTERLEAVE21_PI - 2:1 Bit-Interleave Int32 Vectors

Performs an element-by-element bitwise interleave, using a 2:1 pattern, between int32 vector v2 and int32 vector v3. The low 21 bits from elements in v2 are interleaved with the low 11 bits from elements in v3 to form a vector of 32-bit values. Bits alternate 2:1, so that source elements A and B combine bitwise this way (high to low):
A20 B10 A19 A18 B9 A17 A16 B8 … A5 A4 B2 A3 A2 B1 A1 A0 B0
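For anyone wanting to experiment with the bit pattern, here is a minimal Python model of a single 32-bit element, written from the description quoted above rather than from Intel's prototype library. The function name is made up, and masking off the unused high bits of the inputs is an assumption.

```python
def bitinterleave21(a, b):
    """Model one element: interleave the low 21 bits of a with the low 11 bits of b, 2:1."""
    a &= (1 << 21) - 1  # assumption: bits above A20 are ignored
    b &= (1 << 11) - 1  # assumption: bits above B10 are ignored
    result = 0
    for k in range(11):
        # Reading from the low end, each group of three output bits is B(k), A(2k), A(2k+1).
        result |= ((b >> k) & 1) << (3 * k)
        result |= ((a >> (2 * k)) & 1) << (3 * k + 1)
        if 2 * k + 1 <= 20:  # the top group has no second A bit
            result |= ((a >> (2 * k + 1)) & 1) << (3 * k + 2)
    return result


print(hex(bitinterleave21(0x1FFFFF, 0)))  # every A position set
print(hex(bitinterleave21(0, 0x7FF)))     # every B position set
```

The two printed masks show where A bits and B bits land in the result, which makes the pattern easy to check against the listing above.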



I will buy a chocolate pudding at the Carlton next Thursday for the person to give the least ludicrously contrived example in which this instruction might be useful. There is also a BITINTERLEAVE11_PI which takes alternate bits from the two source elements.
fivemack: (Default)
I recently bought some samples of rare-earth elements from elementsales.com - gadolinium, terbium and dysprosium - to play with their magnetic properties. They're supplied as coins inside plastic discs, since they're reasonably reactive.

The gadolinium behaves roughly as I was expecting it to; it's quite strongly attracted to a magnet when cold, and less so when hot. I thought the Curie point was a sharp phase transition and the material would be non-magnetic above 19C, but the material sticks to a magnet even if I've freshly taken it out of hot water. I've been a bit wary since ordinary NdFeB magnets are only rated to about 80C before they start to lose their magnetisation; I should get hold of a more robust magnet. eBay has a very limited range of samarium-cobalt (SmCo) magnets (most hits for samarium-cobalt are guitar pickups); possibly I just want a large iron bar magnet, but I'm not quite sure where to buy those in the real world.

The terbium and dysprosium, however, are also attracted to the magnet (the Dy less so than the Tb) at room temperature. It's a fairly fearsome magnet, so I suppose that the Tb and Dy have some traces of Gd left in them and that's what's being picked up; in which case I should try boiling them and seeing how the magnetism goes away. I need to think more about how to measure the forces here; I can't think of a setup with magnet, element, spring-balance and bits of string where I can just read off the force, and a model where I pull on a spring balance until the element comes free of the magnet seems impossible to get good readings from.

I imagine a note to the element supplier saying that they are supplying inferior gadolinium-laced terbium would not be useful; separating adjacent rare earth elements is proverbially hard.

Any advice on better magnets, better terbium, or better experimental setup?
fivemack: (Default)
I have just emerged from a five-hour optimisation trance, fuelled by sushi and plum wine.

Here is the code; it looks for numbers which can be written as the sum of three sixth powers in two different ways. It's multi-threaded (by constants in the code assuming you have four cores spare) and uses a cache-friendly blocked linked-list structure; it slices up the problem into chunks by sum-modulo-P and then into buckets by sum-modulo-Q, then sorts the bucket contents. It uses moderately prodigious amounts of memory - about 20N^2 bytes. It takes 1m15s wall-time on my machine (2.4GHz quad-core) to run 'sumsix_t4 2003', 10m35s wall-time for 'sumsix_t4 4007'. I'm surprised that the output from 'time' indicates that it spends non-trivial time in the OS: how? The only bits that look like OS calls are print statements and memory allocation, and it only does O(N) of those.

real 10m35.586s
user 31m20.462s
sys 5m20.408s
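A quick way to read those three numbers is to convert them to seconds and take ratios. The sketch below does only that arithmetic; the helper name is made up, and treating user time divided by wall time as the effective number of busy cores is a simplification.

```python
def seconds(minutes, secs):
    """Convert the 'XmY.YYYs' fields printed by time into seconds."""
    return 60 * minutes + secs


real = seconds(10, 35.586)
user = seconds(31, 20.462)
sys_time = seconds(5, 20.408)

user_per_wall = user / real               # about 2.96 cores' worth of user-mode work
cpu_per_wall = (user + sys_time) / real   # about 3.46 cores busy overall
sys_share = sys_time / (user + sys_time)  # about 15% of all CPU time spent in the kernel

print(f"user/real = {user_per_wall:.2f}")
print(f"(user+sys)/real = {cpu_per_wall:.2f}")
print(f"sys share of CPU time = {sys_share:.1%}")
```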

I'm sure it could be significantly faster - it's trivially parallel and I'm only getting x3 speedup on four CPUs - but I am not quite sure how. I've looked at profiles, but I suspect I may have been foiled by out-of-order execution, where quick-op ( result of slow-op ) has to wait for slow-op to complete and so appears as a hotspot when the real hotspot is elsewhere. Hard-wiring the values of 'N' and 'bucket' so that the compiler can replace the modulo operations with multiplies by magic constants doesn't make a difference.

If there are people out there who like this sort of challenge, I'd like some input.
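For checking any faster implementation against known output, a brute-force reference in Python may be handy. It is a sketch only and not the code described above: the function name and the default chunk count are made up, it handles only tiny limits, and it imitates the chunking by sum-modulo-P in the crudest way, by re-enumerating every triple once per chunk.

```python
from itertools import combinations_with_replacement


def two_way_sums(limit, chunks=5):
    """Map each n = a^6 + b^6 + c^6 (1 <= a <= b <= c <= limit) with 2+ representations to them."""
    sixth = [k ** 6 for k in range(limit + 1)]
    found = {}
    for residue in range(chunks):
        # Only sums in this residue class are held in memory at once.
        seen = {}
        for a, b, c in combinations_with_replacement(range(1, limit + 1), 3):
            total = sixth[a] + sixth[b] + sixth[c]
            if total % chunks == residue:
                seen.setdefault(total, []).append((a, b, c))
        found.update({n: reps for n, reps in seen.items() if len(reps) > 1})
    return found


hits = two_way_sums(23)
for n, reps in sorted(hits.items()):
    print(n, reps)
```

With a limit of 23 this reports 160426514 = 3^6 + 19^6 + 22^6 = 10^6 + 15^6 + 23^6, which gives a quick sanity check for any optimised version.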
fivemack: (Default)
Under appropriate circumstances, it is possible to feel stress in your shoulders.

I am wondering at what point it makes sense to curse very loudly, purchase a generic Windows-running PC from WoC, and send this perfectly respectable computer, with six years of service behind it, to Cambridge Computer Recycling where it will be examined, declared too weird for resale (it has Rambus memory and a 478-pin-to-423-pin adaptor hosting a 2.3GHz Pentium 4 - you may not remember that they made 2.3GHz Pentium 4s; AGP graphics, PATA hard discs, and a floppy drive with a label 'tested OK 23/9/1997' on it), and thrown away as good for nothing.

You would naively suppose that installing Windows XP onto a computer which has in the past run Windows XP, and which until you started the installation was happily running SuSE 10.1, would not be a hard job. I've spent six hours at it so far; I've removed, twiddled, checked the jumpers on and reinserted all the hardware on the system, but whatever I do, and whichever of the hard discs I have around I use, the best case is that Windows Setup spends half an hour formatting the disc before declaring that the disc cannot be formatted and stopping. The normal case is that it fails to recognise the existence of either CD-ROM drive. No smoke has come out, yet, which is better than the computer I bought at the start of this year.

I suspect it's something to do with masters and slaves on IDE channels. Or, I suppose, it's conceivable that both hard discs are broken. I sit. I sip cocoa, I plan an early night. Unfortunately, my current reading is the Chronicles of Thomas Covenant, Unbeliever, which is not as purely calming as something with more cuddly pandas and fewer self-destructive depressed lepers would be.

Update: As always, I spoke too soon; having run out of things to try, the final thing I tried actually worked. The machine runs Windows XP, connects to the wireless network (note: ensure you have typed in MAC addresses correctly when adding to the allow-from-only-these list), plays DVDs (thanks to VLC), and is in fact a reasonable form of the birthday present that I'd tried to give to my youngest brother back in July. My shoulders feel less spiky already, and it's only, umm, four hours since I started this time, and I only wasted a couple of hours back in August.

Downloading security update KB873339 (1 of 86) - and it was a WinXP SP2 install disc!
fivemack: (Default)
Issue: What do I need to do to get a tape autoloader to work on an RHEL 3 machine?

Answer: You need to install 'mtx' and 'sg3_utils' by doing 'up2date mtx; up2date sg3_utils', then reboot.

Issue: My computer running RHEL 3 (Red Hat Enterprise Linux 3) does not detect the media changer on my Quantum Superloader 3 tape library. Or it does not detect a WD MyBook USB hard drive, with /proc/scsi/scsi listing only the 'Enclosure' part.

Answer: http://spiralbound.net/2006/10/16/making-rhel-3-see-multiple-luns - you need to add a line to /etc/modules.conf and rebuild your initrd to tell the kernel to look for multiple SCSI devices (LUNs) exposed by a single physical device. Since USB storage, for added irritation, is part of the SCSI subsystem, this is also needed for USB peripherals that expose several devices.

By the way, the 'Enclosure' for a WD MyBook USB hard drive is the glowing blue button on the front.

Issue: Having done this, I get 'input/output error' messages from Amanda, and 'st0: incorrect block size' messages appear in dmesg.

Answer: You need to issue a 'mt -f /dev/nst0 setblk 0' command, possibly every time that you load a tape.

Issue: mtx doesn't detect the mail-slot on my Quantum Superloader 3 tape library

Answer: No. It doesn't. Use the controls on the front to load and eject tapes from the library. Sorry.


It has taken two of us most of the day so far to answer these questions; Google gets inordinate numbers of references to the questions and very few to the answers. I *hope* that Google will now index this page and my successors will be able to get all of this at once.

BTW, do not mail john@globalphasing.net because we are using it as a spamtrap
fivemack: (Default)
It seems to be proximity to me, rather than ownership by me, that breaks hard drives; the external drive at work onto which I had laboriously copied 41 DVDs of crystallography images gave up the ghost this week. That's the third this year. I suppose I own about nine drives and they last about five years so I should expect two deaths a year, but I have friends (damerell, nojay) with as many drives who seem to curse their failure less often.

Amazingly and unprecedentedly, this one was within warranty, and Seagate should send a replacement before the decade is out.
fivemack: (Default)
This week, I am mostly learning Fortran 90.

It's a language which nicely matches the sort of code I like to write; arrays as really first-class language elements are good, and the DWIM read() and write() statements avoid some of the more tiresome boilerplate that typed languages attach to I/O.

I assume there are Fortranophones among my readers; is there a nicer way to write


atomcounts(fooi(1),fooi(2),fooi(3)) = &
1+atomcounts(fooi(1),fooi(2),fooi(3))
?

The obvious
atomcounts(fooi) = atomcounts(fooi)+1
translates as
atomcounts(fooi(1)) = 1+atomcounts(fooi(1))
atomcounts(fooi(2)) = 1+atomcounts(fooi(2))
atomcounts(fooi(3)) = 1+atomcounts(fooi(3))


which is a rank error since atomcounts is a 3D array; also, is there an increment-in-place procedure that I'm missing?
fivemack: (Default)
We've just got a new machine at work on which I've been asked to install OpenSuSE 10.2 64-bit; this felt entirely straightforward.

Unfortunately, the default install of OpenSuSE doesn't include gcc. When I try installing gcc using YaST2, I get something which looks superficially like gcc, but which says 'gcc: error trying to exec 'cc1': execvp: No such file or directory' whenever I try to compile anything with it.

So, where's cc1 coming from? Deciding now would be a good time to use up some of the EU Assorted Symbol Mountain, I type

for i in /media/SU1020.001/suse/x86_64/*.rpm; do rpm -qpl $i | sed -e "s/^/${i//\//_}/g" | grep cc1; done

which tells me that /usr/lib64/gcc/x86_64-suse-linux/4.1.2/cc1 is provided by cpp41-4.1.2_20061115-5.x86_64.rpm

And indeed /usr/lib64/gcc/x86_64-suse-linux/4.1.2/cc1 exists on the machine. So, why isn't /usr/bin/gcc-4.1 finding it?

Normally strace comes to the rescue, but 'strace /usr/bin/gcc-4.1 -c foo.c' outputs many lines of the form

stat64(0x806b628, 0xff885abc) = -1 ENOENT (No such file or directory)

which are totally useless because strace is failing to dereference the pointer to the filename passed to stat64.
fivemack: (spiky)
What I want: a subroutine footle such that, if you call footle(a,b) twice with the same a,b, it does nothing the second time

What I did:
use strict;
sub footle
{
  my ($a,$b,%done) = @_;
  my $concat = $a.$b;
  if ($done{$concat} == 0)
  {
    print "footling $a $b";
    $done{$concat} = 1;
  }
}

my %isdone = ();

footle("bootle","bumtrinket",%isdone);
footle("bootle","bumtrinket",%isdone);

But this doesn't work because parameters are passed by value.

But if I call as footle("bootle","bumtrinket",\%isdone), which passes isdone by reference, it still does the footling twice.

Even if I put $_[2]=%done before the end of the subroutine, it still does the footling twice.

And if I put print join "*",(keys %done); at the start of the subroutine, it says HASH(0x8188110)footling bootle bumtrinket

So how do I really pass the parameter by reference, as if I'd said void footle(int a, int b, set<string>& done) in C++?
fivemack: (Default)
My .emacs file contains the line

(set-default-font "7x13")

and indeed whenever I load a file into emacs, it comes up in 7x13.

But if I do C-x 5 2 to get another emacs window, the file in that window appears in a much larger and uglier font with inelegant serifs. How do I really set the default font?

[note: I don't run emacs-client, I edit files with 'emacs foo' on the command-line, so often I have lots of separate emacs processes; also, by 'window' I mean a window-system window rather than whatever emacs's internal jargon 'window' means]
fivemack: (Default)
I bought an external hard drive on 8 November 2004.

Trying to do a backup to it this evening, I find that it has stopped working.

Looking at the place I got it from, I discover that its warranty was for two years.

Hard discs cost so much more than hamsters that you would hope they would live longer, but it is not to be.
fivemack: (Default)
Is there some series of command-line options to 'tar' which lets me create a tar file which contains the file called (with respect to the current directory) X/Y/Z/foo.baz but in such a way that, when my end-user unpacks the tar file, it comes out as R/S/bar.quux ?

It seems a natural thing for anyone doing packaging to want to do; the GNU info page for tar appears to be a dreadful combination of inadequate tutorial and inadequate reference manual, and I've been unable to figure out what to do merely by reading it.
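One route that sidesteps the tar command line is Python's standard tarfile module, whose add() method takes an arcname argument giving the name stored in the archive. The sketch below builds a throwaway X/Y/Z/foo.baz in a temporary directory purely so that it runs anywhere; the archive name package.tar is made up. Newer GNU tar releases also have a --transform option taking a sed-style expression, which may or may not be available in the version at hand.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Stand-in for the real file at X/Y/Z/foo.baz.
    source = os.path.join(workdir, "X", "Y", "Z", "foo.baz")
    os.makedirs(os.path.dirname(source))
    with open(source, "w") as handle:
        handle.write("payload\n")

    archive = os.path.join(workdir, "package.tar")
    with tarfile.open(archive, "w") as tar:
        # arcname is the path the end-user will see on unpacking.
        tar.add(source, arcname="R/S/bar.quux")

    with tarfile.open(archive) as tar:
        names = tar.getnames()

print(names)
```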
fivemack: (Default)
Is there any way for a Java application to display a dialogue box at the top of the window stack, rather than at the top of the stack of windows managed by the Java VM?

When I'm testing the app, quite often I open an editor, which appears large and on top of the app window, make some note about the behaviour I've observed, and close the app; it displays a 'do you really want to close' dialogue, which appears behind the editor window, and the app then appears to have crashed because it's waiting for me to respond to the dialogue I can't see.
fivemack: (Default)
At work, I've the good fortune of having a two-CPU workstation on my desk.

Unfortunately, whilst the two CPUs are within a centimetre of one another on the same piece of silicon, they appear to maintain independent clocks running with a noticeable offset; I can't tell if the speeds are also different.

In any case,

a = clock()
multi_threaded_operation()
b = clock()


can leave b reading out as several seconds before a, if the main thread was initially on the core with the later clock and got rescheduled onto the other one after the operation. This is not helpful for seeing which things are actually faster than which others.

Is there a standard C library routine, or at least something in <sys/*.h>, guaranteed to read a clock of a kind such that I can be reasonably confident that the computer's got only one?
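One candidate is POSIX clock_gettime() with CLOCK_MONOTONIC, declared in <time.h>, which is specified as a single system-wide clock that is never set backwards. Whether it cures the offset on this particular machine is an assumption that would need testing. The same measurement pattern is sketched below in Python, whose time.monotonic() is documented as a clock that cannot go backwards; the worker function is a made-up stand-in for the real operation.

```python
import threading
import time


def multi_threaded_operation(workers=4):
    """Hypothetical stand-in for the real work: a few threads that each sleep briefly."""
    threads = [threading.Thread(target=time.sleep, args=(0.05,)) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


start = time.monotonic()
multi_threaded_operation()
elapsed = time.monotonic() - start

# A monotonic clock never yields a negative interval, whichever core
# the main thread happens to resume on after the operation.
print(f"elapsed: {elapsed:.3f} s")
```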
fivemack: (Default)
Is there any shell which maintains an at-all-sensible command history when you're working with several terminals each with half a dozen sessions in tabs? Intercalating the history from multiple sessions would probably be ideal for my current working style; appending the history from each session as a lump when the session closes would also be fine; but at the moment tcsh seems to maintain history for at most one session, randomly-selected, and this makes 'history' less than useful if I actually want to work out what I've been doing.
fivemack: (Default)
There's a transaction-processing benchmark, results listed at www.tpc.org, for which major computer manufacturers are prepared to spend millions of dollars of engineering time and use tens of millions of dollars worth of hardware.

The current top entries are offering rates of a couple of million transactions per minute, which translates to between one and two trillion transactions a year since there are almost exactly half a million minutes in a year.

I've just looked at eBay's financial statements, which indicate that 2.5 billion items are sold through eBay annually; if we assume that each bid is a transaction and that each item gets twenty bids, that's a hundred thousand transactions a minute.

Tesco's sales are on the order of £40 billion a year; even if we assume that each item on a bill is a transaction, the average Tesco item cost more than 50p, so that's less than 80 billion transactions a year, 160,000 a minute. Wal-Mart has about five times the sales of Tesco, which brings you to half a trillion; a million a minute.
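The back-of-envelope figures above can be reproduced in a few lines. The sketch takes "a couple of million" as exactly two million transactions per minute and uses the post's own assumptions (twenty bids per item, 50p per Tesco item, Wal-Mart at five times Tesco); none of the inputs are measured data.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600: "almost exactly half a million"


def per_minute(per_year):
    """Convert an annual transaction count into a per-minute rate."""
    return per_year / MINUTES_PER_YEAR


tpc_per_year = 2_000_000 * MINUTES_PER_YEAR      # top benchmark entry, per year
ebay_per_minute = per_minute(2.5e9 * 20)         # 2.5 billion items, twenty bids each
tesco_items_per_year = 40e9 / 0.50               # £40 billion of sales at 50p an item
tesco_per_minute = per_minute(tesco_items_per_year)
walmart_per_minute = 5 * tesco_per_minute        # about five times Tesco's sales

print(f"TPC leader: {tpc_per_year:.3g} transactions a year")
print(f"eBay:     {ebay_per_minute:,.0f} a minute")
print(f"Tesco:    {tesco_per_minute:,.0f} a minute")
print(f"Wal-Mart: {walmart_per_minute:,.0f} a minute")
```

On these assumptions the benchmark leader handles roughly twenty times eBay's rate and a little under three times Wal-Mart's.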

What exactly is the point to IBM or to HP of demonstrating a single machine capable of handling all the transactions at every supermarket in the European Union? This seems the kind of machine of which they can sell one.
fivemack: (Default)
http://www.thegoodscentscompany.com/rawmatex.html


Pick your peculiar ester, and it not only displays a little rotating molecule of it, but tells you what it smells like, whether it's poisonous, its density in pounds per US gallon, its refractive index, and how prone it is to catch fire!


Since posting every entertaining link I find would make this livejournal dense and impenetrable, as well as making me out to be rather more prone to enthusiasm than a hyperactive ferret in a sequin factory, I'm mostly accumulating them at http://del.icio.us/fivemack

Page generated Sep. 21st, 2017 03:12 am