Computers have it in for me
Mar. 2nd, 2007 11:09 pm
I have on my desk the replacement for the computer I bought at the start of the year, which exploded three hours after purchase.
I've had it for not quite a week. It persistently crashes, apparently sometimes corrupting parts of the hard disc, when I give it certain fiddly disc-intensive calculations to do. Sometimes it crashes under other circumstances. Is Ubuntu 6.10 known to be this irredeemably unreliable on contemporary Intel hardware, or have I just received my second lemon in an order of two?
It's running memtest86 overnight, but no faults have shown up yet; I suspect this is a disc-controller rather than a memory issue, and that no faults will show up. The last memory issue I had -- a module which had an unshakeable faith that every 1024th bit in the memory it presented had to be reported as zero whatever had been written to it -- showed up immediately in memtest86.
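To illustrate why a fault like that one is caught at once, here is a toy model in Python: a simulated memory in which every 1024th bit always reads as zero, and a memtest86-style pattern test that writes fixed patterns and reads them back. This is only a sketch of the general write-then-verify idea; `FaultyMemory`, `pattern_test` and the pattern list are hypothetical and are not memtest86's actual test algorithms.

```python
from typing import Optional

STRIDE_BITS = 1024  # hypothetical fault period, matching the one in the post


class FaultyMemory:
    """Byte-addressed simulated memory. If stride_bits is set, every bit whose
    global index is a multiple of stride_bits is stuck at zero when read."""

    def __init__(self, size_bytes: int, stride_bits: Optional[int] = STRIDE_BITS) -> None:
        self._cells = bytearray(size_bytes)
        self._stride = stride_bits

    def write(self, addr: int, value: int) -> None:
        self._cells[addr] = value & 0xFF

    def read(self, addr: int) -> int:
        value = self._cells[addr]
        if self._stride:
            for bit in range(8):
                if (addr * 8 + bit) % self._stride == 0:
                    value &= ~(1 << bit)  # the stuck bit always reads as 0
        return value


def pattern_test(mem: FaultyMemory, size_bytes: int,
                 patterns: tuple = (0xFF, 0x00, 0xAA, 0x55)) -> set:
    """Fill memory with each pattern, read it all back, and return the global
    indices of bits that did not read back as written."""
    bad_bits = set()
    for pattern in patterns:
        for addr in range(size_bytes):
            mem.write(addr, pattern)
        for addr in range(size_bytes):
            diff = mem.read(addr) ^ pattern
            for bit in range(8):
                if diff & (1 << bit):
                    bad_bits.add(addr * 8 + bit)
    return bad_bits
```

On a 512-byte (4096-bit) simulated module, `sorted(pattern_test(FaultyMemory(512), 512))` returns `[0, 1024, 2048, 3072]`. The all-ones pattern alone exposes every stuck bit, which is why such a fault turns up on the first pass.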
Is there a disc equivalent of memtest86? I'm prepared to take backups of everything - there's not much novel, I've only had the machine a week - and sacrifice the disc contents should the test need to be destructive, though I'd prefer something that sat in userspace and confined its merciless thrashing to the bits of the disc on which my data isn't.
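A userspace check along those lines might look like the following Python sketch: write a scratch file of deterministic, block-specific patterns into free space on the filesystem, force it out to disc, ask the kernel to drop its cached copy, and read it back. `scratch_test`, the 1 MiB block size and the seeding scheme are all assumptions; it needs Python 3.9+ for `Random.randbytes`, and it only exercises whatever free space the filesystem hands it. The drive's own cache can still satisfy reads, and on a RAID1 the read-back may come from either mirror.

```python
import os
import random
import tempfile

BLOCK_SIZE = 1 << 20  # 1 MiB per block; an arbitrary choice


def block_pattern(index: int, size: int) -> bytes:
    """Deterministic pseudo-random contents for block `index`, so that a block
    written to the wrong place is caught as well as a flipped bit."""
    return random.Random(index).randbytes(size)


def scratch_test(directory: str, n_blocks: int, block_size: int = BLOCK_SIZE) -> list:
    """Write n_blocks patterned blocks to a temporary file in `directory`,
    flush them to disc, read them back, and return the indices of blocks that
    differ. Only free space on that filesystem is touched; the file is deleted."""
    bad = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="disccheck-")
    try:
        with os.fdopen(fd, "r+b") as f:
            for i in range(n_blocks):
                f.write(block_pattern(i, block_size))
            f.flush()
            os.fsync(f.fileno())  # write the data through to the disc
            if hasattr(os, "posix_fadvise"):
                # Best effort (Linux): drop cached pages so reads go to the disc.
                os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
            f.seek(0)
            for i in range(n_blocks):
                if f.read(block_size) != block_pattern(i, block_size):
                    bad.append(i)
    finally:
        os.unlink(path)
    return bad
```

For example, calling `scratch_test` on a directory on the suspect disc with `n_blocks=2048` would write and verify about 2 GiB; an empty list means every block read back intact.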
no subject
Date: 2007-03-02 11:57 pm (UTC)
no subject
Date: 2007-03-03 12:04 am (UTC)
no subject
Date: 2007-03-03 12:31 am (UTC)
On Debian, I've used hdparm, hddtemp, and smartmontools to poke at disks and see if they're complaining about problems.
There appears to be a tool with the somewhat obvious name of 'testdisk', which may also do what you want.
Baldy
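The SMART attributes that smartmontools exposes are a reasonable first thing to look at for a disc that may be corrupting data. Below is a small Python sketch that scans the attribute table printed by `smartctl -A` and reports non-zero raw counts for a few attributes commonly associated with failing media. It assumes the usual ten-column table layout; `worrying_attributes` and the `WATCHED` list are hypothetical choices, and drives vary in which attributes they report.

```python
import re

# Attributes that commonly count remapped or unreadable sectors; a
# hypothetical selection, since drives differ in what they report.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def worrying_attributes(smartctl_output: str) -> dict:
    """Parse the attribute table from `smartctl -A` and return each watched
    attribute whose raw value is non-zero."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns:
        # ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name = fields[1]
        # RAW_VALUE may carry a suffix such as "34 (Min/Max 20/45)"; keep the number.
        match = re.match(r"\d+", fields[9])
        if name in WATCHED and match and int(match.group()) > 0:
            found[name] = int(match.group())
    return found
```

In practice the input would come from something like `subprocess.run(["smartctl", "-A", "/dev/sda"], capture_output=True, text=True).stdout`, run as root; `/dev/sda` is an assumed device name.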
no subject
Date: 2007-03-03 12:33 am (UTC)
Baldy
no subject
Date: 2007-03-03 08:26 am (UTC)
no subject
Date: 2007-03-03 12:47 pm (UTC)
no subject
Date: 2007-03-03 12:50 pm (UTC)
no subject
Date: 2007-03-05 12:03 am (UTC)
I'm using software RAID, a RAID1 made from partitions on two ATA discs and itself formatted as ext3; the OS is a fresh install of Ubuntu 6.10_x86-64.
I've managed to get some information about the failure mode. Under moderate disc load (four parallel cp operations of a few gigabytes, from one part of the RAID1 partition to another), I got a syslog message: segfault at
ext3_ordered_writepage+243
called from find_busiest_group+404
called from do_writepages+41
called from del_timer_sync+12
called from find_busiest_group+404
Repeating the actions that caused that failure did not immediately produce a second one. Repeating them again while running nine instances of cat /dev/zero > /dev/null & in the background caused the cursor corruption and hang, but no syslog entry. Repeating them once more caused a hang without cursor corruption. Running the same thing from text consoles seems to work (four copies of the torture test in parallel run to completion), but that is obviously not an acceptable workaround.
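The torture test described above can be scripted so that it is easier to repeat: several concurrent copies of a large file onto the same filesystem, each verified byte for byte, while a number of `cat /dev/zero` processes keep the CPUs busy. This Python sketch is an assumption about how one might reproduce that load, not the procedure actually used; `torture` and `copy_and_verify` are hypothetical names, and a comparison that passes may have been served from the page cache rather than from the disc.

```python
import filecmp
import os
import shutil
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor


def copy_and_verify(src: str, dst: str) -> bool:
    """Copy src to dst, then compare the two files byte for byte."""
    shutil.copyfile(src, dst)
    return filecmp.cmp(src, dst, shallow=False)


def torture(src: str, n_copies: int = 4, n_spinners: int = 9) -> list:
    """Make n_copies concurrent copies of src on the same filesystem while
    n_spinners `cat /dev/zero > /dev/null` processes run in the background.
    Returns one verification result per copy; the copies are deleted afterwards."""
    spinners = [
        subprocess.Popen(["cat", "/dev/zero"], stdout=subprocess.DEVNULL)
        for _ in range(n_spinners)
    ]
    try:
        # Keep the copies next to the source, i.e. on the same (RAID1) filesystem.
        workdir = os.path.dirname(os.path.abspath(src))
        with tempfile.TemporaryDirectory(dir=workdir) as work:
            dsts = [os.path.join(work, f"copy-{i}") for i in range(n_copies)]
            with ThreadPoolExecutor(max_workers=n_copies) as pool:
                return list(pool.map(lambda dst: copy_and_verify(src, dst), dsts))
    finally:
        for proc in spinners:
            proc.terminate()
            proc.wait()
```

Pointing `src` at a file of a few gigabytes on the RAID1 volume and calling `torture(src)` reproduces the four-copies-plus-nine-spinners mix; any `False` in the result means a copy differed from its source.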
I think this, particularly the way it seems to show up more under X, is beginning to look more like a bug in the kernel's interaction with the chipset than a hardware problem. The instability reported by various people in the thread at http://www.aceshardware.com/forums/read_post.jsp?id=120076902&forumid=1 may be another sign of this.
So, the Edgy kernel is dodgy. Where do I go from here?
no subject
Date: 2007-03-05 02:16 am (UTC)