I have a directory with 244 files with names like m12331246123468911531238951802368109467.mlog, which I want to rename to names like C038.123312.mlog
time for u in m*mlog; do B=$(echo $u | cut -dm -f2 | cut -d. -f1); echo $u C${#B}.$(echo $B | cut -c1-6).mlog; done
takes 17 seconds
time for u in m*mlog; do B=$(echo $u | cut -dm -f2 | cut -d. -f1); echo $u C${#B}.${B:0:6}.mlog; done
takes 8 seconds
time for u in m*mlog; do B=${u:1}; B=${B%.mlog}; echo $u C${#B}.${B:0:6}.mlog; done
takes 0.2 seconds.
Of course, when I replace 'echo' with 'mv' it still takes fourteen seconds, but I am not that shocked that mv over NFS might be slow.
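(For the record, the rename pass itself is only a small variation on the fast version. This is a sketch rather than exactly what I ran: the printf -v zero-padding is there to match the C038 example above, since a bare ${#B} gives C38, and the existence check is just paranoia about two names sharing a length and a six-digit prefix.)
for u in m*mlog; do
  B=${u:1}; B=${B%.mlog}                             # strip the leading 'm' and the '.mlog' suffix
  printf -v new 'C%03d.%s.mlog' "${#B}" "${B:0:6}"   # printf -v is a builtin assignment: no subshell
  [ -e "$new" ] || mv -- "$u" "$new"                 # refuse to clobber an existing target
done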
Which suggests that doing $() to start a new shell is taking something like a hundredth of a second on a one-year-old PC. I didn't know that. On the other hand, if I start writing code this dense in unclear bashisms, my colleagues at work will disembowel me with spoons.
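If you want to put a number on the $() cost by itself, a loop like this is enough (a sketch; 244 iterations just to mirror the file count):
time for i in {1..244}; do x=plain; done          # plain assignment: effectively free
time for i in {1..244}; do x=$(echo plain); done  # forks a subshell every iteration, even though echo is a builtin
The difference between the two, divided by 244, is roughly the per-substitution cost.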
PS: if I stop running a CPU-intensive program on each of my eight cores, starting new processes gets about fifteen times faster. I can understand if it got twice as fast, but I really don't understand fifteen.
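To reproduce that, something like the following should do (a sketch: the yes loops are stand-ins for the real CPU-bound jobs):
for i in {1..8}; do yes > /dev/null & done     # saturate all eight cores with cheap busy-loops
time for i in {1..244}; do x=$(echo hi); done  # the fork-per-iteration loop again, now under load
kill $(jobs -p)                                # stop the busy-loops afterwards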
no subject
Date: 2010-07-20 08:39 pm (UTC)
no subject
Date: 2010-07-21 08:50 am (UTC)
tcsh would be well and truly stupid enough to reparse all of .cshrc etc, but I don't feel like testing it (why oh why aren't astronomers brave enough to move on from 30 year old evil history?). It definitely does parse all of that crap when you have a #!/bin/csh script - fortunately bash doesn't do that unless you also supply -i.
No, you probably replaced the programs because most OSes have been traditionally very slow at fork() (and that goes for non-shell programs too). Slowaris is called Slowarsis for a reason :)
Linux has always had lower overheads at fork. The other OSes still did copy-on-write and everything, but just did it... badly.
The slowness of fork in this case when the CPUs are busy is surprising - possibly just a scheduler issue - the forking process is held too long on the wait queue and is starved of the resources needed to fork?
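From the shell you can at least separate bare fork from fork+exec — a rough sketch, not a proper benchmark: a ( : ) subshell forks but never execs, while /bin/true forces a fork plus an exec of an external binary.
time for i in {1..244}; do ( : ); done        # fork only: subshell running a builtin
time for i in {1..244}; do /bin/true; done    # fork + exec of an external program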
no subject
Date: 2010-07-21 10:28 am (UTC)
no subject
Date: 2010-07-20 08:48 pm (UTC)
The 8 second version has (similarly) 3 shell invocations.
The .2 second version has no shell invocations.
So that all looks about right for shell invocations being the issue, yes.
However, the first version (with somewhat different filenames obviously) executed on a more-than-5-year-old Linux box in 10 seconds -- for 1000 names, about 4 times as many as you used. The .2 second version took .07 seconds, again on 1000 files. I guess a factor of nearly 10 between two random old PCs is not out of bounds; the more important thing is the ratio between the tests being fairly consistent. (This was a decent server when new, which might about balance its being older.)
This may point at the NFS disk being the issue since my test was on local disk. I'm in the midst of completely hacking apart my little bit of NFS use so I guess I can't test that right now.
I dunno that the bashisms are less clear than using cut; in any case man bash or man cut will elucidate. It does mean the scripts become non-portable to systems without bash; I confess I've given up caring about those, myself.
Well, I understand 8 anyway. Your new bash has to take its place in the round-robin with the 8 cpu-intensive programs, right?
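If anyone wants to rerun the comparison on their own disk (local versus NFS), a pile of throwaway names of the right shape is easy to fake up. A sketch only; the digit strings are shorter than the real 38-digit names, which doesn't matter for the timing:
mkdir -p /tmp/mlogtest && cd /tmp/mlogtest
for i in {1..1000}; do
  printf -v name 'm%05d%05d%05d%05d.mlog' $RANDOM $RANDOM $RANDOM $RANDOM   # 20 pseudo-random digits
  touch "$name"
done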
no subject
Date: 2010-07-20 09:06 pm (UTC)
no subject
Date: 2010-07-20 09:23 pm (UTC)
So yeah, there's something to explain there.
You're not short of memory for what's running, are you?
no subject
Date: 2010-07-21 10:25 am (UTC)
no subject
Date: 2010-07-20 09:38 pm (UTC)
(If it's something I'll use six months from now, I'll pay the setup time price and do it in Python so I can read it six months from now. For a one-off one-liner, Perl is fine.)
no subject
Date: 2010-07-21 10:25 am (UTC)
no subject
Date: 2010-07-20 09:39 pm (UTC)
no subject
Date: 2010-07-21 07:17 am (UTC)
no subject
Date: 2010-07-21 07:06 pm (UTC)
What I said on IRC: fork+exec is incredibly expensive compared to a bit of string handling.
if I start writing code this dense in unclear bashisms, my colleagues at work will disembowel me with spoons
That assumes they’re fluent with cut. Personally I never bothered to learn it because all the alternatives were quicker and easier.
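For what it's worth, the mapping from the cut pipeline to the expansions is small. A sketch with a made-up short filename; the post's version used ${u:1} and ${B%.mlog}, which come to the same thing:
u=m123312461234689.mlog
B=$(echo $u | cut -dm -f2 | cut -d. -f1)   # several extra processes per file
B=${u#m}; B=${B%%.*}                       # same digit string, no processes at all
echo ${#B} ${B:0:6}                        # the length, then the first six digits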