Comments on Recursivity: An Array Initialization Trick

Pseudonym (2010-10-13 01:22):

Oh, and D. Swart: I agree. For one-off personal projects (or even professional projects that won't be run many times), programmer time dominates run-time. In that case, the best sort algorithm is almost certainly the one you don't have to write: either the one in your programming language's standard library or, as I did about a month ago, piping data through /usr/bin/sort.

Pseudonym (2010-10-13 01:16):

If you're going to sort only three elements, then bubble sort performs the same amount of work as insertion sort or, if you don't implement it carefully, one extra comparison.

As for the second example: if the new element that you need to add is at the <i>end</i> (as opposed to the <i>start</i>), then the bubble sort that's in your standard algorithms textbook does O(n^2) work in general. If you change it to "bubble" from right to left, it does almost exactly the same amount of work as an insertion sort that's been modified similarly. An unmodified insertion sort does roughly twice the work of the modified one, but it's still linear rather than quadratic.

Either way, bubble sort is never better than insertion sort, and it is almost always worse. Insertion sort is also easier to write.

Note that it may be possible to come up with an unusual problem where a <i>modified</i> bubble sort outperforms many other algorithms.
For "modified bubble sort", read "problem-specific algorithm".

F. Andy Seidl (2010-10-08 23:11):

>> Although we have to allocate the memory for B once at the start <<

Or you could use a hash map and avoid the initialization cost AND the allocation overhead.

>> This algorithm really is useless. On every workload, on every CPU, in every situation that is ever going to come up, insertion sort is always a superior choice. <<

Pseudonym, please. There is a time for every purpose under the sun.

Suppose the arrays you're going to sort are never longer than three elements?

Suppose the arrays you're going to sort are already sorted, except that one new element may have been added at the end?

In both cases, it'd be tough to beat a bubble sort.

D. Swart (2010-10-01 11:23):

Pseudonym, you with your 'always's, 'every's, and 'never's!

For my own one-off personal projects, the computer's time is often less valuable than my own.

In those cases, ease of implementation trumps time, space, and scalability.

So I'm not ashamed to say I've whipped off bubble-sort-caliber algorithms when no prepackaged implementations were at hand (e.g., selection sort).

Pseudonym (2010-09-23 21:56):

Thanks, Gareth, for saying pretty much what I was going to say. The key thing is that for small-to-moderate amounts of data, constant factors dominate in heapsort.
Constant factors matter just as much as asymptotic complexity.

Incidentally, here's something that a theorist should appreciate: "memory access is the only thing that really costs" is a good approximation a lot of the time, especially on microcontrollers. On desktop/server CPUs, however, one thing that often dominates is branch misprediction.

Information theory requires that comparison-based sorting perform Ω(n log n) data comparisons at some point. What often isn't appreciated is that the results of those comparisons are essentially impossible to predict in advance. It should be clear why: the result of those comparisons is precisely the information that you're trying to uncover.

By contrast, loop branches and exception branches are usually very easy to predict: most of the time, loops are "taken" and exceptions are not. Both static and dynamic branch prediction usually do an extremely good job on these cases.

So when you have a job for which a comparison-based sort is the bottleneck, and memory access times aren't an issue, then more often than not most of the time is spent recovering from the mispredicted conditional branches that result from the data comparisons.

There are always a bunch of caveats here, of course. Many CPUs can recover from speculative execution of a single conditional branch more cheaply than from multiple branches, so if your comparison is actually on multiple keys where the primary key compares "equal" a significant amount of the time, the effect is often more noticeable. Some CPUs have a conditional move or conditional skip instruction, which can help if it applies. And, of course, some variant of radix sorting is always an option.

One of the more surreal moments I've spent in my HPC career is rewriting whole modules just to avoid a single conditional branch.
Needless to say, this is something you only do after much profiling, thinking, back-of-the-envelope reasoning, and just a little swearing.

Gareth McCaughan (2010-09-23 10:38):

1, 4, 13, 40, ...: start at 1, do n -> 3n+1 until you get too big. Equivalently, (3^n - 1)/2.

I think folklore says you can actually do better than that, but clearly it's good enough as it is to beat heapsort for moderate-sized arrays. At least if you count mems; I haven't tried making streamlined implementations in C and actually timing them, which would probably be more appropriate for measuring smallish-array performance. (For small arrays, the "memory access is the only thing that really costs" approximation, which is one of the main justifications for counting mems, isn't as true as it is for larger arrays. For that matter, cache effects mean that even for large arrays mems are at best a useful approximation. But I digress.)
Jeffrey Shallit (2010-09-23 08:19):

Gareth:

What increments did you use in Shell sort?

Gareth McCaughan (2010-09-23 08:01):

The constant factor for heapsort is pretty bad, and the constant factor for shellsort is pretty good.

I put together simple implementations of shellsort and heapsort and rigged them to count "mems" (memory accesses that would be needed with a sufficiently smart compiler). The crossover point above which heapsort wins was somewhere between n = 3000 and n = 4000.

Jeffrey Shallit (2010-09-23 05:48):

BTW, heapsort is done in place, so I'm having trouble understanding why you think Shell sort was preferable to heapsort, even in your situation.
Jeffrey Shallit (2010-09-23 05:09):

Pseudo:

Sorry about the mistake about C. I don't keep up with "advances" in the C standard, since I find C so unpleasant to use. I should have said "had" instead of "has".

Pseudonym (2010-09-23 00:56):

C does indeed have a native boolean type; it's been in the standard for over ten years. If your compiler doesn't have it, then it's not compliant.

What it doesn't have is a native bit-vector type, but then C doesn't really have native vector types at all. Every C programmer has an implementation within easy reach, because it's a critical piece of infrastructure. In C++, of course, it's in the standard library.

<i>I can't tell you how many times I've heard clever ideas dismissed as "An interesting trick, but I can't really imagine using it anywhere." The point is to have lots of tricks in your toolbox, so that you can pull one out as needed.</i>

There are tools and there are tools. I can probably best illustrate this with a tale of two sort algorithms, both of which are usually considered obsolete.

Sort algorithm #1: Shell sort.

I used this algorithm to solve a real problem a couple of years ago. It was a real-time firmware application where I had a moderately small amount of data that needed to be sorted, but it was still large enough that a quadratic algorithm wouldn't do the job. On the other hand, while I had lots of read-only memory available, I had essentially no read-write memory to spare, so all of the lowest-constant-factor O(n log n) algorithms were out of the question. Shell sort turned out to be the best fit for this problem.

Sort algorithm #2: Bubble sort.

This algorithm really <i>is</i> useless.
On every workload, on every CPU, in every situation that is ever going to come up, insertion sort is a superior choice. That "every" sounds like a rash universal quantifier, but it's true.

Okay, in <i>theory</i> it could potentially work better than insertion sort on certain special-purpose exotic hardware, like a parallel stream processor of some sort. But on such hardware, there are algorithms that are just as simple and perform even better.

In summary, an excellent rule of thumb is that there is no situation where bubble sort is ever the right answer to a real problem. But every programmer should know about it, if only so they don't accidentally re-invent it.

I do apologise for being glib; I didn't quite mean "anywhere". What I meant was that for this toy problem, the trick simply isn't a practical solution when you consider the alternatives.

What I didn't mention is that I did work out a fairly realistic scenario where the trick could come in handy. The details are unimportant, but it involved doing some kind of processing on A that may occasionally backtrack (where "occasionally" means that the number of choice points is small relative to the length of A). In that situation, you only want to "undo" visiting <i>part</i> of A. Details are left as an exercise.
Jeffrey Shallit (2010-09-22 05:59):

Pseudo:

Except that in many implementations, an array of booleans takes up the same amount of space as an array of integers. C, for example, has no native boolean type.

I can't tell you how many times I've heard clever ideas dismissed as "An interesting trick, but I can't really imagine using it anywhere." The point is to have lots of tricks in your toolbox, so that you can pull one out as needed.

Pseudonym (2010-09-22 01:47):

Of course, this "trick" means that B must occupy lg(|A|) times as much storage as it would otherwise need, and that's only if you tune the representation of B to take the length of A into account. More typically, it would be 16, 32, or 64 times as big as it needs to be.

An interesting trick, but I can't really imagine using it anywhere. Were M so big that the Θ(M) initialization time was really hurting performance, I'd be tempted to undo the initialisation by traversing the piece of A that has been "consumed" and zeroing out only those entries; if you haven't consumed much of A, they're still likely to be in cache, after all.

A more exotic option is to store B in zero-fill demand-paged memory. Then the initialisation would occur only as needed.

But of course, the first step would always be to do instruction-level profiling and make sure that initialisation of B really was the bottleneck. For this toy problem, it almost certainly isn't.