Planet GNU

Aggregation of development blogs from the GNU Project

August 07, 2022

Andy Wingo

coarse or lazy?

sweeping, coarse and lazy

One of the things that had perplexed me about the Immix collector was how to effectively defragment the heap via evacuation while keeping just 2-3% of space as free blocks for an evacuation reserve. The original Immix paper states:

To evacuate the object, the collector uses the same allocator as the mutator, continuing allocation right where the mutator left off. Once it exhausts any unused recyclable blocks, it uses any completely free blocks. By default, immix sets aside a small number of free blocks that it never returns to the global allocator and only ever uses for evacuating. This headroom eases defragmentation and is counted against immix's overall heap budget. By default immix reserves 2.5% of the heap as compaction headroom, but [...] is fairly insensitive to values ranging between 1 and 3%.

To Immix, a "recyclable" block is partially full: it contains surviving data from a previous collection, but also some holes in which to allocate. But when would you have recyclable blocks at evacuation-time? Evacuation occurs as part of collection. Collection usually occurs when there's no more memory in which to allocate. At that point any recyclable block would have been allocated into already, and won't become recyclable again until the next trace of the heap identifies the block's surviving data. Of course after the next trace they could become "empty", if no object survives, or "full", if all lines have survivor objects.

In general, after a full allocation cycle, you don't know much about the heap. If you could easily know where the live data and the holes were, a garbage collector's job would be much easier :) Any algorithm that starts from the assumption that you know where the holes are can't be used before a heap trace. So, I was not sure what the Immix paper means here about allocating into recyclable blocks.

Thinking on it again, I realized that Immix might trigger collection early sometimes, before it has exhausted the previous cycle's set of blocks in which to allocate. As we discussed earlier, there is a case in which you might want to trigger an early compaction: when a large object allocator runs out of blocks to decommission from the immix space. And if one evacuating collection didn't yield enough free blocks, you might trigger the next one early, reserving some recyclable and empty blocks as evacuation targets.

when do you know what you know: lazy and eager

Consider a basic question, such as "how many bytes in the heap are used by live objects". In general you don't know! Indeed you often never know precisely. For example, concurrent collectors often have some amount of "floating garbage" which is unreachable data but which survives across a collection. And of course you don't know the difference between floating garbage and precious data: if you did, you would have collected the garbage.

Even the idea of "when" is tricky in systems that allow parallel mutator threads. Unless the program has a total ordering of mutations of the object graph, there's no one timeline with respect to which you can measure the heap. Still, Immix is a stop-the-world collector, and since such collectors synchronously trace the heap while mutators are stopped, these are times when you can exactly compute properties about the heap.

Let's return to the question of measuring live bytes. For an evacuating semi-space, knowing the number of live bytes after a collection is trivial: all survivors are packed into to-space. But for a mark-sweep space, you would have to compute this information. You could compute it at mark-time, while tracing the graph, but doing so takes time, which means delaying the time at which mutators can start again.

Alternately, for a mark-sweep collector, you can compute free bytes at sweep-time. This is the phase in which you go through the whole heap and return any space that wasn't marked in the last collection to the allocator, allowing it to be used for fresh allocations. This is the point in the garbage collection cycle in which you can answer questions such as "what is the set of recyclable blocks": you know what is garbage and you know what is not.

Though you could sweep during the stop-the-world pause, you don't have to; sweeping only touches dead objects, so it is correct to allow mutators to continue and then sweep as the mutators run. There are two general strategies: spawn a thread that sweeps as fast as it can (concurrent sweeping), or make mutators sweep as needed, just before they allocate (lazy sweeping). But this introduces a lag between what you know and when you know it: your count of total live heap bytes describes a time in the past, not the present, because mutators have moved on since then.
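
To make the lazy strategy concrete, here is a minimal sketch in C of an allocation path that defers the sweeping work to allocation time. The block layout, names, and sizes are invented for illustration (and it sweeps at line granularity rather than object granularity); this is not the Immix code, nor any production collector's.

  /* Toy lazy sweeping: the pause only leaves line marks behind; mutators
     discover holes (runs of unmarked lines) when they next allocate. */
  #include <stddef.h>
  #include <stdint.h>
  #include <stdio.h>

  #define LINE_SIZE       128
  #define LINES_PER_BLOCK 256                 /* 32 kB blocks */
  #define NBLOCKS         4

  struct block {
    uint8_t line_mark[LINES_PER_BLOCK];       /* written by the last trace */
    uint8_t data[LINE_SIZE * LINES_PER_BLOCK];
  };

  static struct block heap[NBLOCKS];

  /* Lazily sweep a block: scan its line marks for a hole big enough for
     the request.  Dead objects themselves are never touched. */
  static void *find_hole(struct block *b, int lines_needed) {
    int run = 0;
    for (int i = 0; i < LINES_PER_BLOCK; i++) {
      run = b->line_mark[i] ? 0 : run + 1;
      if (run == lines_needed)
        return &b->data[(i - lines_needed + 1) * LINE_SIZE];
    }
    return NULL;
  }

  static void *allocate(size_t bytes) {
    int lines = (int)((bytes + LINE_SIZE - 1) / LINE_SIZE);
    for (int i = 0; i < NBLOCKS; i++) {        /* sweep blocks only as needed */
      void *p = find_hole(&heap[i], lines);
      if (p) return p;
    }
    return NULL;   /* every block swept, no hole found: time to collect */
  }

  int main(void) {
    /* Pretend the last trace marked every other line of block 0. */
    for (int i = 0; i < LINES_PER_BLOCK; i += 2)
      heap[0].line_mark[i] = 1;
    printf("64 B  -> %p\n", allocate(64));     /* fits a one-line hole */
    printf("512 B -> %p\n", allocate(512));    /* needs 4 contiguous lines */
    return 0;
  }

The point is simply that the scan over the marks happens on the allocation path, spread across mutator time, rather than all at once inside the pause.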

For most collectors with a sweep phase, deciding between eager (during the stop-the-world phase) and deferred (concurrent or lazy) sweeping is very easy. You don't immediately need the information that sweeping allows you to compute; it's quite sufficient to wait until the next cycle. Moving work out of the stop-the-world phase is a win for mutator responsiveness (latency). Usually people implement lazy sweeping, as it is naturally incremental with the mutator, naturally parallel for parallel mutators, and any sweeping overhead due to cache misses can be mitigated by immediately using swept space for allocation. The case for concurrent sweeping is less clear to me, but if you have cores that would otherwise be idle, sure.

eager coarse sweeping

Immix is interesting in that it chooses to sweep eagerly, during the stop-the-world phase. Instead of sweeping over irregularly-sized objects, however, it sweeps over its "line mark" array: one byte for each 128-byte "line" in the mark space. For 32 kB blocks, there will be 256 bytes per block, and line mark bytes in each 4 MB slab of the heap are packed contiguously. Therefore you get relatively good locality, but this just mitigates a cost that other collectors don't have to pay. So what does eager sweeping over these coarse 128-byte regions buy Immix?

Firstly, eager sweeping buys you eager identification of empty blocks. If your large object space needs to steal blocks from the mark space, but the mark space doesn't have enough empties, it can just trigger collection and then it knows if enough blocks are available. If no blocks are available, you can grow the heap or signal out-of-memory. If the lospace (large object space) runs out of blocks before the mark space has used all recyclable blocks, that's no problem: evacuation can move the survivors of fragmented blocks into these recyclable blocks, which have also already been identified by the eager coarse sweep.
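
For contrast with the lazy sketch above, here is a sketch of what the eager coarse sweep might compute during the pause: one pass over each block's 256 line-mark bytes, after which the collector knows its sets of empty, recyclable, and full blocks. Again the names are invented for illustration; this is not the Immix source.

  #include <stddef.h>
  #include <stdint.h>

  #define LINES_PER_BLOCK 256           /* 32 kB block / 128-byte lines */

  enum block_state { BLOCK_EMPTY, BLOCK_RECYCLABLE, BLOCK_FULL };

  struct block_summary {
    enum block_state state;
    uint16_t free_lines;                /* coarse measure of reclaimable space */
  };

  /* Eager coarse sweep of one block, run while the world is stopped.
     Empty blocks can immediately be handed to the large object space or
     held back as evacuation reserve; recyclable blocks are known
     allocation (or evacuation) targets for the next cycle. */
  struct block_summary sweep_line_marks(const uint8_t marks[LINES_PER_BLOCK]) {
    struct block_summary s = { BLOCK_RECYCLABLE, 0 };
    for (size_t i = 0; i < LINES_PER_BLOCK; i++)
      if (!marks[i])
        s.free_lines++;
    if (s.free_lines == LINES_PER_BLOCK)
      s.state = BLOCK_EMPTY;
    else if (s.free_lines == 0)
      s.state = BLOCK_FULL;
    return s;
  }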

Without eager empty block identification, if the lospace runs out of blocks, firstly you don't know how many empty blocks the mark space has. Sweeping is a kind of wavefront that moves through the whole heap; empty blocks behind the wavefront will be identified, but those ahead of the wavefront will not. Such a lospace allocation would then have to either wait for a concurrent sweeper to advance, or perform some lazy sweeping work. The expected latency of a lospace allocation would thus be higher, without eager identification of empty blocks.

Secondly, eager sweeping might reduce allocation overhead for mutators. If allocation just has to identify holes and not compute information or decide on what to do with a block, maybe it go brr? Not sure.

lines, lines, lines

The original Immix paper also notes a relative insensitivity of the collector to line size: 64 or 256 bytes could have worked just as well. This was a somewhat surprising result to me but I think I didn't appreciate all the roles that lines play in Immix.

Obviously line size affects the worst-case fragmentation, though this is mitigated by evacuation (which evacuates objects, not lines). This I got from the paper. In this case, smaller lines are better.

Line size affects allocation-time overhead for mutators, though which way I don't know: scanning for holes will be easier with fewer lines in a block, but smaller lines would contain more free space and thus result in fewer collections. I can only imagine though that with smaller line sizes, average hole size would decrease and thus medium-sized allocations would be harder to service. Something of a wash, perhaps.

However if we ask ourselves the thought experiment, why not just have 16-byte lines? How crazy would that be? I think the impediment to having such a precise line size would mainly be Immix's eager sweep, as a fine-grained traversal of the heap would process much more data and incur possibly-unacceptable pause time overheads. But, in such a design you would do away with some other downsides of coarse-grained lines: a side table of mark bytes would make the line mark table redundant, and you eliminate much possible "dark matter" hidden by internal fragmentation in lines. You'd need to defer sweeping. But then you lose eager identification of empty blocks, and perhaps also the ability to evacuate into recyclable blocks. What would such a system look like?
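
To put rough numbers on the "much more data" point (plain arithmetic, nothing collector-specific): with one mark byte per line and 32 kB blocks, an eager sweep has to stream through the following amounts of metadata per block.

  #include <stdio.h>

  int main(void) {
    int block = 32 * 1024;
    int line_sizes[] = { 256, 128, 64, 16 };
    for (int i = 0; i < 4; i++) {
      int lines = block / line_sizes[i];
      printf("%3d-byte lines: %4d mark bytes per 32 kB block (%.2f%% of the heap)\n",
             line_sizes[i], lines, 100.0 * lines / block);
    }
    return 0;
  }

Going from 128-byte to 16-byte lines takes the per-block metadata from 256 bytes (0.78% of the heap) to 2048 bytes (6.25%), an 8x increase in the data the sweep would have to touch during the pause.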

Readers that have gotten this far will be pleased to hear that I have made some investigations in this area. But, this post is already long, so let's revisit this in another dispatch. Until then, happy allocations in all regions.

07 August, 2022 09:44AM by Andy Wingo

August 06, 2022

Christine Lemmer-Webber

A Guile Steel smelting pot

Last month I made a blogpost titled Guile Steel: A Proposal for a Systems Lisp. It got more attention than I anticipated, which is both a blessing and a curse. I mean, mostly the former, the curse isn't so serious, it's mostly that the post was aimed at a specific community and got more coverage than that, and funny things happen when things leave their intended context.

The blessing is that real, actual progress has happened, in terms of organization, actual development (thanks to others mostly!), and a compilation of potential directions. In many ways "Guile Steel" was meant to be a meta project, somewhat biased around Guile but more so a clever name to start brewing some ideas (and gathering intelligence) around, a call-to-arms for those who are likeminded, a test even to see if there are enough likeminded people out there. The answer to that one is: yes, and there's actually a lot that's happening or has happened historically. I actually think Lisp is going through a quiet renaissance and is on the verge of a major revival, but that's a topic for another post. The goal of this post is to give a lay of the landscape, as I've seen it since then. There's a lot out there.

If you enjoy this post by the way, there's an IRC channel: #guile-steel on irc.libera.chat. It's surprisingly well populated given that people have only shown up through word of mouth.

First, an aside on language (again)

Also by-the-way, it's debatable what "systems language" even means, and the previous post spent some time debating that. Language is mostly fuzzy, and subject to the constant battle between fuzzy and crisp systems, and "systems language" is something people evoke to make themselves sound like very crispy people, even though the term could hardly be fuzzier.

We're embracing the hand-waviness here; I've previously mentioned that "Blockchain" is to "Bitcoin" what "Roguelike" is to "Rogue". Similarly, "Systems Language" is to "C/Rust" what "Roguelike" is to "Rogue".

My friend Technomancy put things about as well or as succinctly as you could: "low-level enough to bootstrap a runtime". We'll extend "runtime" to not only mean "programming language runtime" but also "operating system runtime", and that's the scope of this post.

With that, let's start diving into directions.

Carp and Scopes

I was unaware at the time of writing the previous post of two remarkable "systems lisps" that already existed, are here, now, today, and you can and maybe should use them: Carp and Scopes. Both of them are statically typed, and both perform automatic memory management without the overhead of a garbage collector or reference counting in a style familiar to Rust.

They are also both kind of similar yet different. Carp is written on top of Haskell, looks a lot like Clojure in style. Scopes is written in C++, looks a lot like Scheme, and has an optional, non-parenthetical whitespace syntax which reminds me a lot of Wisp, so is maybe more amenable to the type of people who fear parentheses or must work with people who do.

I can't make a judgement about either; I would like to find some time to try each of them. Scopes looks more up my alley of the two. If someone packaged either of these languages for Guix I would try it in a heartbeat.

Anyway, Carp and Scopes already are systems lisps of sorts you can try today. (If you've made something cool with either of them, let me know.)

Pre-Scheme

There's a lot to say on this one, despite its obscurity, enough that I'm going to give it several sub-headings. I'll get the big one up front: Andrew Whatson is doing an incredible job porting Pre-Scheme to Guile. But I'll get more to that below.

What the heck is Pre-Scheme anyway?

PreScheme (or is it Pre-Scheme or prescheme or what? nobody is consistent, and I won't be either) is a "systems lisp" that is used to bootstrap the incredible but "sleeper hit" (or shall we say "cult classic"?) of programming language runtimes, Scheme48. PreScheme compiles to C, is statically typed with type inference based on a modified version of Hindley-Milner, and uses manual memory management for heap-allocated resources (much like C) rather than garbage collection. (C is just the current main target, compiling directly to native architectures or WebAssembly is also possible.)

The wild things about PreScheme are that unlike C or Rust, you can hack on it live at the REPL just like Scheme, and you can even incorporate a certain amount of Scheme, and it mostly looks like Scheme. But it still compiles down efficiently to low-level code.

It's used to implement Scheme48's virtual machine and garbage collector, and is bootstrappable from a working Scheme48, but there's also apparently a version sitting around somewhere on top of some metacircular Scheme which Jonathan Rees wrote on top of Common Lisp, giving it a good bootstrapping story. While used for Scheme48, and usable from it today, there's no reason you can't use it for other things, and a few smaller projects have.

What's more wild about PreScheme is how incredibly good of an idea it is, how long it's sat around (since the 80s, with a lot of work happening in the 90s!), and how little attention it's gotten. PreScheme's thoughtful design actually follows from Richard Kelsey's amazing PhD dissertation, Compilation By Program Transformation, which really feels like the kind of obscure CS thing that, if you've made it this far in this writeup, you probably would love reading. (Thank you to Olin Shivers for reviving this dissertation in LaTeX, which otherwise would have been lost to history.)

guile-prescheme

Now I did mention prescheme and how I thought that was a fairly interesting starting point on the last Guile Steel blogpost, and I actually got several people reaching out to me saying they wanted to take up this initiative, and a few of them suggested maybe they should start porting PreScheme to Guile, and I said "yes you should!" to all of them, but one person took up the initiative quickly and has been doing a straight and faithful port to Guile named guile-prescheme.

The emulator (which isn't too much code really) has already worked for a couple of weeks, which means you can already hack PreScheme at Guile's REPL, and Andrew Whatson says that the "compile to C" compiler is already well on its way, and will likely be there in about a month.

The main challenge apparently is the conversion of macros, which are stuck in the r4rs era of Scheme. Andrew has been slowly converting everything to syntax-case, which is standardized in r6rs, while keeping the rest of the code close to portable standard Scheme, which raises the question: how general of a port is this? Does it really have to just be to Guile? And that brings us to our next subsection...

The Secret Society of PreScheme Revivalists

Okay there's not really a secret society, we just have an email thread going and I organized a video call recently, and we're likely to do another one (I hope). This one was really good, very productive. (We didn't record it, sadly. Maybe we should have.)

On said video call we got Andrew Whatson of course, who's doing the current porting effort, but also Richard Kelsey (the original brain behind PreScheme, co-author of much of Scheme48), Michael Sperber (current maintainer of Scheme48 and also someone who has used PreScheme previously commercially, to do some monte carlo simulation things for some financial firm or something curious like that), and Jonathan Rees (co-author of Scheme48, and one of my friends who I like to call up and talk about all sorts of curious topics). There were a few others, all cool people, and also me, hand-waving excitedly as usual.

As an aside, my wife Morgan says my superpower is that I'm good at "showing up in a room and being excited and getting everyone else excited", and she's right, I think. And... well there's just a lot of interesting stuff in computer science history, amazing people whose work has just been mostly ignored, stuff left on the shelf. It's not what the Tech Influencers (TM) are currently touting, it's not what a FAANG company is going to currently hire you to do, but if you're trying to solve the hard problems people don't even realize they have, your best bet is to scour history.

I don't know if it's true or not but this felt like one of those times where the folks who have worked on PreScheme historically seemed kind of surprised that here we had a gathering of people who are extremely interested in the stuff they've done, but also happy about it. Anyway, that seemed like my reading, I like to think so anyway. Andrew (I think?) said some nice things about how it was just exciting to be able to talk to the people who have done these things, and I agree. It is cool stuff. We are grateful to be able to talk about it.

The conversation was really nice, we got some interesting historical information (some of that which I've conveyed here), and Richard Kelsey indicated that he's been doing work on microcontrollers and wishes he could be using PreScheme, but the thing clients/employers get nervous about is "will we be able to actually hire anyone to work on this stuff who isn't just you?" I'd like to think that we're building up enough enthusiasm where we can demonstrate in the affirmative, but that's going to take some time.

Anyway, I hinted in the last part that some of the more interesting conversation came to, just how portable is this port? Andrew indicated that he thought that the port to Guile as he was doing it was already helping to make things more portable. Andrew is just focusing on Guile first, but is avoiding the Guile-specific ice-9 namespace of Guile modules (which in this case, from a standardization perspective, becomes a little bit too appropriate) and is using as much generic Scheme and SRFI extensions as possible. Once the Guile version gets working, the goal is then to try porting to a more standardized form of Scheme (probably r7rs-small), and then that would mean that any Scheme following that standard could use the same version of PreScheme. Michael Sperber seemed to indicate that maybe Scheme48 could use this version too.

This would actually be pretty incredible because it would mean that any version of Scheme following the Scheme standard would suddenly have access to PreScheme, and any of those could also be used to bootstrap a PreScheme based Scheme.

A PreScheme JIT?

I thought Andrew Whatson (flatwhatson here) said this well enough himself so I'm just going to quote it verbatim:

<flatwhatson> re: pre-scheme interest for bootstrapping, i think it's more
              interesting than just "compiling to C"
<flatwhatson> Michael Sperber's rejected paper "A Tractable Native-Code Scheme
              System" describes repurposing the pre-scheme compiler (more
              accurately called the transformational compiler) as a jit
              byte-code optimizer and native-code emitter
<flatwhatson> the prescheme compiler basically lowers prescheme code to a
              virtual machine-code and then emits that as C
<flatwhatson> i think it would be feasible to directly emit native code at
              that point instead
<flatwhatson> https://www.deinprogramm.de/sperber/papers/tractable-native-code-scheme-system.pdf
<flatwhatson> Kelsey's dissertation describes transforming a high-level
              language to a low-level language, not specifically scheme to C.
<flatwhatson> > The machine language is an assembly language written in the
              syntax of the intermediate language and has a much simpler
              semantics. The machine is assumed to be a Von Neumann machine
              with a store and register-to-register instructions. Identifiers
              represent the machine’s registers and primitive procedures are
              the machine’s instructions.
<flatwhatson> Also, we have code for the unreleased byte-code jit-compiling
              native-emitting version of Scheme 48:
              https://www.s48.org/cgi-bin/hgwebdir.cgi/s48-compiler/

(How the hell that paper was rejected btw, I have no idea. It's great.)

Future directions for PreScheme

One obvious improvement to PreScheme is: compile to WebAssembly (aka WASM)! This would be pretty neat and maybe, maybe, maybe could mean a good path to getting more Schemes in the browser without using Emscripten (which is a bit heavy-handed of an approach). Andrew and I both think this is a fun idea, worth exploring. I think once the "compile to C" part of the port to Guile is done, it's worth beginning to look at in earnest.

Relatedly, it would also, I think, be pretty neat if guile-prescheme was compelling enough for more of Guile to be rewritten in it. This would improve Guile's already-better-than-most bootstrapping story and also make hacking on certain parts of Guile's internals more accessible and pleasant to a larger part of Guile's existing userbase.

The other obvious improvement to PreScheme is exploring (handwave handwave handwave) the kinds of automated memory management which have become popular with Rust's borrow checker and also appear in Carp and Scopes, as discussed above.

3L: The Computing System of the Future (???)

I mentioned that an appealing use of PreScheme might be to write not just a language runtime, but also an operating system. A very interesting project called 3L exists and is real and does just that. In fact, it's also a capability-secure operating system, and it cites all the right stuff and has all the right ideas going for it. And it's using PreScheme!

Now the problem is, seemingly nobody I know who would be interested in exactly a project like this had even heard of it before (except for the incredible and incredibly nice hacker pukkamustard, who is the one who made me even aware of it by mentioning it in the #guile-steel chatroom), and I couldn't even find the actual code on the main webpage. But there actually is source code, not a lot of it, but it's there, and in a way "not a lot of it" is not a bad thing here, because what's there looks stunningly similar to a very familiar metacircular evaluator, which begs the question, is that really enough, though?

And actually maybe it is, because hey look there's a demo video and a nice talk. And it's using Scheme48!

(As a complete aside: I'd be much more likely to play with Scheme48 if someone added Geiser support for it... that's something I've poked at doing every now and then but I haven't had enough of a dedicated block of time. If you, dear reader, feel inspired enough to add such support, or actually if you give 3L a try, let me know).

Anyway, cool stuff, I've been meaning to reach out to the author, maybe I will after I post this. I wonder what's come of it. (It's also missing a license file or any indicators, but maybe we could get that fixed easily enough?)

WebAssembly

I had a call with someone recently who said WebAssembly was really only useful for C/Rust users, and I thought this was fairly surprising/confusing, but maybe that's because I think WebAssembly is pretty cool and have hand-coded a small amount of it for fun. Its text-based syntax is S-Expression based which makes it appealing for lispy type folks, and just easy to parse and work with in general.

It's stretching it a bit to call WebAssembly a Lisp, it's really just something that's designed to be an intermediate language (eg in GCC), a place where compiler authors often deign it okay/acceptable to use s-expressions because they don't fear that they'll scare off non-lispers or PLT people, because hey most users aren't going to touch this stuff anyway, right?

I dunno, I consider it a win at least that s-expressions have survived here. I showed up to an in-person WebAssembly meeting once and talked to one of the developers about it, praised them for this choice, and they said "Oh... yeah, well, we initially did it because it was the easiest thing to start with, and then eventually we came to like it, which I guess is the usual thing that happens with S-Expressions." (Literally true, look up the history of M-Expressions vs S-Expressions.)

At any rate, most people aren't coding WebAssembly by hand. However, you could, and if you're going to, a Lisp based environment is actually a really good choice. wasm-adventure is a really cool little demo game (try it!), all hand-written in WebAssembly kinda sorta. The README gives its motivations as "Is it possible (and enjoyable) to write a game directly in web assembly's text format? Eventually, would it be cool to generate wat from Scheme code using the Racket lang system?", and the answer becomes an emphatic "yes". What's interesting is that Lisp's venerable quasiquote does much of the heavy lifting to make, without too much work, a little DSL for authoring WebAssembly which results in some surprisingly easy to read code compared to generic WebAssembly. (The author, Zoé Martin, is another one of those quiet geniuses you run into on the internet; she has a lovely homebrew computer design too.)

So what I'd really like is to see more languages compiling directly to WebAssembly without emscripten as an intermediate hack. Guile especially, of course. Andy Wingo gave an amazing little talk on this where he does a little (quasi, pre-recorded) live coding demo of compiling to WebAssembly and I thought "YES!!! Compiling to WASM is probably right around the corner" and it turns out that's probably not the case because Wingo would like to see some important extensions to WASM land, and I guess, yes that probably makes sense, and also he's working on a new garbage collector which seems damn cool and like it'll be really good for Guile and maybe even could help the compiling to WASM story even before the much desired WASM-GC extension we all want lands, but who knows. I mean it would also be nice to have like, the tail call elimination extension, etc etc etc. But see also Wingo's writeup about targeting the web from last year, etc etc. (And on that note, I mean, is Webassembly the new Kubernetes?)

As another aside, there are two interesting Schemes which are actually written in WebAssembly, or rather, one written directly in hand-coded WASM named scheme.wasm, and one which compiles itself to Webassembly called Schism (which has a cool paper, but sadly hasn't been updated in a couple of years).

As another another aside, I was on a video call with Douglas Crockford at one point and mentioned WebAssembly and how cool I thought it was, and Crock kinda went "meh" to it, and I was like what? I mean it has ocap people you and I have both collaborated on it with, overall its design seems pretty good, better than most of the things of its ilk that have been tried before, why are you meh'ing WebAssembly? And Crock said that well, it's not that WebAssembly is bad, it's just that it felt like an opportunity to do something impactful, and it's "just another von neumann architecture", like, boring, can't we do better? But when I asked for specific alternatives, Crock didn't have a specific design in mind, just thought that maybe we could do better, maybe it could even incorporate actors at a more fundamental level.

Well... it turns out we both know someone who did just that, and (so I hear) both recently got nerdsniped by that very same person who had just such an architecture...

Mycelia and uFork

So I had a call with sorta-colleague, sorta-friend I guess? I'm talking about Dale Schumacher, and I don't know him super well, we don't get to talk that much, but I've enjoyed the amount we have. Dale has been trying to get me to have a video call for a while, we finally did, and I was expecting us to talk about our respective actor'y system projects, and we did... but the big surprise was hearing about Mycelia, Dale's ocap-secure hybrid-actor-model-lisp-machine-lambda-calculus operating system, and its equally astounding, actually maybe more astounding, virtual machine and maybe potentially CPU architecture design, uFork. We're going to take a major digression but I promise that it ties back in.

This isn't the first time Dale's reached out and it's resulted in me being surprised and nerdsniped. A few years ago Dale reached out to me to talk about this programming language he wrote called Humus. What's astounding personally about Humus is that it has an eerie amount of similarity to Spritely Goblins, the ocap distributed object architecture I've been working on the last few years, despite that we fully designed our systems independently. Dale beat me to it, but it was an independent reinvention in the sense that I simply wasn't aware of Humus until Dale started emailing me.

The eerie similarity is because I think Dale and I's systems are the most seriously true-to-form implementations of the "Classic Actor Model" that have been implemented in recent times (more true than say, Erlang, which does some other things, and "Classic" thrown on there because Carl Hewitt has some new ideas that he feels strongly should now be associated with "Actor Model" that can be layered on Dale and I's systems, but are not there at the base layer). (Actually, Goblins supports one other thing that makes it more the vat model of computation, but that isn't important for this post.) The Classic Actor Model says (hand-waving past non-determinism in the general case, at least from the perspective of a single actor, due to ordering concerns... but those too can be layered on) that you can do pretty much all computation in terms of just actors, which are these funky distributed objects which handle messages one at a time, and while handling them are only able to do some combination of three things: (1) send messages to actors they know about, (2) create new actors (and get their address in the process, which they can share with other actors should they choose... argument passing, basically), and (3) designate their behavior for the next time they are handling a message. It's pretty common to use "become" for that last operation, but the curious thing that both Dale and I did was use lambdas as the thing you become. (By the way, those familiar with Scheme history should notice something interesting and familiar here, and for that same reason Dale and I are also in the shared company of being scolded by Carl Hewitt for saying our actors are made out of lambdas, despite him liking our systems otherwise, I think...)
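
For readers who want those three operations spelled out in code, here is a tiny single-threaded sketch in C (all names invented; no relation to Goblins or Humus, and none of the distribution, ordering, or state-capture machinery a real system needs):

  #include <stdio.h>
  #include <stdlib.h>

  struct actor;
  typedef void (*behavior_fn)(struct actor *self, long message);

  struct actor { behavior_fn behavior; };   /* what "become" swaps out */

  struct message { struct actor *target; long payload; struct message *next; };
  static struct message *queue_head, *queue_tail;

  /* (1) send a message to an actor you know about */
  static void send(struct actor *target, long payload) {
    struct message *m = malloc(sizeof *m);
    *m = (struct message){ target, payload, NULL };
    if (queue_tail) queue_tail->next = m; else queue_head = m;
    queue_tail = m;
  }

  /* (2) create a new actor, getting its address back */
  static struct actor *create(behavior_fn behavior) {
    struct actor *a = malloc(sizeof *a);
    a->behavior = behavior;
    return a;
  }

  /* (3) designate the behavior used for the next message */
  static void become(struct actor *self, behavior_fn behavior) {
    self->behavior = behavior;
  }

  static void quiet(struct actor *self, long message) {
    (void)self; (void)message;              /* ignore everything from now on */
  }

  /* In a lambda-based system the new behavior would be a closure carrying
     fresh state; C function pointers can't close over state, so this toy
     cheats with a static counter. */
  static void counting(struct actor *self, long message) {
    static int handled;
    printf("counting actor got %ld\n", message);
    if (++handled == 3) become(self, quiet);
  }

  int main(void) {
    struct actor *counter = create(counting);
    for (long i = 1; i <= 5; i++) send(counter, i);
    /* one-at-a-time message handling */
    for (struct message *m = queue_head; m != NULL; ) {
      m->target->behavior(m->target, m->payload);
      struct message *next = m->next; free(m); m = next;
    }
    return 0;
  }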

I remarked off-hand that "well I guess one of the main differences between our systems, and maybe a thing you might not like, is that mine is lispy / based on Scheme, and..."

Dale waved his hand. "That's mostly surface..."

"Surface syntax, yeah I know. So I guess it doesn't..."

"No wait it does matter. What I'm trying to show you is that I actually do like that kind of stuff. In fact I have some projects which use it at a fundamental level. Here... let me show you..." And that's when we started talking about uFork, the instruction architecture he was working on, which I later found was actually part of a larger thing called Mycelia.

Well I'm glad I got the lecture directly from Dale because, let's see, how does the Mycelia project brand itself (at the time of writing)? "A bare-metal actor operating system for Raspberry Pi." Well, maybe this underwhelming self-description is why seemingly nobody I know (yes, like 3L above) has heard about it, despite it being extremely up the alley of the kind of programming people I tend to hang out with.

Mycelia is not just some throwaway raspberry pi project (which is certainly the impression I would have gotten from lazily scanning that page). Most of those are like, some cute repackaging of some existing FOSS POSIX'y thing. But Mycelia is an actually-working, you-can-actually-boot-it-on-real-hardware open source operating system (under Apache v2) with a ton of novel ideas which happens to be targeting the Raspberry Pi, but it could be ported to run on anything.

Anyway, there is a lot of interesting stuff in there, but here's a bulleted list summary. For Mycelia:

  • It is an object-capability-secure operating system
  • It has a Lisp-like language for coding, pretty much Scheme-like, to hack around on
  • The Kernel language / Vau calculus show up, which is... wild
  • It encodes the Actor model and the Lambda calculus in a way that is sensible and coherent afaict
  • It is a "Lisp Machine" in many senses of the term.

But the uFork virtual machine / abstract idea for a CPU is also curious on its own. I dunno, I spent the other night reading it kind of wildly after our call. It also encodes the lambda calculus / actor model in fundamental instructions.

Dale was telling me he'd like to build an actual, physical CPU, but of course that takes a lot of resources, so he might settle for an FPGA for now. The architecture, should it be built, also somehow encodes a hardware garbage collector, which I haven't heard of anything doing since the actual physical Lisp Machines died out.

At any rate, Dale was really excited to tell me about why his system encoded instructions operating on memory split in quads. He asked me why I thought that would be; I'm not honestly sharp enough in this kind of area to know, sadly, though I said "I hope it's not because you're planning on encoding RDF at the CPU layer". Thankfully it's not that, but then he started mentioning how his system encodes a stream of continuations...

Wait, that sounds familiar. "Have you ever heard of something called sectorlisp?" I asked, with a raised eyebrow.

"Scroll to the bottom of the document," Dale said, grinning.

Oh, there it was. Of course.

sectorlisp

The most technically impressive thing I think I've ever seen is John McCarthy's "Lisp implemented in Lisp", also known as a "metacircular evaluator". If you aren't familiar with it, it's been summarized well in the talk The Most Beautiful Program Ever Written by William Byrd. I think the best way to understand it really, and (I'm biased) the easiest to read version of things is in the Scheme in Scheme section of A Scheme Primer (though I wrote that for my work, and as said, I'm biased... I don't think I did anything new there, just explained ideas as simply as I could).

The second most technically impressive thing I've ever seen is sectorlisp, and the two are directly related. According to its README, "sectorlisp is a 512-byte implementation of LISP that's able to bootstrap John McCarthy's meta-circular evaluator on bare metal." Where traditional metacircular evaluator examples can be misconstrued as being the stuff of pure abstractlandia, sectorlisp gets brutally direct about things. In one sector (half a kilobyte!!!), sectorlisp manages to encode a whole-ass lisp system that actually runs. And yet, the nature of the metacircular evaluator persists. (Take that, person on Hacker News who called metacircular evaluators "cheating"! Even if you think mine was, I don't think you can accuse sectorlisp's of that.)

If you do nothing else, watch the sectorlisp blinkenlights demo, even just as an astounding visual demo alone. (Blinkenlights is another project of Justine's, and also wildly impressive.) I highly recommend the following blogposts of Justine's: SectorLISP Now Fits in One Sector, Lisp with GC in 436 bytes, and especially Lambda Calculus in 383 Bytes. Hikaru Ikuta (woodrush) has also written some amazing blogposts, including Extending SectorLISP to Implement BASIC REPLs and Games and Building a Neural Network in Pure Lisp Without Built-In Numbers Using Only Atoms and Lists (and related, but not part of sectorlisp: A Lisp Interpreter Implemented in Conway's Game of Life, which gave off strong Wireworld Computer vibes for me). If you are only going to lazily scan through one of those blogposts, I recommend it be Lambda Calculus in 383 Bytes, which has some wild representations of the ideas (including visually), a bit too advanced for me at present admittedly, though I stare at them in wonder.

I had a bunch more stuff here, partly because the author is someone I find both impressive technically but who has also said some fairly controversial things... to say the least. But I think it was too much of a digression for this article. The short version is that Justine's stuff is probably the smartest, most mind-blowing tech I've ever seen, kinda scarily and intimidatingly smart, and it's hard to mentally reconcile that with some of those statements. I don't know, maybe she wants to move past that phase, I'd like to think so. I think she hasn't said anything like that in a long time, and it feels out of phase with the rest of this post but... it feels like something that needs to be acknowledged.

GOOL, GOAL, and OpenGOAL

Every now and then when people say Lisp couldn't possibly be performant, Lisp people like to bring up that Naughty Dog famously had its own Lisp implementations for most of its earlier games. Andy Gavin has written about GOOL, which was a mix of lisp and assembly and of course lisp generating assembly, and I don't think much was written about its followup GOAL until OpenGOAL came up, which... I haven't looked at it too much tbh. I guess it's getting interesting for some people for the ability to play Jak and Daxter on modern hardware (which I've never played but looked fun), but I'm more curious if someone's gonna poke at it to do something completely different.

But I do know some of the vague legends. I don't remember if this is true or where I read it but one of them is that Sony accused Naughty Dog of accessing some devkit or APIs they weren't supposed to have access to because they were pulling off a bunch of features performantly in Crash Bandicoot that were assumed otherwise not possible. But nope, just lisp nerds making DSLs that pull off some crazy shit I guess.

BONUS: Shoutout to Kandria

Oh, speaking of games written in Lisp, I guess Kandria's engine is gonna be FOSS, and that game looks wicked cool, maybe support their Kickstarter while you still can. It's not really in the theme of this post from the definition of "systems lisp" I gave earlier, but this blogpost about its lispy internals is really neat.

Okay here's everything else we're done

This was a long-ass post. There's other things maybe that could be discussed, so I'll just dump briefly:

Okay, that's it. Hopefully you found something interesting out of this post. Meanwhile, I was just gonna spend an hour on this. I spent my whole day! Eeek!

06 August, 2022 06:35PM by Christine Lemmer-Webber (cwebber@dustycloud.org)

August 04, 2022

GNUnet News

GNUnet 0.17.3


This is a bugfix release for gnunet 0.17.2. In addition to the fixes in the source, the documentation websites, including the handbook, have been updated and consolidated: https://docs.gnunet.org.

Notably, the GNUnet project now publishes a GNS zone for its websites which can be used to test resolution on any installation. For example:

$ gnunet-gns -t ANY -u www.gnunet.org

Download links

The GPG key used to sign is: 3D11063C10F98D14BD24D1470B0998EF86F59B6A

Note that due to mirror synchronization, not all links may be functional early after the release. For direct access try http://ftp.gnu.org/gnu/gnunet/

Noteworthy changes in 0.17.3 (since 0.17.2)

  • DHT : Various bugfixes in the protocol.
  • TRANSPORT : Fix HTTPS tests. #7257
  • DOCUMENTATION :
    • Migrate from texinfo to sphinx.
    • Dropped dependency on texinfo.
    • Added dependency on sphinx.

A detailed list of changes can be found in the ChangeLog and the bugtracker.

04 August, 2022 10:00PM

FSF Events

Free Software Directory meeting on IRC: Friday, August 12, starting at 12:00 EDT (16:00 UTC)

Join the FSF and friends Friday, August 12, from 12:00 to 15:00 EDT (16:00 to 19:00 UTC) to help improve the Free Software Directory.

04 August, 2022 08:49PM

Free Software Directory meeting on IRC: Friday, August 5, starting at 12:00 EDT (16:00 UTC)


04 August, 2022 08:43PM

August 02, 2022

libc @ Savannah

The GNU C Library version 2.36 is now available

The GNU C Library
=================

The GNU C Library version 2.36 is now available.

The GNU C Library is used as the C library in the GNU system and
in GNU/Linux systems, as well as many other systems that use Linux
as the kernel.

The GNU C Library is primarily designed to be a portable
and high performance C library.  It follows all relevant
standards including ISO C11 and POSIX.1-2017.  It is also
internationalized and has one of the most complete
internationalization interfaces known.

The GNU C Library webpage is at http://www.gnu.org/software/libc/

Packages for the 2.36 release may be downloaded from:
        http://ftpmirror.gnu.org/libc/
        http://ftp.gnu.org/gnu/libc/

The mirror list is at http://www.gnu.org/order/ftp.html

NEWS for version 2.36
=====================

Major new features:

  • Support for DT_RELR relative relocation format has been added to

  glibc.  This is a new ELF dynamic tag that improves the size of
  relative relocations in shared object files and position independent
  executables (PIE).  DT_RELR generation requires linker support for
  -z pack-relative-relocs option, which is supported for some targets
  in recent binutils versions.  Lazy binding doesn't apply to DT_RELR.

  • On Linux, the pidfd_open, pidfd_getfd, and pidfd_send_signal functions

  have been added.  The pidfd functionality provides access to a process
  while avoiding the issue of PID reuse on traditional Unix systems.

  • On Linux, the process_madvise function has been added. It has the

  same functionality as madvise but alters the target process identified
  by the pidfd.

  • On Linux, the process_mrelease function has been added.  It allows a

  caller to release the memory of a dying process.  The release of the
  memory is carried out in the context of the caller, using the caller's
  CPU affinity, and priority with CPU usage accounted to the caller.

  • The “no-aaaa” DNS stub resolver option has been added.  System

  administrators can use it to suppress AAAA queries made by the stub
  resolver, including AAAA lookups triggered by NSS-based interfaces
  such as getaddrinfo.  Only DNS lookups are affected: IPv6 data in
  /etc/hosts is still used, getaddrinfo with AI_PASSIVE will still
  produce IPv6 addresses, and configured IPv6 name servers are still
  used.  To produce correct Name Error (NXDOMAIN) results, AAAA queries
  are translated to A queries.  The new resolver option is intended
  primarily for diagnostic purposes, to rule out that AAAA DNS queries
  have adverse impact.  It is incompatible with EDNS0 usage and DNSSEC
  validation by applications.

  • On Linux, the fsopen, fsmount, move_mount, fsconfig, fspick, open_tree,

  and mount_setattr functions have been added.  They are part of the new Linux kernel
  mount APIs that allow applications to more flexibly configure and operate
  on filesystem mounts.  The new mount APIs are specifically designed to work
  with namespaces.

  • localedef now accepts locale definition files encoded in UTF-8.

  Previously, input bytes not within the ASCII range resulted in
  unpredictable output.

  • Support for the mbrtoc8 and c8rtomb multibyte/UTF-8 character conversion

  functions has been added per the ISO C2X N2653 and C++20 P0482R6 proposals.
  Support for the char8_t typedef has been added per the ISO C2X N2653
  proposal.  The functions are declared in uchar.h in C2X mode or when the
  _GNU_SOURCE macro or C++20 __cpp_char8_t feature test macro is defined.
  The char8_t typedef is declared in uchar.h in C2X mode or when the
  _GNU_SOURCE macro is defined and the C++20 __cpp_char8_t feature test macro
  is not defined (if __cpp_char8_t is defined, then char8_t is a builtin type).

  • The functions arc4random, arc4random_buf, and arc4random_uniform have been

  added.  The functions wrap getrandom and/or /dev/urandom to return high-
  quality randomness from the kernel.  (A short example using arc4random_uniform
  together with the new pidfd_open function appears after this feature list.)

  • Support for LoongArch running on Linux has been added.  This port requires

  at least binutils 2.38, GCC 12, and Linux 5.19.  Currently only hard-float
  ABI is supported:

    - loongarch64-linux-gnu

  The LoongArch ABI is 64-bit little-endian.
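
As a quick illustration of two of the new interfaces mentioned above, here is a
small program using arc4random_uniform and pidfd_open.  It assumes glibc 2.36
headers; as I understand the release, the arc4random family is declared in
<stdlib.h> and the pidfd functions in <sys/pidfd.h>, so this will not build
against older glibc versions.

  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/pidfd.h>

  int
  main (void)
  {
    /* High-quality randomness from the kernel; no seeding needed.  */
    uint32_t roll = arc4random_uniform (6) + 1;   /* uniform in [1, 6] */
    printf ("die roll: %u\n", roll);

    /* A pidfd refers to one specific process, so it avoids the PID-reuse
       races that a bare pid_t is subject to.  */
    int pidfd = pidfd_open (getpid (), 0);
    if (pidfd < 0)
      {
        perror ("pidfd_open");
        return 1;
      }
    printf ("pidfd for this process: %d\n", pidfd);
    close (pidfd);
    return 0;
  }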

Deprecated and removed features, and other changes affecting compatibility:

  • Support for prelink will be removed in the next release; this includes

  removal of the LD_TRACE_PRELINKING and LD_USE_LOAD_BIAS environment
  variables and their functionality in the dynamic loader.

  • The Linux kernel version check has been removed along with the

  LD_ASSUME_KERNEL environment variable.  The minimum kernel used to build
  glibc is still provided through the NT_GNU_ABI_TAG ELF note and also printed
  when libc.so is invoked directly.

  • On Linux, the LD_LIBRARY_VERSION environment variable has been removed.

The following bugs are resolved with this release:

  [14932] dynamic-link: dlsym(handle, "foo") and dlsym(RTLD_NEXT, "foo")
    return different result with versioned "foo"
  [16355] libc: syslog.h's SYSLOG_NAMES namespace violation and utter
    mess
  [23293] dynamic-link: aarch64: getauxval is broken when run as ld.so
    ./exe and ld.so adjusts argv on the stack
  [24595] nptl: [2.28 Regression]: Deadlock in atfork handler which
    calls dlclose
  [25744] locale: mbrtowc with Big5-HKSCS returns 2 instead of 1 when
    consuming the second byte of certain double byte characters
  [25812] stdio: Libio vtable protection is sometimes only partially
    enforced
  [27054] libc: pthread_atfork handlers that call pthread_atfork
    deadlock
  [27924] dynamic-link: ld.so: Support DT_RELR relative relocation
    format
  [28128] build: declare_symbol_alias doesn't work for assembly codes
  [28566] network: getnameinfo with NI_NOFQDN is not thread safe
  [28752] nss: Segfault in getpwuid when stat fails
  [28815] libc: realpath should not copy to resolved buffer on error
  [28828] stdio: fputwc crashes
  [28838] libc: FAIL: elf/tst-p_align3
  [28845] locale: ld-monetary.c should be updated to match ISO C and
    other standards.
  [28850] libc: linux: __get_nprocs_sched reads uninitialized memory
    from the stack
  [28852] libc: getaddrinfo leaks memory with AI_ALL
  [28853] libc: tst-spawn6 changes current foreground process group
    (breaks test isolation)
  [28857] libc: FAIL: elf/tst-audit24a
  [28860] build: --enable-kernel=5.1.0 build fails because of missing
    __convert_scm_timestamps
  [28865] libc: linux: _SC_NPROCESSORS_CONF and _SC_NPROCESSORS_ONLN are
    inaccurate without /sys and /proc
  [28868] dynamic-link: Dynamic loader DFS algorithm segfaults on
    missing libraries
  [28880] libc: Program crashes if date beyond 2038
  [28883] libc: sysdeps/unix/sysv/linux/select.c: __select64
    !__ASSUME_TIME64_SYSCALLS && !__ASSUME_PSELECT fails on Microblaze
  [28896] string: strncmp-avx2-rtm and wcsncmp-avx2-rtm fallback on non-
    rtm variants when avoiding overflow
  [28922] build: The .d dependency files aren't always generated
  [28931] libc: hosts lookup broken for SUCCESS=CONTINUE and
    SUCCESS=MERGE
  [28936] build: nm: No such file
  [28950] localedata: Add locale for ISO code "tok" (Toki Pona)
  [28953] nss: NSS lookup result can be incorrect if function lookup
    clobbers errno
  [28970] math: benchtest: libmvec benchmark doesn't build with make
    bench.
  [28991] libc: sysconf(_SC_NPROCESSORS_CONF) should read
    /sys/devices/system/cpu/possible
  [28993] libc: closefrom() iterates until max int if no access to
    /proc/self/fd/
  [28996] libc: realpath fails to copy partial result to resolved buffer
    on ENOENT and EACCES
  [29027] math: [ia64] fabs fails with sNAN input
  [29029] nptl: poll() spuriously returns EINTR during thread
    cancellation and with cancellation disabled
  [29030] string: GLIBC 2.35 regression - Fortify crash on certain valid
    uses of mbsrtowcs (*** buffer overflow detected ***: terminated)
  [29062] dynamic-link: Memory leak in _dl_find_object_update if object
    is promoted to global scope
  [29069] libc: fstatat64_time64_statx wrapper broken on MIPS N32 with
    -D_FILE_OFFSET_BITS=64 and -D_TIME_BITS=64
  [29071] dynamic-link: m68k: Removal of ELF_DURING_STARTUP optimization
    broke ld.so
  [29097] time: fchmodat does not handle 64 bit time_t for
    AT_SYMLINK_NOFOLLOW
  [29109] libc: posix_spawn() always returns 1 (EPERM) on clone()
    failure
  [29141] libc: _FORTIFY_SOURCE=3 fail for gcc 12/glibc 2.35
  [29162] string: [PATCH] string.h syntactic error:
    include/bits/string_fortified.h:110: error: expected ',' or ';'
    before '__fortified_attr_access'
  [29165] libc: [Regression] broken argv adjustment
  [29187] dynamic-link: [regression] broken argv adjustment for nios2
  [29193] math: sincos produces a different output than sin/cos
  [29197] string: __strncpy_power9() uses uninitialised register vs18
    value for filling after \0
  [29203] libc: daemon is not y2038 aware
  [29204] libc: getusershell is not 2038 aware
  [29207] libc: posix_fallocate fallback implementation is not y2038
    aware
  [29208] libc: fpathconf(_PC_ASYNC_IO) is not y2038 aware
  [29209] libc: isfdtype is not y2038 aware
  [29210] network: ruserpass is not y2038 aware
  [29211] libc: __open_catalog is not y2038 aware
  [29213] libc: gconv_parseconfdir is not y2038 aware
  [29214] nptl: pthread_setcanceltype fails to set type
  [29225] network: Mistyped define statement in socket/sys/socket.h in
    line 184
  [29274] nptl: __read_chk is not a cancellation point
  [29279] libc: undefined reference to `mbstowcs_chk' after
    464d189b9622932a75302290625de84931656ec0
  [29304] libc: mq_timedreceive does not handle 64 bit syscall return
    correct for !__ASSUME_TIME64_SYSCALLS
  [29403] libc: st_atim, st_mtim, st_ctim stat struct members are
    missing on microblaze with largefile

Release Notes
=============

https://sourceware.org/glibc/wiki/Release/2.36

Contributors
============

This release was made possible by the contributions of many people.
The maintainers are grateful to everyone who has contributed
changes or bug reports.  These include:

Joshua Kinard
Adhemerval Zanella
Adhemerval Zanella Netto
Alan Modra
Andreas Schwab
Arjun Shankar
Arnout Vandecappelle (Essensium/Mind)
Carlos O'Donell
Cristian Rodríguez
DJ Delorie
Danila Kutenin
Darius Rad
Dmitriy Fedchenko
Dmitry V. Levin
Emil Soleyman-Zomalan
Fangrui Song
Florian Weimer
Gleb Fotengauer-Malinovskiy
Guilherme Janczak
H.J. Lu
Ilyahoo Proshel
Jason A. Donenfeld
Joan Bruguera
John David Anglin
Jonathan Wakely
Joseph Myers
José Bollo
Kito Cheng
Maciej W. Rozycki
Mark Wielaard
Matheus Castanho
Max Gautier
Michael Hudson-Doyle
Nicholas Guriev
Noah Goldstein
Paul E. Murphy
Raghuveer Devulapalli
Ricardo Bittencourt
Sam James
Samuel Thibault
Sergei Trofimovich
Siddhesh Poyarekar
Stafford Horne
Stefan Liebler
Steve Grubb
Su Lifan
Sunil K Pandey
Szabolcs Nagy
Tejas Belagod
Tom Coldrick
Tom Honermann
Tulio Magno Quites Machado Filho
WANG Xuerui
Wangyang Guo
Wilco Dijkstra
Xi Ruoyao
Xiaoming Ni
Yang Yanchao
caiyinyu

02 August, 2022 01:17AM by Carlos O'Donell

August 01, 2022

FSF Blogs

July GNU Spotlight with Amin Bandali: Nineteen new GNU releases!


01 August, 2022 07:12PM

July 31, 2022

gcide @ Savannah

GCIDE version 0.53

GCIDE version 0.53 is available for download.

31 July, 2022 07:13AM by Sergey Poznyakoff

July 27, 2022

FSF News

FSF Blogs

Hackers of the world unite at HOPE 2022

FSF report of this year's Hackers on Planet Earth (HOPE) conference.

27 July, 2022 05:54PM

July 25, 2022

poke @ Savannah

GNU poke 2.4 released

I am happy to announce a new release of GNU poke, version 2.4.

This is a bugfix release in the poke 2.x series.

See the file NEWS in the distribution tarball for a list of issues fixed in this release.

The tarball poke-2.4.tar.gz is now available at
https://ftp.gnu.org/gnu/poke/poke-2.4.tar.gz.

GNU poke (http://www.jemarch.net/poke) is an interactive, extensible editor for binary data.  Not limited to editing basic entities such as bits and bytes, it provides a full-fledged procedural, interactive programming language designed to describe data structures and to operate on them.


Happy poking!

--
Jose E. Marchesi
Frankfurt am Main
25 July 2022

25 July, 2022 02:21PM by Jose E. Marchesi

July 24, 2022

parallel @ Savannah

GNU Parallel 20220722 ('Roe vs Wade') released

GNU Parallel 20220722 ('Roe vs Wade') has been released. It is available for download at: lbry://@GnuParallel:4

Quote of the month:

  The syntax for GNU Parallel is so slick that I often use it just to make my script read nicer, and the parallelism is a cherry on top.
    -- Epistaxis@reddit

New in this release:

  • --colour-failed will color output red if the job fails.
  • sql: support for InfluxDB.
  • Polarhome.com is dead, so these OSs are no longer supported: AIX HPUX IRIX Minix OPENSTEP OpenIndiana OpenServer QNX Solaris Syllable Tru64 Ultrix UnixWare.
  • Bug fixes and man page updates.

News about GNU Parallel:

GNU Parallel - For people who live life in the parallel lane.

If you like GNU Parallel record a video testimonial: Say who you are, what you use GNU Parallel for, how it helps you, and what you like most about it. Include a command that uses GNU Parallel if you feel like it.

About GNU Parallel

GNU Parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU Parallel can then split the input and pipe it into commands in parallel.

If you use xargs and tee today you will find GNU Parallel very easy to use as GNU Parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU Parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. GNU Parallel can even replace nested loops.

GNU Parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU Parallel as input for other programs.

For example you can run this to convert all jpeg files into png and gif files and have a progress bar:

  parallel --bar convert {1} {1.}.{2} ::: *.jpg ::: png gif

Or you can generate big, medium, and small thumbnails of all jpeg files in sub dirs:

  find . -name '*.jpg' |
    parallel convert -geometry {2} {1} {1//}/thumb{2}_{1/} :::: - ::: 50 100 200

You can find more about GNU Parallel at: http://www.gnu.org/s/parallel/

You can install GNU Parallel in just 10 seconds with:

    $ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
       fetch -o - http://pi.dk/3 ) > install.sh
    $ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
    12345678 883c667e 01eed62f 975ad28b 6d50e22a
    $ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
    cc21b4c9 43fd03e9 3ae1ae49 e28573c0
    $ sha512sum install.sh | grep ec113b49a54e705f86d51e784ebced224fdff3f52
    79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
    fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
    $ bash install.sh

Watch the intro video on http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.

When using programs that use GNU Parallel to process data for publication please cite:

O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014.

If you like GNU Parallel:

  • Give a demo at your local user group/team/colleagues
  • Post the intro videos on Reddit/Diaspora*/forums/blogs/ Identi.ca/Google+/Twitter/Facebook/Linkedin/mailing lists
  • Get the merchandise https://gnuparallel.threadless.com/designs/gnu-parallel
  • Request or write a review for your favourite blog or magazine
  • Request or build a package for your favourite distribution (if it is not already there)
  • Invite me for your next conference

If you use programs that use GNU Parallel for research:

  • Please cite GNU Parallel in your publications (use --citation)

If GNU Parallel saves you money:

About GNU SQL

GNU sql aims to give a simple, unified interface for accessing databases through all the different databases' command line clients. So far the focus has been on giving a common way to specify login information (protocol, username, password, hostname, and port number), size (database and table size), and running queries.

The database is addressed using a DBURL. If commands are left out you will get that database's interactive shell.

When using GNU SQL for a publication please cite:

O. Tange (2011): GNU SQL - A Command Line Tool for Accessing Different Databases Using DBURLs, ;login: The USENIX Magazine, April 2011:29-32.

About GNU Niceload

GNU niceload slows down a program when the computer load average (or other system activity) is above a certain limit. When the limit is reached the program will be suspended for some time. If the limit is a soft limit the program will be allowed to run for short amounts of time before being suspended again. If the limit is a hard limit the program will only be allowed to run when the system is below the limit.

24 July, 2022 07:25AM by Ole Tange

July 23, 2022

datamash @ Savannah

GNU Datamash 1.8 released

This is to announce datamash-1.8, a new release.

Datamash is a command-line program which performs basic numeric, textual and
statistical operations on input textual data.


This is the first release for new maintainer Tim Rice, with much appreciation
to Shawn Wagner and Erik Auerswald for their help. See the AUTHORS and THANKS
files for additional credits and acknowledgements.


GNU Datamash home page:
   https://www.gnu.org/software/datamash/

Please report any problem you may experience to the bug-datamash@gnu.org
mailing list.

Happy Hacking!
- Tim Rice

==================================================================

Here are the compressed sources and a GPG detached signature[*]:
https://ftp.gnu.org/gnu/datamash/datamash-1.8.tar.gz
https://ftp.gnu.org/gnu/datamash/datamash-1.8.tar.gz.sig

Use a mirror for higher download bandwidth:
https://ftpmirror.gnu.org/datamash/datamash-1.8.tar.gz
https://ftpmirror.gnu.org/datamash/datamash-1.8.tar.gz.sig

[*] Use a .sig file to verify that the corresponding file (without the
.sig suffix) is intact.  For instructions about how to do this, please
refer to https://ftp.gnu.org/README.  (In particular you will need to
retrieve the GNU keyring rather than using any keyservers.)

==================================================================

The checksums of the archive are:

$ sha1sum datamash-1.8.tar.gz
e77e15ed2c6b17b4045251fd87f16430c3bf2166  datamash-1.8.tar.gz

$ sha256sum datamash-1.8.tar.gz
94a4e11819ad259aa3745b7eca392e385e3a676d276e8cbb616269dbbb17fe6d  datamash-1.8.tar.gz

$ b2sum datamash-1.8.tar.gz
dfe4060ea65ea46a1796e01463fd9b0e55c2d633d06da153f585a3a569acf3e9211a14cb3905daf8ecae347358daa04db940d557b909f0ce5ebbba2f57d3a410  datamash-1.8.tar.gz

==================================================================

NEWS

  • Noteworthy changes in release 1.8 (2022-07-23) [stable]
    • Changes in Behavior

  Schedule -f/--full combined with non-linewise operations for deprecation.
  In a future release, -f/--full will only be usable with operations where
  it makes sense. For now, we print a warning to stderr when -f/--full is
  used with non-linewise operations; in a future release, such usage will
  no longer be supported.

  The bin operation now uses more intuitive bins. Previously, a command
  such as `datamash bin 1 <<< -0` would output -100; and -100 did not fall
  in its own bin. We now require all bins to take the form `[nx,(n+1)x)`
  with integer n and bin width x. We discard the sign on -0 and gate such
  inputs into the [0,x) bin.

  Operations taking more than one argument now provide more complete output
  with --header-out. Previously, an operation such as `pcov x:y` would
  produce an output header like `pcov(y)`, discarding the `x`. The new
  behavior will output header `pcov(x,y)`.

  datamash(1) no longer ignores --output-delimiter with the rmdup operation.

    • New Features

  New datamash option --sort-cmd specifies the program used by the -s
  option to sort input, plus enhancements to the security and
  portability of building sort command lines.

  New datamash option -c/--collapse-delimiter=X uses character X
  instead of a comma between values in collapse and unique lists.

  New datamash operations: mean square (ms) and root mean square (rms).

  Decorate now supports sorting IP addresses of both versions 4 and 6
  together. IPv4 addresses are logically converted to IPv6 addresses,
  either as IPv4-Mapped (ipv6v4map) or IPv4-Compatible (ipv6v4comp)
  addresses.

  Add two command aliases:
    'echo' may now be used instead of 'cut'.
    'uniq' may now be used instead of 'unique'.

    • Improvements

  Updated the bash completion script to reflect recent additions.

    • Bug Fixes

  Datamash now passes the -z/--zero-terminated flag to the sort(1) child
  process when used with "--sort --zero-terminated". Additionally,
  if the system's sort(1) does not support -z, datamash reports the error
  and exits. Previously it would omit the "-z" when running sort(1),
  resulting in incorrect results.

  Documentation fixes and spelling corrections.

  Fixed an incorrect format string in a decorate(1) error message that
  broke compilation on some systems.

  datamash(1), decorate(1): Fix some minor memory leaks.

  datamash(1) no longer crashes when the unique or countunique operations
  are used with input data containing NUL bytes.  The problem was reported
  in https://lists.gnu.org/archive/html/bug-datamash/2020-11/msg00001.html
  by Catalin Patulea.

  datamash(1) no longer crashes when crosstab with --header-in is called
  by field name instead of index. I.e. `datamash --header-in ct x,y` now
  works as expected.

23 July, 2022 03:21AM by Tim Rice

July 22, 2022

FSF Blogs

FSD meeting recap 2022-07-22

Check out the great work our volunteers accomplished at today's Free Software Directory (FSD) IRC meeting.

22 July, 2022 10:30PM

July 21, 2022

gnuastro @ Savannah

Gnuastro 0.18 released

The 18th release of GNU Astronomy Utilities (Gnuastro) is now available. See the full announcement for all the new features in this release and the many bugs that have been found and fixed: https://lists.gnu.org/archive/html/info-gnuastro/2022-07/msg00001.html

21 July, 2022 11:17PM by Mohammad Akhlaghi

librejs @ Savannah

LibreJS 7.21.0 released

GNU LibreJS aims to address the JavaScript problem described in Richard
Stallman's article The JavaScript Trap. LibreJS is a free add-on for GNU IceCat and other Mozilla-based browsers. It blocks nonfree nontrivial JavaScript while allowing JavaScript that is free and/or trivial.

The user manual pages are at
<https://www.gnu.org/software/librejs/manual/>.

Source tarballs and signed xpis are available at
<https://ftp.gnu.org/gnu/librejs/>.

The binary can also be found at the Firefox Browser Addon website:
<https://addons.mozilla.org/en-US/firefox/addon/librejs/>.

GPG key ID: EF86DFD0
Fingerprint: 47F9 D050 1E11 8879 9040  4941 2126 7E93 EF86 DFD0

See also
<https://savannah.gnu.org/project/memberlist-gpgkeys.php?group=librejs>.

User-visible changes since 7.20.3

New in 7.21.0

21 July, 2022 01:52PM by Yuchen Pei

July 20, 2022

Andy Wingo

unintentional concurrency

Good evening, gentle hackfolk. Last time we talked about heuristics for when you might want to compact a heap. Compacting garbage collection is nice and tidy and appeals to our orderly instincts, and it enables heap shrinking and reallocation of pages to large object spaces and it can reduce fragmentation: all very good things. But evacuation is more expensive than just marking objects in place, and so a production garbage collector will usually just mark objects in place, and only compact or evacuate when needed.

Today's post is more details!

dedication

Just because it's been, oh, a couple decades, I would like to reintroduce a term I learned from Marnanel years ago on advogato, a nerdy group blog kind of a site. As I recall, there is a word that originates in the Oxbridge social environment, "narg", from "Not A Real Gentleman", and which therefore denotes things that not-real-gentlemen do: nerd out about anything that's not, like, fox-hunting or golf; or generally spending time on something not because it will advance you in conventional hierarchies but because you just can't help it, because you love it, because it is just your thing. Anyway, in the spirit of pursuits that are really not What One Does With One's Time, this post is dedicated to the word "nargery".

side note, bis: immix-style evacuation versus mark-compact

In my last post I described Immix-style evacuation, and noted that it might take a few cycles to fully compact the heap, and that it has a few pathologies: the heap might never reach full compaction, and Immix might run out of free blocks in which to evacuate.

With these disadvantages, why bother? Why not just do a single mark-compact pass and be done? I implicitly asked this question last time but didn't really answer it.

For some people it will be, yep, yebo, mark-compact is the right answer. And yet, there are a few reasons that one might choose to evacuate a fraction of the heap instead of compacting it all at once.

The first reason is object pinning. Mark-compact systems assume that all objects can be moved; you can't usefully relax this assumption. Most algorithms "slide" objects down to lower addresses, squeezing out the holes, and therefore every live object's address needs to be available to use when sliding down other objects with higher addresses. And yet, it would be nice sometimes to prevent an object from being moved. This is the case, for example, when you grant a foreign interface (e.g. a C function) access to a buffer: if garbage collection happens while in that foreign interface, it would be nice to be able to prevent garbage collection from moving the object out from under the C function's feet.

Another reason to want to pin an object is because of conservative root-finding. Guile currently uses the Boehm-Demers-Weiser collector, which conservatively scans the stack and data segments for anything that looks like a pointer to the heap. The garbage collector can't update such a global root in response to compaction, because you can't be sure that a given word is a pointer and not just an integer with an inconvenient value. In short, objects referenced by conservative roots need to be pinned. I would like to support precise roots at some point but part of my interest in Immix is to allow Guile to move to a better GC algorithm, without necessarily requiring precise enumeration of GC roots. Optimistic partial evacuation allows for the possibility that any given evacuation might fail, which makes it appropriate for conservative root-finding.
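
To make the conservative case concrete, here is a rough sketch of root filtering with pinning, under assumed names and with a deliberately coarse block-granularity pin bit; it is not the code of any real collector, just the shape of the idea.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE (32 * 1024)          /* illustrative Immix-ish block size */
#define NBLOCKS    1024

static uintptr_t heap_base;             /* start of the block-structured space */
static bool block_pinned[NBLOCKS];      /* coarse: pin whole blocks */

/* Called for every word found on a mutator stack or in a data segment
   during conservative root scanning. */
static void maybe_pin_conservative_root(uintptr_t word) {
  if (word < heap_base || word >= heap_base + (uintptr_t)NBLOCKS * BLOCK_SIZE)
    return;                             /* not a heap pointer: ignore it */
  size_t block = (word - heap_base) / BLOCK_SIZE;
  block_pinned[block] = true;           /* this block cannot be an evacuation
                                           candidate this cycle; mark in place */
}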

Finally, as moving objects has a cost, it's reasonable to want to only incur that cost for the part of the heap that needs it. In any given heap, there will likely be some data that stays live across a series of collections, and which, once compacted, can't be profitably moved for many cycles. Focussing evacuation on only the part of the heap with the lowest survival rates avoids wasting time on copies that don't result in additional compaction.

(I should admit one thing: sliding mark-compact compaction preserves allocation order, whereas evacuation does not. The memory layout of sliding compaction is more optimal than evacuation.)

multi-cycle evacuation

Say a mutator runs out of memory, and therefore invokes the collector. The collector decides for whatever reason that we should evacuate at least part of the heap instead of marking in place. How much of the heap can we evacuate? The answer depends primarily on how many free blocks you have reserved for evacuation. These are known-empty blocks that haven't been allocated into by the last cycle. If you don't have any, you can't evacuate! So probably you should keep some around, even when performing in-place collections. The Immix papers suggest 2% and that works for me too.

Then you evacuate some blocks. Hopefully the result is that after this collection cycle, you have more free blocks. But you haven't compacted the heap, at least probably not on the first try: not into 2% of total space. Therefore you tell the mutator to put any empty blocks it finds as a result of lazy sweeping during the next cycle onto the evacuation target list, and then the next cycle you have more blocks to evacuate into, and more and more and so on until after some number of cycles you fall below some overall heap fragmentation low-watermark target, at which point you can switch back to marking in place.

I don't know how this works in practice! In my test setups, which trigger compaction at 10% fragmentation and continue until it drops below 5%, it's rare that it takes more than 3 cycles of evacuation until the heap drops to effectively 0% fragmentation. Of course I had to introduce fragmented allocation patterns into the microbenchmarks to even cause evacuation to happen at all. I look forward to some day soon testing with real applications.
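
To make the shape of that policy concrete, here is a minimal sketch of the per-collection mode decision with hysteresis. The 10% and 5% watermarks match the test setup described above; the structure, field names, and fragmentation measure are otherwise illustrative rather than taken from any real collector.

#include <stddef.h>

enum gc_mode { MARK_IN_PLACE, EVACUATE };

struct heap_stats {
  size_t empty_blocks;      /* known-empty blocks, usable as evacuation targets */
  size_t fragmented_bytes;  /* bytes lost to holes in partly-full blocks */
  size_t heap_bytes;        /* total heap size */
};

static double fragmentation(const struct heap_stats *s) {
  return (double)s->fragmented_bytes / (double)s->heap_bytes;
}

/* Called at the start of each collection.  Hysteresis: start evacuating
   above the high watermark, keep evacuating until below the low one.
   Empty blocks found by lazy sweeping during the mutator cycle are
   assumed to have been added to the evacuation reserve already. */
static enum gc_mode choose_mode(const struct heap_stats *s,
                                enum gc_mode previous_mode) {
  const double high_watermark = 0.10;
  const double low_watermark  = 0.05;

  if (s->empty_blocks == 0)
    return MARK_IN_PLACE;               /* nowhere to evacuate into */
  if (previous_mode == EVACUATE)
    return fragmentation(s) > low_watermark ? EVACUATE : MARK_IN_PLACE;
  return fragmentation(s) > high_watermark ? EVACUATE : MARK_IN_PLACE;
}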

concurrency

Just as a terminological note, in the world of garbage collectors, "parallel" refers to multiple threads being used by a garbage collector. Parallelism within a collector is essentially an implementation detail; when the world is stopped for collection, the mutator (the user program) generally doesn't care if the collector uses 1 thread or 15. On the other hand, "concurrent" means the collector and the mutator running at the same time.

Different parts of the collector can be concurrent with the mutator: for example, sweeping, marking, or evacuation. Concurrent sweeping is just a detail, because it just visits dead objects. Concurrent marking is interesting, because it can significantly reduce stop-the-world pauses by performing most of the computation while the mutator is running. It's tricky, as you might imagine; the collector traverses the object graph while the mutator is, you know, mutating it. But there are standard techniques to make this work. Concurrent evacuation is a nightmare. It's not that you can't implement it; you can. But it's very very hard to get an overall performance win from concurrent evacuation/copying.

So if you are looking for a good bargain in the marketplace of garbage collector algorithms, it would seem that you need to avoid concurrent copying/evacuation. It's an expensive product that would seem to not buy you very much.

All that is just a prelude to an observation that there is a funny source of concurrency even in some systems that don't see themselves as concurrent: mutator threads marking their own roots. To recall, when you stop the world for a garbage collection, all mutator threads have to somehow notice the request to stop, reach a safepoint, and then stop. Then the collector traces the roots from all mutators and everything they reference, transitively. Then you let the threads go again. Thing is, once you get more than a thread or four, stopping threads can take time. You'd be tempted to just have threads notice that they need to stop, then traverse their own stacks at their own safepoint to find their roots, then stop. But, this introduces concurrency between root-tracing and other mutators that might not have seen the request to stop. For marking, this concurrency can be fine: you are just setting mark bits, not mutating the roots. You might need to add an additional mark pattern that can be distinguished from marked-last-time and marked-the-time-before-but-dead-now, but that's a detail. Fine.
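
The reason the race is benign for marking is that the only shared state touched is the mark metadata, and setting a mark is idempotent. Here is a sketch of what "just setting mark bits" might look like with one atomic mark byte per object; the epoch value stands in for the extra mark pattern mentioned above, and all names are illustrative.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if this thread was the first to mark the object this
   cycle.  Racing markers, or a mutator still running elsewhere, can't
   corrupt anything: the worst case is marking the same object twice. */
static bool mark_object(_Atomic uint8_t *mark_byte, uint8_t current_epoch) {
  uint8_t prev = atomic_exchange(mark_byte, current_epoch);
  return prev != current_epoch;
}

Contrast this with evacuation, where the racing thread would have to install a forwarding pointer and every other mutator would need to notice it before touching the object, which is exactly the full concurrent-copying machinery that is so hard to make profitable.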

But if you instead start an evacuating collection, the gates of hell open wide and toothy maws and horns fill your vision. One thread could be stopping and evacuating the objects referenced by its roots, while another hasn't noticed the request to stop and is happily using the same objects: chaos! You are trying to make a minor optimization to move some work out of the stop-the-world phase but instead everything falls apart.

Anyway, this whole article was really to get here and note that you can't do ragged-stops with evacuation without supporting full concurrent evacuation. Otherwise, you need to postpone root traversal until all threads are stopped. Perhaps this is another argument that evacuation is expensive, relative to marking in place. In practice I haven't seen the ragged-stop effect making so much of a difference, but perhaps that is because evacuation is infrequent in my test cases.

Zokay? Zokay. Welp, this evening's nargery was indeed nargy. Happy hacking to all collectors out there, and until next time.

20 July, 2022 09:26PM by Andy Wingo

Parabola GNU/Linux-libre

elogind requires manual intervention

As part of the init scripts repackaging, elogind no longer ships with its OpenRC init script. You have to install it manually when upgrading:

# pacman -Syu
# pacman -S elogind-openrc

20 July, 2022 03:29AM by David P.

July 19, 2022

[nonsystemd] NetworkManager, dbus and display managers require manual intervention

We have recently begun a repackaging of [nonsystemd] packages (see #3290). The displaymanager-openrc package has been removed and specific init scripts have been added for each display manager (e.g. sddm-openrc for sddm, gdm-openrc for gdm, and so on with lxdm, xdm and lightdm).

Regarding NetworkManager and dbus, their nonsystemd builds used to ship with their corresponding OpenRC init scripts, but these have now been split out into networkmanager-openrc and dbus-openrc. Please install these when upgrading those packages.

19 July, 2022 03:06AM by David P.

July 18, 2022

health @ Savannah

GNU Health project to implement REUSE compliance

The REUSE initiative[1] is a Free Software Foundation Europe program that facilitates the documentation of licenses of Libre projects like GNU Health.

After several meetings with our friends from FSFE, we have decided to implement REUSE in all GNUHealth components, that is:

  • Hospital Management System
  • MyGNUHealth Personal Health Record
  • Thalamus
  • GH Federation Portal

We believe that for large projects like GNUHealth, with multiple files of different kinds (code, graphics, data, ...), REUSE will be a great companion.

1.- https://reuse.software/

18 July, 2022 08:15AM by Luis Falcon

July 17, 2022

a2ps @ Savannah

a2ps 4.14.90 released [alpha]

I’m delighted to announce a new alpha release of GNU a2ps. This release involves minimal changes to functionality, but brings a considerable update to the build system, along with code cleanup, simplification, and bug fixes. See below for more details.

I am particularly keen to hear from users and packagers about this release, as I plan to make a stable 4.15 release soon.

* Noteworthy changes in release 4.14.90 (2022-07-17) [alpha]
 * This is an alpha release, owing to the considerable changes since the
   last version.
 * New maintainer, Reuben Thomas.
 * Build:
   - Updated and fixed the build system, using gnulib and modern Autotools.
   - Remove OS/2 support.
   - Require libpaper.
 * Predefined delegations:
   - Remove support for defunct Netscape and proprietary Acrobat Reader.
   - Add lpr wrapper for automatic detection of different printing systems,
     including CUPS support.
 * Encodings:
   - Use libre fonts for KOI-8.
   - Composite fonts support.
 * Documentation:
   - Some English fixes.
   - Man page for fixnt.
 * Bug fixes:
   - Fixes for security bugs CVE-2001-1593, CVE-2015-8107 and CVE-2014-0466.
   - Minor bugs fixed.

17 July, 2022 07:19PM by Reuben Thomas

July 16, 2022

rush @ Savannah

GNU rush version 2.3

GNU rush version 2.3 is available for download.  This is a bug-fixing release.

16 July, 2022 05:20PM by Sergey Poznyakoff

July 15, 2022

FSF Blogs

FSD meeting recap 2022-07-15

Check out the great work our volunteers accomplished at today's Free Software Directory (FSD) IRC meeting.

15 July, 2022 09:05PM

July 14, 2022

Parabola GNU/Linux-libre

[From Arch]: wxgtk2 may require manual intervention

If pacman gives an error message like:

error: failed to prepare transaction (could not satisfy dependencies) :: removing wxgtk-common breaks dependency 'wxgtk-common' required by wxgtk2

you will need to uninstall 'wxgtk2' and its dependents first (the only such Parabola packages are 'freefilesync' and 'odamex')

# pacman -Rc wxgtk2

14 July, 2022 10:08PM by bill auger

Simon Josefsson

Towards pluggable GSS-API modules

GSS-API is a standardized framework that is used by applications to, primarily, support Kerberos V5 authentication. GSS-API is standardized by IETF and supported by protocols like SSH, SMTP, IMAP and HTTP, and implemented by software projects such as OpenSSH, Exim, Dovecot and Apache httpd (via mod_auth_gssapi). The implementations of Kerberos V5 and GSS-API that are packaged for common GNU/Linux distributions, such as Debian, include MIT Kerberos, Heimdal and (less popular) GNU Shishi/GSS.

When an application or library is packaged for a GNU/Linux distribution, a choice is made about which GSS-API library to link with. I believe this leads to two problematic consequences: 1) it is difficult for end-users to choose between Kerberos implementations, and 2) dependency bloat for non-Kerberos users. Let’s discuss these separately.

  1. No system admin or end-user choice over the GSS-API/Kerberos implementation used

    There are differences in the bug/feature set of MIT Kerberos and that of Heimdal’s, and definitely that of GNU Shishi. This can lead to a situation where an application (say, Curl) is linked to MIT Kerberos, and someone discovers a Kerberos-related problem that would not have occurred if Heimdal had been used, or vice versa. Sometimes it is possible to locally rebuild a package using another set of dependencies. However doing so has a high maintenance cost to track security fixes in future releases. It is an unsatisfying solution for the distribution to flip flop between which library to link to, depending on which users complain the most. To resolve this, a package could be built in two variants: one for MIT Kerberos and one for Heimdal. Both can be shipped. This can help solve the problem, but the question of which variant to install by default leads to similar concerns, and will also eventually lead to dependency conflicts. Consider an application linked to libraries (possibly in several steps) where one library only supports MIT Kerberos and one library only supports Heimdal.

    The fact remains that there will continue to be multiple Kerberos implementations. Distributions will continue to support them, and will be faced with the dilemma of which one to link to by default. Distributions and the people who package software will have little guidance on which implementation to choose from their upstream, since most upstreams support both implementations. The result is that system administrators and end-users are not given a simple way to have flexibility about which implementation to use.
  2. Dependency bloat for non-Kerberos use-cases.

    Compared to the number of users of GNU/Linux systems out there, the number of Kerberos users on GNU/Linux systems is smaller. Here distributions face another dilemma. Should they enable GSS-API for all applications, to satisfy the Kerberos community, or should they be conservative with adding dependencies to reduce the attack surface for the non-Kerberos users? This is a dilemma with no clear answer, and one approach has been to ship two versions of a package: one with Kerberos support and one without. Another option here is for upstream to support loadable modules; for example, Dovecot implements this and Debian ships with a separate ‘dovecot-gssapi’ package that extends the core Dovecot seamlessly. Few except some larger projects appear to be willing to carry that maintenance cost upstream, so most only support build-time linking of the GSS-API library.

    There are a number of real-world situations to consider, but perhaps the easiest one to understand for most GNU/Linux users is OpenSSH. The SSH protocol supports Kerberos via GSS-API, and OpenSSH implements this feature, and most GNU/Linux distributions ship an SSH client and SSH server linked to a GSS-API library. Someone made the choice of linking it to a GSS-API library, for the arguably smaller set of people interested in it, and also the choice of which library to link to. Rebuilding OpenSSH locally without Kerberos support comes with a high maintenance cost. Many people will not need or use the Kerberos features of the SSH client or SSH server, and having it enabled by default comes with a security cost. Having a vulnerability in OpenSSH is critical for many systems, and therefore its dependencies are a reasonable concern. Wouldn’t it be nice if OpenSSH was built in a way that didn’t force you to install MIT Kerberos or Heimdal? While still making it easy for Kerberos users to use it, of course.

Hopefully I have made the problem statement clear above, and that I managed to convince you that the state of affairs is in need of improving. I learned of the problems from my personal experience with maintaining GNU SASL in Debian, and for many years I ignored this problem.

Let me introduce Libgssglue!

Matryoshka Dolls
Matryoshka Dolls – photo CC-4.0-BY-NC by PngAll

Libgssglue is a library written by Kevin W. Coffman based on historical GSS-API code, the initial release was in 2004 (using the name libgssapi) and the last release was in 2012. Libgssglue provides a minimal GSS-API library and header file, so that any application can link to it instead of directly to MIT Kerberos or Heimdal (or GNU GSS). The administrator or end-user can select during run-time which GSS-API library to use, through a global /etc/gssapi_mech.conf file or even a local GSSAPI_MECH_CONF environment variable. Libgssglue is written in C, has no external dependencies, and is BSD-style licensed. It was developed for the CITI NFSv4 project but libgssglue ended up not being used.
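
For illustration, the configuration is a small text file mapping each mechanism to the shared library that implements it. The entry below only shows the shape of the file; the path and initialization function are placeholders that will differ per system, so consult the libgssglue documentation for the correct values.

# /etc/gssapi_mech.conf
# <shared library>                              <initialization function>
/usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2   mechglue_internal_krb5_init

Pointing the GSSAPI_MECH_CONF environment variable at a different file lets a single user switch GSS-API implementations without touching the system-wide configuration.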

I have added support to build GNU SASL with libgssglue — the changes required were only ./configure.ac-related since GSS-API is a standardized framework. I have written a fairly involved CI/CD check that builds GNU SASL with MIT Kerberos, Heimdal, libgssglue and GNU GSS, sets up a local Kerberos KDC and verifies successful GSS-API and GS2-KRB5 authentications. The ‘gsasl’ command line tool connects to a local example SMTP server, also based on GNU SASL (linked to all variants of GSS-API libraries), and to a system-installed Dovecot IMAP server that uses the MIT Kerberos GSS-API library. This is on Debian but I expect it to be easily adaptable to other GNU/Linux distributions. The check triggered some (expected) Shishi/GSS-related missing features, and triggered one problem related to authorization identities that may be a bug in GNU SASL. However, testing shows that it is possible to link GNU SASL with libgssglue and have it be operational with any choice of GSS-API library that is shipped with Debian. See GitLab CI/CD code and its CI/CD output.

This experiment worked so well that I contacted Kevin to learn that he didn’t have any future plans for the project. I have adopted libgssglue and put up a Libgssglue GitLab project page, and pushed out a libgssglue 0.5 release fixing only some minor build-related issues. There are still some missing newly introduced GSS-API interfaces that could be added, but I haven’t been able to find any critical issues with it. Amazing that an untouched 10 year old project works so well!

My current next steps are:

  • Release GNU SASL with support for Libgssglue and encourage its use in documentation.
  • Make GNU SASL link to Libgssglue in Debian, to avoid a hard dependency on MIT Kerberos, but still allowing a default out-of-the-box Kerberos experience with GNU SASL.
  • Maintain libgssglue upstream and implement self-checks, CI/CD testing, new GSS-API interfaces that have been defined, and generally fix bugs and improve the project. Help appreciated!
  • Maintain the libgssglue package in Debian.
  • Look into if there are applications in Debian that link to a GSS-API library that could instead be linked to libgssglue to allow flexibility for the system administrator and reduce dependency bloat.

What do you think? Happy Hacking!

14 July, 2022 09:28AM by simon

July 13, 2022

Christine Lemmer-Webber

The Beginning of A Grueling Diet

Today I made a couple of posts on the social medias, I will repost them in their entirety:

FUCK eating "DELICIOUS" food. I'm DONE eating delicious food.

What has delicious food ever done for me? Only made me want to eat more of it.

From now on I am only eating BORING-ASS GRUELS that I can eat as much as I want of which will not be much because they are BORING -- fediverse post birdsite post

And then:

I am making a commitment: I will be eating nothing but boring GRUELS until the end of 2022.

Hold me to it. fediverse post birdsite post

I am hereby committing to "a grueling diet" for the rest of 2022, as an experiment. Here are the rules:

  • Gruel, Pottage, and soup, in unlimited quantities. But gruel is preferred.
  • Fresh, steamed, pickled, and roasted fruit, vegetables, and tofu may be eaten in unlimited quantity. Unsweetened baking chocolate and cottage cheese are also permitted.
  • Tea, seltzer water, milk (any kind), and coffee are fine to have.
  • Pottage (including gruel) may be adorned as deemed appropriate, but not too luxuriously. Jams and molasses may be had for a nice breakfast or dessert, but not too much. Generally, only adorn pottages at mealtime; pottages in-between mealtime should be eaten with as sparing of additions as possible.
  • Not meaning to be rude to visitors, guests, hosts, and other such good company, exceptions to the diet may be made when visiting, being visited, or going on dates.

I will provide followups to this post throughout the year, justifying the diet, describing how it has affected my health, putting it in historical context, providing recipes, etc.

In the meanwhile, to the rest of 2022: may that it be grueling indeed!

Edit (2022-07-15): Added tofu to the list of acceptable things to additionally eat. Clarified gruel/pottage adornment advice. Added roasting as an acceptable processing method for fruit/vegetables/tofu.

13 July, 2022 12:30PM by Christine Lemmer-Webber (cwebber@dustycloud.org)

July 09, 2022

GNUnet News

GNUnet 0.17.2

GNUnet 0.17.2

This is a bugfix release for gnunet 0.17.1.

Download links

The GPG key used to sign is: 3D11063C10F98D14BD24D1470B0998EF86F59B6A

Note that due to mirror synchronization, not all links may be functional early after the release. For direct access try http://ftp.gnu.org/gnu/gnunet/

Noteworthy changes in 0.17.2 (since 0.17.1)

  • DHT: Various bugfixes in the protocol.
  • RECLAIM: OpenID Connect plugin improvements.
  • ABE: Removed.

A detailed list of changes can be found in the ChangeLog and the bugtracker.

09 July, 2022 10:00PM

Christine Lemmer-Webber

Guile Steel: a proposal for a systems lisp

Before we get into this kind of stream-of-consciousness outline, I'd like to note that very topically to this, over at the Spritely Institute (where I'm CTO, did I mention on here yet that I'm the CTO of a nonprofit to improve networked communication on the internet on this blog? because I don't think I did) we published a Scheme Primer, and the feedback to it has been just lovely. This post isn't a Spritely Institute thing (at least, not yet, though if its ideas manifested it could be possible we might use some of the tech), but since it's about Scheme, I thought I'd mention that.

This blogpost outlines something I've had kicking around in my head for a while: the desire for a modern "systems lisp", you know, kind of like Rust, except hopefully much better than Rust, and in Lisp. (And, if it turns out to be for no other reason, it might simply be better by being written in a Lisp.) But let's be clear: I haven't written anything, this blogpost is a ramble, it's just kind of a set of feelings about what I'd like, what I think is possible.

Let's open by saying that there's no real definition of what a "systems language" is... but more or less what people mean is, "something like C". In other words, what people nowadays consider a low-level language, even though C used to be considered a high level language. And what people really mean is: it's fast, it's statically typed, and it's really for the bit-fiddling types of speed demons out there.

Actually, let's put down a few asides for a moment. People have conflated two different benefits of "statically typed" languages because they've mostly been seen together:

  • Static typing for ahead-of-time more-correct programs
  • Static typing for faster or leaner programs (which subdivides in terms of memory and CPU benefits, more or less)

In the recent FOSS & Crafts episode What is Lisp? we talk a bit about how the assumption that dynamically typed languages are "slow" is really due to lack of hardware support, and that lisp machines actually had hardware support directly (tagged memory architecture and hardware garbage collection) and even wrote low-level parts of their systems like the "graphics drivers" directly in lisp, and it was plenty fast, and that it would even be possible to have co-processors on which dynamic code (not just lisp) ran at "native speed" (this is what the MacIvory did), but this is all somewhat of an aside because that's not the world we live in. So as much as I, Christine, would love to have tagged architecture (co-)processors, they probably won't happen, except there's some RISC-V tagged architecture things but I don't think they've gotten very far and they seem mostly motivated by a security model that doesn't make any sense to me. But I'd love to be wrong on this! I would like tagged RISC-V to succeed! But still, there's the problem of memory management, and I don't think anyone's been working on a hardware garbage collector or if that would really be a better thing anyway.

The fact is, there's been a reinforcing effect over the last several decades since the death of the lisp machine: CPUs are optimized for C, and C is optimized for CPUs, and both of them try to optimize for each other. So "systems programming" really means "something like C" because that's what our CPUs like because that's what our languages like and these are pretty much reinforcing.

And besides, C is basically the lingua franca of programming languages, right? If you want to make something widely portable, you target the C ABI, because pretty much all programming languages have some sort of C FFI toolkit thing or just make C bindings, and everyone is happy. Except, oh wait, C doesn't actually have an ABI! Well, okay, I guess not, but it doesn't matter because the C ABI triples, that's what the world works with.

Well also, you gotta target the web, right? And actually the story there is a bit nicer because WebAssembly is actually kinda awesome, and the hope and dream is that all programming languages in some way or another target WebAssembly, and then "you gotta write your thing in Javascript because it's the language of the web!!!" is no longer a thing I have to hear anymore. (Yes, all my friends who work on Javascript, I appreciate you for making it the one programming language which has mostly gotten better over time... hopefully it stays that way, and best of luck.) But the point is, any interesting programming language these days should be targeting Webassembly, and hopefully not just via Emscripten, but hopefully via actually targeting Webassembly directly.

So okay, we have at least two targets for our "system language": C, or something that is C-compatible, and Webassembly. And static type analysis in terms of preventing errors, that's also a useful thing, I won't deny it. (I think the division of "statically typed" and "dynamically typed" languages is probably more of a false one than we tend to think, but that's a future blogpost, to be written.) And these days, it's also how you get speed while also being maximally bit-twiddly fast, because that's how our machines (including the abstract one in Webassembly) are designed. So okay, grumbling about conflating two things aside, let's run with that.

So anyway, I promised to write about this "Guile Steel" thing I've been musing about, and we've gotten this far in the article, and I haven't yet. So, this is, more than a concrete proposal, a call to arms to implement just such a systems language for Guile. I might make a prototype at some point, but you, dear reader, are free to take the idea of "Guile Steel" and run with it. In fact, please do.

So anyway. First, about the name. It's probably pretty obvious based on the name that I'm suggesting this be a language for Guile Scheme. And "Guile" as a name itself is both a continuation of the kind of playfully mischievous names in the Scheme family and its predecessors, but also a pun on co-founder of the Scheme language, Guy L. Steele. So "Guile Steele" kinda brings that pun home, and "Steel" sounds low-level, close to the metal.

But also, Guile has a lovely compiler tower. It would be nice to put some more lovely things on it! Why not a systems language?

There's some precedent here. The lovely Scheme 48's lowest levels of code (including its garbage collector) are written in an interesting language called PreScheme (more on PreScheme), which is something that's kind of like Scheme, but not really. It doesn't do automatic garbage collection itself, and I think Rust has shown that this area could be improved for a more modern PreScheme system. But you can hack on it at the REPL, and then it can compile to C, and it also has an implementation on Common Lisp, so you can bootstrap it a few different ways. PreScheme uses a Hindley-Milner type system; I suspect we can do even better with a propagator approach but that's untested. Anyway, starting by porting PreScheme from Scheme48 to Guile directly would be a good way to get going.

Guile also has some pretty good reasons to want something like this. For one thing, if you're a Guile person, then by gosh you're probably a Guix person. And Rust, it's real popular these days, and for good reasons, we're all better off with fewer memory vulnerabilities in our lives, but you know... it's kind of a pain, packaging wise, I hear? Actually I've never tried packaging anything in Rust but Efraim certainly has and when your presentation starts with the slide "Packaging Rust crates in GNU Guix: How hard could it possibly be?" I guess the answer is going to be that it's a bit of a headache. So maybe it's not the end of the world, but I think it might be nice if on that ground we had our own alternative, but that's just a minor thing.

And I don't think there's anything wrong with Rust, but I'd love to see... can we do better? I feel like it could be hackable, accessible, and it also could, probably, be a lot of fun? That's a good reason, I know I'd like something like this myself, I'd like to play with it, I'd like to be able to use it.

But maybe also... well, let's not beat around the bush, a whole lot of Guile is written in C, and our dear wonderful Andy Wingo has done a lot of lovely things to make us less dependent on C, some half-straps and some baseline compilers and just rewriting a lot of stuff in Scheme and so on and so forth but it would be nice if we had something we could officially rally around as "hey this is the thing we're going to start rewriting things in", because you know, C really is kind of a hard world to trust, and I'd like the programming language environment I rely on to not be so heavily built on it.

And at this point in the article, I have to say that Xerz! pointed out that there is a thing called Carp which is indeed a lisp that compiles to C and you know what, I'm pretty embarrassed for having not paid attention to it... I certainly saw it linked at one point but didn't pay enough attention, and... maybe it needs a closer look. Heck, it's written in Haskell, which is a pretty cool choice.

But hey, the Guile community still deserves a thing of its own, right? What do we have that compiler tower for if we're not going to add some cool things to it? And... gosh, I'd really like to get Guile in the browser, and there are some various paths, and Wingo gave a fun presentation on compiling to Webassembly last year, but wouldn't it be nice if just our whole language stack was written in something designed to compile to either something C-like or... something?

I might do some weekend fiddling towards this direction, but sadly this can't be my main project. As a call to arms, maybe it inspires someone to take it up as theirs though. I will say that if you work on it, I promise to spend some time using whatever you build and trying it out and sending patches. So that's it, that's my stream-of-consciousness post on Guile Steel: currently an idea... maybe eventually a reality?

09 July, 2022 11:26AM by Christine Lemmer-Webber (cwebber@dustycloud.org)

July 05, 2022

Site converted to Haunt

Lo and behold, I've converted the last of the sites I've been managing for ages to Haunt.

Haunt isn't well known. Apparently I am responsible for, er, many of the sites listed on awesome.haunt.page. But you know what? I've been making website things for a long time, and Haunt is honestly the only static site generator I've worked with (and I've worked with quite a few) that's actually truly customizable and programmable and pleasant to work with. And hackable!

This site has seen quite a few iterations... some custom code when I first launched it some time ago, then I used Zine, then I used PyBlosxom, and for quite a few years everything was running on Pelican. But I never liked hacking on any of those... I always kind of begrudgingly opened up the codebase and regretted having to change anything. But Haunt? Haunt's a dream, it's all there and ready for you, and I've even gotten some patches upstream. (Actually I owe Dave a few more, heh.)

Everything is Scheme in Haunt, which means, for instance, that this page needed an archive page for ages that actually worked and was sensible and I just didn't ever feel like doing it. But in Haunt, it's just delicious Guile flavored Scheme:

(define (archive-tmpl site posts)
  ;; build a map of (year -> posts)
  (define posts-by-year
    (let ((ht (make-hash-table)))      ; hash table we're building up
      (do ((posts posts (cdr posts)))  ; iterate over all posts
          ((null? posts) ht)           ; until we're out of posts
        (let* ((post (car posts))                   ; put this post in year bucket
               (year (date-year (post-date post)))
               (year-entries (hash-ref ht year '())))
          (hash-set! ht year (cons post year-entries))))))
  ;; sort all the years
  (define sorted-years
    (sort (hash-map->list (lambda (k v) k) posts-by-year) >))
  ;; rendering for one year
  (define (year-content year)
    `(div (@ (style "margin-bottom: 10px;"))
          (h3 ,year)
          (ul ,@(map post-content
                     (posts/reverse-chronological
                      (hash-ref posts-by-year year))))))
  ;; rendering for one post within a year
  (define (post-content post)
    `(li
      (a (@ (href ,(post-uri site post)))
         ,(post-ref post 'title))))
  ;; the whole page
  (define content
    `(div (@ (class "entry"))
          (h2 "Blog archive (by year)")
          (ul ,@(map year-content sorted-years))))
  ;; render within base template
  (base-tmpl site content))

Lambda, the ultimate static site generator!

At any rate, I expect some things are broken, to be fixed, etc. Let me know if you see 'em. Heck, you can browse the site contents should you be so curious!

But is there really anything more boring than a meta "updated my website code" post like this? Anyway, in the meanwhile I've corrected straggling instances of my deadname which were sitting around. The last post I made was me coming out as trans, and... well a lot has changed since then. So I guess I've got some more things to write. And also this whole theme... well I like some of it but I threw it together when I was but a wee web developer, back before CSS was actually nice to write, etc. So maybe I need to overhaul the look and feel too. And I always meant to put in that project directory, and ooh maybe an art gallery, and so on and so on...

But hey, I like updating my website again! So maybe I actually will!

05 July, 2022 01:20PM by Christine Lemmer-Webber (cwebber@dustycloud.org)

July 02, 2022

pspp @ Savannah

PSPP 1.6.2 has been released.

I'm very pleased to announce the release of a new version of GNU PSPP.  PSPP is a program for statistical analysis of sampled data.  It is a free replacement for the proprietary program SPSS.

Changes from 1.6.1 to 1.6.2:

  • Bug fixes.

Please send PSPP bug reports to bug-gnu-pspp@gnu.org.

02 July, 2022 04:08AM by Ben Pfaff

June 27, 2022

ghm @ Savannah

GHM 2022 confirmed

The next GNU Hackers' Meeting will take place in İzmir, Turkey, in October 2022.  Please see the event web page at https://gnu.org/ghm/2022 .

27 June, 2022 07:12PM by Luca Saiu

June 24, 2022

pspp @ Savannah

PSPP 1.6.1 has been released.

I'm very pleased to announce the release of a new version of GNU PSPP.  PSPP is a program for statistical analysis of sampled data.  It is a free replacement for the proprietary program SPSS.

Changes from 1.6.0 to 1.6.1:

  • The SET command now supports LEADZERO for controlling output of a leading zero in F, COMMA, and DOT format.
  • Bug fixes and translation updates.

Please send PSPP bug reports to bug-gnu-pspp@gnu.org.

24 June, 2022 04:57PM by Ben Pfaff

June 23, 2022

parallel @ Savannah

GNU Parallel 20220622 ('Bongbong') released

GNU Parallel 20220622 ('Bongbong') has been released. It is available for download at: http://ftpmirror.gnu.org/parallel/

Quote of the month:

  Parallel has been (and still is) super useful and simple tool for speeding up all kinds of shell tasks during my career.
    -- ValtteriL@ycombinator

New in this release:

  • , can be used in --sshlogin if quoted as \, or ,,
  • --plus {/#regexp/str} replace ^regexp with str.
  • --plus {/%regexp/str} replace regexp$ with str.
  • --plus {//regexp/str} replace every regexp with str.
  • 'make install' installs bash+zsh completion files.
  • Bug fixes and man page updates.

GNU Parallel - For people who live life in the parallel lane.

If you like GNU Parallel, record a video testimonial: Say who you are, what you use GNU Parallel for, how it helps you, and what you like most about it. Include a command that uses GNU Parallel if you feel like it.

About GNU Parallel

GNU Parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU Parallel can then split the input and pipe it into commands in parallel.

If you use xargs and tee today you will find GNU Parallel very easy to use as GNU Parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU Parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. GNU Parallel can even replace nested loops.

GNU Parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU Parallel as input for other programs.

For example you can run this to convert all jpeg files into png and gif files and have a progress bar:

  parallel --bar convert {1} {1.}.{2} ::: *.jpg ::: png gif

Or you can generate big, medium, and small thumbnails of all jpeg files in sub dirs:

  find . -name '*.jpg' |
    parallel convert -geometry {2} {1} {1//}/thumb{2}_{1/} :::: - ::: 50 100 200

You can find more about GNU Parallel at: http://www.gnu.org/s/parallel/

You can install GNU Parallel in just 10 seconds with:

    $ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
       fetch -o - http://pi.dk/3 ) > install.sh
    $ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
    12345678 883c667e 01eed62f 975ad28b 6d50e22a
    $ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
    cc21b4c9 43fd03e9 3ae1ae49 e28573c0
    $ sha512sum install.sh | grep ec113b49a54e705f86d51e784ebced224fdff3f52
    79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
    fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
    $ bash install.sh

Watch the intro video on http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.

When using programs that use GNU Parallel to process data for publication please cite:

O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014.

If you like GNU Parallel:

  • Give a demo at your local user group/team/colleagues
  • Post the intro videos on Reddit/Diaspora*/forums/blogs/ Identi.ca/Google+/Twitter/Facebook/Linkedin/mailing lists
  • Get the merchandise https://gnuparallel.threadless.com/designs/gnu-parallel
  • Request or write a review for your favourite blog or magazine
  • Request or build a package for your favourite distribution (if it is not already there)
  • Invite me for your next conference

If you use programs that use GNU Parallel for research:

  • Please cite GNU Parallel in your publications (use --citation)

If GNU Parallel saves you money:

About GNU SQL

GNU sql aims to give a simple, unified interface for accessing databases through all the different databases' command line clients. So far the focus has been on giving a common way to specify login information (protocol, username, password, hostname, and port number), size (database and table size), and running queries.

The database is addressed using a DBURL. If commands are left out you will get that database's interactive shell.

When using GNU SQL for a publication please cite:

O. Tange (2011): GNU SQL - A Command Line Tool for Accessing Different Databases Using DBURLs, ;login: The USENIX Magazine, April 2011:29-32.

About GNU Niceload

GNU niceload slows down a program when the computer load average (or other system activity) is above a certain limit. When the limit is reached the program will be suspended for some time. If the limit is a soft limit the program will be allowed to run for short amounts of time before being suspended again. If the limit is a hard limit the program will only be allowed to run when the system is below the limit.

23 June, 2022 08:29AM by Ole Tange

June 22, 2022

www @ Savannah

New Article by Richard Stallman

The GNU Education Team has published a new article by Richard Stallman on the threats of Big Tech in the field of education.

Many Governments Encourage Schools to Let Companies Snoop on Students

Human Rights Watch studied 164 software programs and web sites recommended by various governments for schools to make students use.  It found that 146 of them gave data to advertising and tracking companies.

The researchers were thorough and checked for various snooping methods, including fingerprinting of devices to identify users. The targets of the investigation were not limited to programs and sites specifically “for education;” they included, for instance, Zoom and Microsoft Teams.

I expect that each program collected personal data for its developer. I'm not sure whether the results counted that, but they should. Once the developer company gets personal data, it can provide that data to advertising profilers, as well as to other companies and governments, and it can engage directly in manipulation of students and teachers.

The recommendations Human Rights Watch makes follow the usual approach of regulating the use of data once collected.  This is fundamentally inadequate; personal data, once collected, will surely be misused.

The only approach that makes it possible to end massive surveillance starts with demanding that the software be free. Then users will be able to modify the software to avoid giving real data to companies.

More at gnu/education...

22 June, 2022 01:42PM by Dora Scilipoti

June 21, 2022

Andy Wingo

an optimistic evacuation of my wordhoard

Good morning, mallocators. Last time we talked about how to split available memory between a block-structured main space and a large object space. Given a fixed heap size, making a new large object allocation will steal available pages from the block-structured space by finding empty blocks and temporarily returning them to the operating system.

Today I'd like to talk more about nothing, or rather, why might you want nothing rather than something. Given an Immix heap, why would you want it organized in such a way that live data is packed into some blocks, leaving other blocks completely free? How bad would it be if instead the live data were spread all over the heap? When might it be a good idea to try to compact the heap? Ideally we'd like to be able to translate the answers to these questions into heuristics that can inform the GC when compaction/evacuation would be a good idea.

lospace and the void

Let's start with one of the more obvious points: large object allocation. With a fixed-size heap, you can't allocate new large objects if you don't have empty blocks in your paged space (the Immix space, for example) that you can return to the OS. To obtain these free blocks, you have four options.

  1. You can continue lazy sweeping of recycled blocks, to see if you find an empty block. This is a bit time-consuming, though.

  2. Otherwise, you can trigger a regular non-moving GC, which might free up blocks in the Immix space but which is also likely to free up large objects, which would result in fresh empty blocks.

  3. You can trigger a compacting or evacuating collection. Immix can't actually compact the heap all in one go, so you would preferentially select evacuation-candidate blocks by choosing the blocks with the least live data (as measured at the last GC), hoping that little data will need to be evacuated.

  4. Finally, for environments in which the heap is growable, you could just grow the heap instead. In this case you would configure the system to target a heap size multiplier rather than a heap size, which would scale the heap to be e.g. twice the size of the live data, as measured at the last collection.

If you have a growable heap, I think you will rarely choose to compact rather than grow the heap: you will either collect or grow. Under constant allocation rate, the rate of empty blocks being reclaimed from freed lospace objects will be equal to the rate at which they are needed, so if collection doesn't produce any, then that means your live data set is increasing and so growing is a good option. Anyway let's put growable heaps aside, as heap-growth heuristics are a separate gnarly problem.

The question becomes, when should large object allocation force a compaction? Absent growable heaps, the answer is clear: when allocating a large object fails because there are no empty pages, but the statistics show that there is actually ample free memory. Good! We have one heuristic, and one with an optimum: you could compact in other situations but from the point of view of lospace, waiting until allocation failure is the most efficient.
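
In code the trigger is small. Here is a sketch with illustrative names; the interesting part is that a failing large-object allocation only forces an evacuating collection when the block-structured space has plenty of free memory that just isn't in the form of empty blocks.

#include <stdbool.h>
#include <stddef.h>

struct space_stats {
  size_t empty_blocks;      /* whole blocks we could hand to the lospace */
  size_t free_bytes;        /* total hole bytes, fragments included */
};

static bool lospace_alloc_should_evacuate(const struct space_stats *s,
                                          size_t request_bytes) {
  if (s->empty_blocks > 0)
    return false;           /* just decommission an empty block instead */
  /* Fragmented but ample: compacting will produce empty blocks.  If
     free_bytes is smaller than the request, compaction won't help
     either; grow the heap or fail the allocation. */
  return s->free_bytes >= request_bytes;
}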

shrinkage

Moving on, another use of empty blocks is when shrinking the heap. The collector might decide that it's a good idea to return some memory to the operating system. For example, I enjoyed this recent paper on heuristics for optimum heap size, which advocates sizing the heap in proportion to the square root of the allocation rate and, as a consequence, promptly returning memory to the OS when/if the application reaches a dormant state.

Here, we have a similar heuristic for when to evacuate: when we would like to release memory to the OS but we have no empty blocks, we should compact. We use the same evacuation candidate selection approach as before, also, aiming for maximum empty block yield.

fragmentation

What if you go to allocate a medium object, say 4kB, but there is no hole that's 4kB or larger? In that case, your heap is fragmented. The smaller your heap size, the more likely this is to happen. We should compact the heap to make the maximum hole size larger.
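
As a rough illustration of how you might detect this, assuming a byte-per-line mark table as a stand-in for Immix's per-line mark bits (the sizes and names below are mine, not Immix's):

    /* Hypothetical sketch: the largest hole in a block is the longest run
       of unmarked lines; if it is smaller than the request for every block
       we scan, the heap is fragmented and compaction looks attractive. */
    #include <stddef.h>
    #include <stdint.h>

    #define LINE_SIZE       128u
    #define LINES_PER_BLOCK 256u   /* e.g. 32 kB blocks, 128-byte lines */

    size_t max_hole_bytes(const uint8_t line_marks[LINES_PER_BLOCK]) {
      size_t best = 0, run = 0;
      for (size_t i = 0; i < LINES_PER_BLOCK; i++) {
        if (line_marks[i] == 0) {        /* unmarked line: part of a hole */
          run++;
          if (run > best) best = run;
        } else {
          run = 0;                       /* marked line ends the hole */
        }
      }
      return best * LINE_SIZE;
    }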

side note: compaction via partial evacuation

The evacuation strategy of Immix is... optimistic. A mark-compact collector will compact the whole heap, but Immix will only be able to evacuate a fraction of it.

It's worth dwelling on this a bit. As described in the paper, Immix reserves around 2-3% of overall space for evacuation overhead. Let's say you decide to evacuate: you start with 2-3% of blocks being empty (the target blocks), and choose a corresponding set of candidate blocks for evacuation (the source blocks). Since Immix is a one-pass collector, it doesn't know how much data is live when it starts collecting. It may not know that the blocks that it is evacuating will fit into the target space. As specified in the original paper, if the target space fills up, Immix will mark in place instead of evacuating; an evacuation candidate block with marked-in-place objects would then be non-empty at the end of collection.

In fact if you choose a set of evacuation candidates hoping to maximize your empty block yield, based on an estimate of live data instead of limiting to only the number of target blocks, I think it's possible to actually fill the targets before the source blocks empty, leaving you with no empty blocks at the end! (This can happen due to inaccurate live data estimations, or via internal fragmentation with the block size.) The only way to avoid this is to never select more evacuation candidate blocks than you have target blocks. If you are lucky, you won't have to use all of the target blocks, and so at the end you will end up with more free blocks than you started with, so a subsequent evacuation will be more effective. The defragmentation result in that case would still be pretty good, but the yield in free blocks is not great.
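
A minimal sketch of that conservative policy, assuming each block carries an estimate of its live bytes from the previous mark (the structure and names are made up):

    /* Hypothetical sketch: sort recyclable blocks by estimated live bytes
       and select at most as many evacuation candidates as we have empty
       target blocks, so evacuation can never run out of space. */
    #include <stddef.h>
    #include <stdlib.h>

    struct block { size_t live_bytes_estimate; int evacuation_candidate; };

    static int by_live_bytes(const void *a, const void *b) {
      const struct block *x = *(const struct block *const *)a;
      const struct block *y = *(const struct block *const *)b;
      return (x->live_bytes_estimate > y->live_bytes_estimate)
           - (x->live_bytes_estimate < y->live_bytes_estimate);
    }

    void select_evacuation_candidates(struct block **recyclable,
                                      size_t nrecyclable,
                                      size_t ntarget_blocks) {
      qsort(recyclable, nrecyclable, sizeof recyclable[0], by_live_bytes);
      size_t limit = nrecyclable < ntarget_blocks ? nrecyclable : ntarget_blocks;
      for (size_t i = 0; i < limit; i++)
        recyclable[i]->evacuation_candidate = 1;  /* least live data first */
    }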

In a production garbage collector I would still be tempted to be optimistic and select more evacuation candidate blocks than available empty target blocks, because it will require fewer rounds to compact the whole heap, if that's what you wanted to do. It would be a relatively rare occurrence to start an evacuation cycle. If you ran out of space while evacuating, in a production GC I would just temporarily commission some overhead blocks for evacuation and release them promptly after evacuation is complete. If you have a small heap multiplier in your Immix space, occasional partial evacuation in a long-running process would probably reach a steady state with blocks being either full or empty. Fragmented blocks would represent newer objects and evacuation would periodically sediment these into longer-lived dense blocks.

mutator throughput

Finally, the shape of the heap has its inverse in the shape of the holes into which the mutator can allocate. It's most efficient for the mutator if the heap has as few holes as possible: ideally just one large hole per block, which is the limit case of an empty block.

The opposite extreme would be having every other "line" (in Immix terms) be used, so that free space is spread across the heap in a vast spray of one-line holes. Even if fragmentation is not a problem, perhaps because the application only allocates objects that pack neatly into lines, having to stutter all the time to look for holes is overhead for the mutator. Also, the result is that contemporaneous allocations are more likely to be placed farther apart in memory, leading to more cache misses when accessing data. Together, allocator overhead and access overhead lead to lower mutator throughput.

When would this situation get so bad as to trigger compaction? Here I have no idea. There is no clear maximum. If compaction were free, we would compact all the time. But it's not; there's a tradeoff between the cost of compaction and mutator throughput.

I think here I would punt. If the heap is being actively resized based on allocation rate, we'll hit the other heuristics first, and so we won't need to trigger evacuation/compaction based on mutator overhead. You could measure this, though, in terms of average or median hole size, or average or maximum number of holes per block. Since evacuation is partial, all you need to do is to identify some "bad" blocks and then perhaps evacuation becomes attractive.

gc pause

Welp, that's some thoughts on when to trigger evacuation in Immix. Next time, we'll talk about some engineering aspects of evacuation. Until then, happy consing!

21 June, 2022 12:21PM by Andy Wingo

June 20, 2022

GNU Taler news

A digital euro and the future of cash

The Central Bank of Austria has published a report in the context of a workshop celebrating 20 years of Euro-denominated cash. The report discusses the future of cash, including account- and blockchain-based designs, as well as GNU Taler.

20 June, 2022 10:00PM

Andy Wingo

blocks and pages and large objects

Good day! In a recent dispatch we talked about the fundamental garbage collection algorithms, also introducing the Immix mark-region collector. Immix mostly leaves objects in place but can move objects if it thinks it would be profitable. But when would it decide that this is a good idea? Are there cases in which it is necessary?

I promised to answer those questions in a followup article, but I didn't say which followup :) Before I get there, I want to talk about paged spaces.

enter the multispace

We mentioned that Immix divides the heap into blocks (32kB or so), and that no object can span multiple blocks. "Large" objects -- defined by Immix to be more than 8kB -- go to a separate "large object space", or "lospace" for short.

Though the implementation of a large object space is relatively simple, I found that it has some points that are quite subtle. Probably the most important of these points relates to heap size. Consider that if you just had one space, implemented using mark-compact maybe, then the procedure to allocate a 16 kB object would go:

  1. Try to bump the allocation pointer by 16kB. Is it still within range? If so we are done.

  2. Otherwise, collect garbage and try again. If after GC there isn't enough space, the allocation fails.

In step (2), collecting garbage could decide to grow or shrink the heap. However, when evaluating collector algorithms, you generally want to avoid dynamically-sized heaps.
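
For concreteness, here is a minimal sketch of that allocate-or-collect procedure with a fixed-size heap; the names are illustrative only.

    /* Hypothetical sketch: single-space bump-pointer allocation with a
       collect-and-retry fallback, as in steps (1) and (2) above. */
    #include <stddef.h>
    #include <stdint.h>

    struct space { uintptr_t hp, limit; };   /* allocation pointer and limit */
    void collect(struct space*);             /* assumed: e.g. mark-compact */

    void *allocate(struct space *s, size_t nbytes) {
      for (int attempt = 0; attempt < 2; attempt++) {
        if (s->limit - s->hp >= nbytes) {    /* step 1: bump the pointer */
          void *obj = (void*)s->hp;
          s->hp += nbytes;
          return obj;
        }
        collect(s);                          /* step 2: GC and try again */
      }
      return NULL;                           /* still no space: failure */
    }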

cheatery

Here is where I need to make an embarrassing admission. In my role as co-maintainer of the Guile programming language implementation, I have long noodled around with benchmarks, comparing Guile to Chez, Chicken, and other implementations. It's good fun. However, I only realized recently that I had a magic knob that I could turn to win more benchmarks: simply make the heap bigger. Make it start bigger, make it grow faster, whatever it takes. For a program that does its work in some fixed amount of total allocation, a bigger heap will require fewer collections, and therefore generally take less time. (Some amount of collection may be good for performance as it improves locality, but this is a marginal factor.)

Of course I didn't really go wild with this knob but it now makes me doubt all benchmarks I have ever seen: are we really using benchmarks to select for fast implementations, or are we in fact selecting for implementations with cheeky heap size heuristics? Consider even any of the common allocation-heavy JavaScript benchmarks, DeltaBlue or Earley or the like; to win these benchmarks, web browsers are incentivised to have large heaps. In the real world, though, a more parsimonious policy might be more appreciated by users.

Java people have known this for quite some time, and are therefore used to fixing the heap size while running benchmarks. For example, people will measure the minimum amount of memory that can allow a benchmark to run, and then configure the heap to be a constant multiplier of this minimum size. The MMTK garbage collector toolkit can't even grow the heap at all currently: it's an important feature for production garbage collectors, but as they are just now migrating out of the research phase, heap growth (and shrinking) hasn't yet been a priority.

lospace

So now consider a garbage collector that has two spaces: an Immix space for allocations of 8kB and below, and a large object space for, well, larger objects. How do you divide the available memory between the two spaces? Could the balance between immix and lospace change at run-time? If you never had large objects, would you be wasting space at all? Conversely, is there a strategy that can also work for only large objects?

Perhaps the answer is obvious to you, but it wasn't to me. After much reading of the MMTK source code and pondering, here is what I understand the state of the art to be (sketched in code after the list).

  1. Arrange for your main space -- Immix, mark-sweep, whatever -- to be block-structured, and able to dynamically decommission or recommission blocks, perhaps via MADV_DONTNEED. This works if the blocks are even multiples of the underlying OS page size.

  2. Keep a counter of however many bytes the lospace currently has.

  3. When you go to allocate a large object, increment the lospace byte counter, and then round up to the number of blocks to decommission from the main paged space. If this is more than are currently decommissioned, find some empty blocks and decommission them.

  4. If no empty blocks were found, collect, and try again. If the second try doesn't work, then the allocation fails.

  5. Now that the paged space has shrunk, lospace can allocate. You can use the system malloc, but it is probably better to use mmap, so that if these objects are collected, you can just MADV_DONTNEED them and keep them around for later re-use.

  6. After GC runs, explicitly return the memory for any object in lospace that wasn't visited when the object graph was traversed. Decrement the lospace byte counter and possibly return some empty blocks to the paged space.
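
To make the accounting in steps (2), (3), and (6) concrete, here is a minimal sketch under the assumption of a block-structured paged space; every name is hypothetical, and the actual decommissioning (madvise and friends) is hidden behind assumed helpers.

    /* Hypothetical sketch of lospace/paged-space accounting: the paged
       space shrinks (decommissions blocks) as lospace grows, and grows
       back as lospace objects die. */
    #include <stddef.h>
    #include <stdbool.h>

    #define BLOCK_SIZE (32u * 1024u)

    struct heap {
      size_t lospace_bytes;          /* step 2: bytes currently in lospace */
      size_t decommissioned_blocks;  /* paged-space blocks ceded to lospace */
    };

    bool  paged_space_decommission_empty_blocks(struct heap*, size_t n);
    void  paged_space_recommission_blocks(struct heap*, size_t n);
    void  collect(struct heap*);
    void *lospace_map_pages(size_t nbytes);   /* e.g. mmap-backed */

    void *lospace_alloc(struct heap *h, size_t nbytes) {
      h->lospace_bytes += nbytes;                                 /* step 3 */
      size_t needed = (h->lospace_bytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
      if (needed > h->decommissioned_blocks) {
        size_t delta = needed - h->decommissioned_blocks;
        if (!paged_space_decommission_empty_blocks(h, delta)) {
          collect(h);                                             /* step 4 */
          if (!paged_space_decommission_empty_blocks(h, delta)) {
            h->lospace_bytes -= nbytes;
            return NULL;                                          /* failure */
          }
        }
        h->decommissioned_blocks = needed;
      }
      return lospace_map_pages(nbytes);                           /* step 5 */
    }

    void lospace_note_reclaimed(struct heap *h, size_t dead_bytes) { /* step 6 */
      h->lospace_bytes -= dead_bytes;
      size_t needed = (h->lospace_bytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
      if (needed < h->decommissioned_blocks) {
        paged_space_recommission_blocks(h, h->decommissioned_blocks - needed);
        h->decommissioned_blocks = needed;
      }
    }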

There are some interesting aspects about this strategy. One is, the memory that you return to the OS doesn't need to be contiguous. When allocating a 50 MB object, you don't have to find 50 MB of contiguous free space, because any set of blocks that adds up to 50 MB will do.

Another aspect is that this adaptive strategy can work for any ratio of large to non-large objects. The user doesn't have to manually set the sizes of the various spaces.

This strategy does assume that address space is larger than heap size, but only by a factor of 2 (modulo fragmentation for the large object space). Therefore our risk of running afoul of user resource limits and kernel overcommit heuristics is low.

The one underspecified part of this algorithm is... did you see it? "Find some empty blocks". If the main paged space does lazy sweeping -- only scanning a block for holes right before the block will be used for allocation -- then after a collection we don't actually know very much about the heap, and notably, we don't know what blocks are empty. (We could know it, of course, but it would take time; you could traverse the line mark arrays for all blocks while the world is stopped, but this increases pause time. The original Immix collector does this, however.) In the system I've been working on, instead I have it so that if a mutator finds an empty block, it puts it on a separate list, and then takes another block, only allocating into empty blocks once all blocks are swept. If the lospace needs blocks, it sweeps eagerly until it finds enough empty blocks, throwing away any nonempty blocks. This causes the next collection to happen sooner, but that's not a terrible thing; this only occurs when rebalancing lospace versus paged-space size, because if you have a constant allocation rate on the lospace side, you will also have a complementary rate of production of empty blocks by GC, as they are recommissioned when lospace objects are reclaimed.
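
A rough sketch of that bookkeeping, with hypothetical names and the actual line-mark scanning hidden behind an assumed helper:

    /* Hypothetical sketch: mutators set aside empty blocks found during
       lazy sweeping; lospace sweeps eagerly when it needs empty blocks. */
    #include <stddef.h>
    #include <stdbool.h>

    struct block { struct block *next; };
    struct paged_space {
      struct block *unswept;   /* blocks not yet scanned for holes */
      struct block *empty;     /* empty blocks discovered so far */
    };

    bool block_is_empty_after_sweep(struct block*);   /* assumed helper */

    /* Mutator side: set aside any empty block found, and only allocate
       into empty blocks once every block has been swept. */
    struct block *acquire_block_for_allocation(struct paged_space *s) {
      while (s->unswept) {
        struct block *b = s->unswept;
        s->unswept = b->next;
        if (block_is_empty_after_sweep(b)) {
          b->next = s->empty;              /* save for later, or for lospace */
          s->empty = b;
        } else {
          return b;                        /* recyclable: allocate here */
        }
      }
      if (s->empty) {                      /* all swept: now use the empties */
        struct block *b = s->empty;
        s->empty = b->next;
        return b;
      }
      return NULL;                         /* nothing left: trigger collection */
    }

    /* Lospace side: sweep eagerly until enough empty blocks are found,
       dropping nonempty blocks (they are revisited at the next collection). */
    size_t take_empty_blocks(struct paged_space *s, size_t want,
                             struct block **out) {
      size_t got = 0;
      while (got < want && (s->empty || s->unswept)) {
        struct block *b;
        if (s->empty) { b = s->empty; s->empty = b->next; }
        else {
          b = s->unswept; s->unswept = b->next;
          if (!block_is_empty_after_sweep(b))
            continue;                      /* nonempty: throw it away */
        }
        out[got++] = b;
      }
      return got;
    }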

What if your main paged space has ample space for allocating a large object, but there are no empty blocks, because live objects are equally peppered around all blocks? In that case, often the application would be best served by growing the heap, but maybe not. In any case in a strict-heap-size environment, we need a solution.

But for that... let's pick up another day. Until then, happy hacking!

20 June, 2022 02:59PM by Andy Wingo

June 16, 2022

health @ Savannah

GNU Health Hospital Management 4.0.4 patchset released

Dear community

GNU Health 4.0.4 patchset has been released!

Priority: High

Table of Contents

  • About GNU Health Patchsets
  • Updating your system with the GNU Health control Center
  • Summary of this patchset
  • Installation notes
  • List of other issues related to this patchset

About GNU Health Patchsets

We provide "patchsets" to stable releases. Patchsets allow applying bug fixes and updates on production systems. Always try to keep your production system up-to-date with the latest patches.

Patches and Patchsets maximize uptime for production systems, and keep your system updated, without the need to do a whole installation.

NOTE: Patchsets are applied on previously installed systems only. For new, fresh installations, download and install the whole tarball (i.e., gnuhealth-4.0.4.tar.gz).

Updating your system with the GNU Health control Center

Starting with the GNU Health 3.x series, you can do automatic updates on the GNU Health HMIS kernel and modules using the GNU Health control center program.

Please refer to the administration manual section (https://en.wikibooks.org/wiki/GNU_Health/Control_Center )

The GNU Health control center works on standard installations (those done following the installation manual on wikibooks). Don't use it if you use an alternative method or if your distribution does not follow the GNU Health packaging guidelines.

Installation Notes

You must apply previous patchsets before installing this patchset. If your patchset level is 4.0.3, then just follow the general instructions. You can find the patchsets at GNU Health main download site at GNU.org (https://ftp.gnu.org/gnu/health/)

In most cases, GNU Health Control center (gnuhealth-control) takes care of applying the patches for you. 

Pre-requisites for upgrade to 4.0.4: None

Now follow the general instructions at

After applying the patches, make a full update of your GNU Health database as explained in the documentation.

When running "gnuhealth-control" for the first time, you will see the following message: "Please restart now the update with the new control center" Please do so. Restart the process and the update will continue.

  • Restart the GNU Health server

List of other issues and tasks related to this patchset

  • bug #62598: Payment term search stops in party
  • bug #62596: Traceback if there is no Account Receivable defined neither on party or default acct config
  • bug #62555: Too many decimals error when generating the invoice with certain discounts
  • bug #62439: Error in sequence when generating Dx Imaging order
  • bug #62428: complete blood count report takes two pages
  • bug #62427: Typo in health_services exceptions

 For detailed information about each issue, you can visit https://savannah.gnu.org/bugs/?group=health
 For detailed information about each task, you can visit https://savannah.gnu.org/task/?group=health

 For detailed information you can read about Patches and Patchsets

16 June, 2022 12:42PM by Luis Falcon

June 15, 2022

GNU Taler news

GNU Taler Scalability

Anonymity loves company. Hence, to provide the best possible anonymity to GNU Taler users, the scalability of individual installations of a Taler payment service matters. While our design scales nicely on paper, NGI Fed4Fire+ enabled us to evaluate the transaction rates that could be achieved with the actual implementation. Experiments were conducted by Marco Boss for his Bachelor's thesis at the Bern University of Applied Sciences to assess bottlenecks and suggest avenues for further improvement.

15 June, 2022 10:00PM

Andy Wingo

defragmentation

Good morning, hackers! Been a while. It used to be that I had long blocks of uninterrupted time to think and work on projects. Now I have two kids; the longest such time-blocks are on trains (too infrequent, but it happens) and in a less effective but more frequent fashion, after the kids are sleeping. As I start writing this, I'm in an airport waiting for a delayed flight -- my first since the pandemic -- so we can consider this to be the former case.

It is perhaps out of mechanical sympathy that I have been using my reclaimed time to noodle on a garbage collector. Managing space and managing time have similar concerns: how to do much with little, efficiently packing different-sized allocations into a finite resource.

I have been itching to write a GC for years, but the proximate event that pushed me over the edge was reading about the Immix collection algorithm a few months ago.

on fundamentals

Immix is a "mark-region" collection algorithm. I say "algorithm" rather than "collector" because it's more like a strategy or something that you have to put into practice by making a concrete collector, the other fundamental algorithms being copying/evacuation, mark-sweep, and mark-compact.

To build a collector, you might combine a number of spaces that use different strategies. A common choice would be to have a semi-space copying young generation, a mark-sweep old space, and maybe a treadmill large object space (a kind of copying collector, logically; more on that later). Then you have heuristics that determine what object goes where, when.

On the engineering side, there are quite a number of choices to make there too: probably you make some parts of your collector parallel, maybe the collector and the mutator (the user program) can run concurrently, and so on. Things get complicated, but the fundamental algorithms are relatively simple, and present interesting fundamental tradeoffs.


figure 1 from the immix paper

For example, mark-compact is most parsimonious regarding space usage -- for a given program, a garbage collector using a mark-compact algorithm will require less memory than one that uses mark-sweep. However, mark-compact algorithms all require at least two passes over the heap: one to identify live objects (mark), and at least one to relocate them (compact). This makes them less efficient in terms of overall program throughput and can also increase latency (GC pause times).

Copying or evacuating spaces can be more CPU-efficient than mark-compact spaces, as reclaiming memory avoids traversing the heap twice; a copying space copies objects as it traverses the live object graph instead of after the traversal (mark phase) is complete. However, a copying space's minimum heap size is quite high, and it only reaches competitive efficiencies at large heap sizes. For example, if your program needs 100 MB of space for its live data, a semi-space copying collector will need at least 200 MB of space in the heap (a 2x multiplier, we say), and will only run efficiently at something more like 4-5x. It's a reasonable tradeoff to make for small spaces such as nurseries, but as a mature space, it's so memory-hungry that users will be unhappy if you make it responsible for a large portion of your memory.

Finally, mark-sweep is quite efficient in terms of program throughput, because like copying it traverses the heap in just one pass, and because it leaves objects in place instead of moving them. But! Unlike the other two fundamental algorithms, mark-sweep leaves the heap in a fragmented state: instead of having all live objects packed into a contiguous block, memory is interspersed with live objects and free space. So the collector can run quickly but the allocator stops and stutters as it accesses disparate regions of memory.

allocators

Collectors are paired with allocators. For mark-compact and copying/evacuation, the allocator consists of a pointer to free space and a limit. Objects are allocated by bumping the allocation pointer, a fast operation that also preserves locality between contemporaneous allocations, improving overall program throughput. But for mark-sweep, we run into a problem: say you go to allocate a 1 kilobyte byte array -- do you actually have space for that?

Generally speaking, mark-sweep allocators solve this problem via freelist allocation: the allocator has an array of lists of free objects, one for each "size class" (say 2 words, 3 words, and so on up to 16 words, then more sparsely up to the largest allocatable size maybe), and services allocations from the appropriate size class's freelist. This prevents the 1 kB free space that we need from being "used up" by a 16-byte allocation that could just as well have gone elsewhere. However, freelists prevent objects allocated around the same time from being deterministically placed in nearby memory locations. This increases variance and decreases overall throughput, both for the allocation operations themselves and for pointer-chasing in the course of the program's execution.
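
As an illustration only, and not how any particular collector lays this out, a size-classed freelist allocator might look roughly like this:

    /* Hypothetical sketch of size-classed freelist allocation: one free
       list per size class; requests are served from the matching class. */
    #include <stddef.h>
    #include <stdint.h>

    #define WORD          sizeof(uintptr_t)
    #define NSIZE_CLASSES 16                 /* 2..17 words, say */

    struct free_cell { struct free_cell *next; };
    struct allocator { struct free_cell *freelists[NSIZE_CLASSES]; };

    static size_t size_class(size_t nbytes) {
      size_t words = (nbytes + WORD - 1) / WORD;
      return words < 2 ? 0 : words - 2;      /* class 0 holds 2-word cells */
    }

    void *freelist_alloc(struct allocator *a, size_t nbytes) {
      size_t sc = size_class(nbytes);
      if (sc >= NSIZE_CLASSES)
        return NULL;                         /* large: handled elsewhere */
      struct free_cell *cell = a->freelists[sc];
      if (!cell)
        return NULL;                         /* empty class: sweep or collect */
      a->freelists[sc] = cell->next;         /* pop the first free cell */
      return cell;
    }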

Also, in a mark-sweep collector, we can still reach a situation where there is enough space on the heap for an allocation, but that free space is broken up into too many pieces: the heap is fragmented. For this reason, many systems that perform mark-sweep collection can choose to compact, if heuristics show it might be profitable. Because the usual strategy is mark-sweep, though, they still use freelist allocation.

on immix and mark-region

Mark-region collectors are like mark-sweep collectors, except that they do bump-pointer allocation into the holes between survivor objects.

Sounds simple, right? To my mind, though, the fundamental challenge in implementing a mark-region collector is how to handle fragmentation. Let's take a look at how Immix solves this problem.


part of figure 2 from the immix paper

Firstly, Immix partitions the heap into blocks, which might be 32 kB in size or so. No object can span a block. Block size should be chosen to be a nice power-of-two multiple of the system page size, not so small that common object allocations wouldn't fit. "Large" objects -- greater than 8 kB, for Immix -- go to a separate space that is managed in a different way.

Within a block, Immix divides space into lines -- maybe 128 bytes long. Objects can span lines. Any line that does not contain (a part of) an object that survived the previous collection is part of a hole. A hole is a contiguous span of free lines in a block.

On the allocation side, Immix does bump-pointer allocation into holes. If a mutator doesn't have a hole currently, it scans the current block (obtaining one if needed) for the next hole, via a side-table of per-line mark bits: one bit per line. Lines without the mark are in holes. Scanning for holes is fairly cheap, because the line size is not too small. Note that there are also per-object mark bits; just because you've marked a line doesn't mean that you've traced all objects on that line.
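
A minimal sketch of the hole scan, using a byte per line instead of a bit for simplicity (the names and layout are mine, not Immix's):

    /* Hypothetical sketch: find the next hole (run of unmarked lines) at
       or after `cursor`; the mutator then bump-allocates within it. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define LINES_PER_BLOCK 256u   /* e.g. 32 kB block, 128-byte lines */

    bool next_hole(const uint8_t line_marks[LINES_PER_BLOCK], size_t cursor,
                   size_t *hole_start, size_t *hole_lines) {
      size_t i = cursor;
      while (i < LINES_PER_BLOCK && line_marks[i]) i++;   /* skip marked lines */
      if (i == LINES_PER_BLOCK)
        return false;              /* no hole left: take another block */
      size_t start = i;
      while (i < LINES_PER_BLOCK && !line_marks[i]) i++;  /* extend the hole */
      *hole_start = start;         /* hole spans lines [start, start + lines) */
      *hole_lines = i - start;
      return true;
    }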

Allocating into a hole has good expected performance as well, as it's bump-pointer, and the minimum size isn't tiny. In the worst case of a hole consisting of a single line, you have 128 bytes to work with. This size is large enough for the majority of objects, given that most objects are small.

mitigating fragmentation

Immix still has some challenges regarding fragmentation. There is some loss in which a single (piece of an) object can keep a line marked, wasting any free space on that line. Also, when an object can't fit into a hole, any space left in that hole is lost, at least until the next collection. This loss could also occur for the next hole, and the next and the next and so on until Immix finds a hole that's big enough. In a mark-sweep collector with lazy sweeping, these free extents could instead be placed on freelists and used when needed, but in Immix there is no such facility (by design).

One mitigation for fragmentation risks is "overflow allocation": when allocating an object larger than a line (a medium object), and Immix can't find a hole before the end of the block, Immix allocates into a completely free block. So actually mutator threads allocate into two blocks at a time: one for small objects and medium objects if possible, and the other for medium objects when necessary.
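
A hedged sketch of that two-block policy follows; the structure and helpers are assumptions, not Immix's actual code, and it skips straight to the overflow block rather than scanning the rest of the current block for a big-enough hole.

    /* Hypothetical sketch of overflow allocation: small objects, and
       medium objects that fit, go to the current hole; medium objects
       that do not fit go to a dedicated, completely free overflow block. */
    #include <stddef.h>

    #define LINE_SIZE 128u

    struct bump { char *ptr, *limit; };
    struct mutator { struct bump hole; struct bump overflow; };

    int refill_hole_from_current_block(struct mutator*);   /* assumed */
    int refill_overflow_from_free_block(struct mutator*);  /* assumed */

    static void *bump_alloc(struct bump *b, size_t nbytes) {
      if ((size_t)(b->limit - b->ptr) < nbytes) return NULL;
      void *obj = b->ptr;
      b->ptr += nbytes;
      return obj;
    }

    void *immix_style_alloc(struct mutator *m, size_t nbytes) {
      void *obj = bump_alloc(&m->hole, nbytes);
      if (obj) return obj;
      if (nbytes > LINE_SIZE) {                /* medium: use overflow block */
        obj = bump_alloc(&m->overflow, nbytes);
        if (obj) return obj;
        if (refill_overflow_from_free_block(m))
          return bump_alloc(&m->overflow, nbytes);
        return NULL;                           /* out of free blocks: collect */
      }
      if (refill_hole_from_current_block(m))   /* small: move to the next hole */
        return bump_alloc(&m->hole, nbytes);
      return NULL;
    }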

Another mitigation is that large objects are allocated into their own space, so an Immix space will never be used for objects larger than, say, 8 kB.

The other mitigation is that Immix can choose to evacuate instead of mark. How does this work? Is it worth it?

stw

This question about the practical tradeoffs involving evacuation is the one I wanted to pose when I started this article; I have gotten to the point of implementing this part of Immix and I have some doubts. But, this article is long enough, and my plane is about to land, so let's revisit this on my return flight. Until then, see you later, allocators!

15 June, 2022 12:47PM by Andy Wingo

June 13, 2022

GNU Guix

Celebrating 10 years of Guix in Paris, 16–18 September

It’s been ten years of GNU Guix! To celebrate, and to share knowledge and enthusiasm, a birthday event will take place on September 16–18th, 2022, in Paris, France. The program is being finalized, but you can already register!

Update (2022-07-12): Preliminary program published!

10 year anniversary artwork

This is a community event with several twists to it:

  • Friday, September 16th, is dedicated to reproducible research workflows and high-performance computing (HPC)—the focuses of the Guix-HPC effort. It will consist of talks and experience reports by scientists and practitioners.
  • Saturday targets Guix and free software enthusiasts, users and developers alike. We will reflect on ten years of Guix, show what it has to offer, and present on-going developments and future directions.
  • on Sunday, users, developers, developers-to-be, and other contributors will discuss technical and community topics and join forces for hacking sessions, unconference style.

Check out the web site and consider registering as soon as possible so we can better estimate the size of the birthday cake!

If you’re interested in presenting a topic, in facilitating a session, or in organizing a hackathon, please get in touch with the organizers at guix-birthday-event@gnu.org and we’ll be happy to make room for you. We’re also looking for people to help with logistics, in particular during the event; please let us know if you can give a hand.

Whether you’re a scientist, an enthusiast, or a power user, we’d love to see you in September. Stay tuned for updates!

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, AArch64 and POWER9 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

13 June, 2022 03:00PM by Ludovic Courtès, Tanguy Le Carrour, Simon Tournier

June 12, 2022

GNUnet News

DHT Specification Milestones 1-3/5

DHT Technical Specification Milestones 1-3/5

We are happy to announce the completion of the following milestones for the DHT specification. The objective is to provide a detailed and comprehensive guide for implementors of the GNUnet DHT "R 5 N". The milestones consist of documenting the base data structures and processes of the protocol. This includes the specification of the DHT message wire and serialization formats.

Completed milestones overview:

  1. Defined base data structures and processes that form the foundation of the protocol: Routing table, distance metrics, infrastructure messages, bootstrapping and base functions for block processing.
  2. Defined the core data structures and processes that are specific to the R 5 N protocol: Block and peer filtering, routing table management and lookup algorithms.
  3. The protocol was extended to support path signatures. This enables optional integrity protection of the paths that result messages have taken in a potentially rogue environment.

The current protocol is implemented as part of GNUnet 0.17.x and gnunet-go as previously announced on the mailing list.

We invite any interested party to read the document and provide critical review and feedback. This greatly helps us to improve the protocol and helps future implementations. Contact us at the gnunet-developers mailing list. As part of the remaining milestones, the specification will be updated and interoperability testing will be conducted. Further, we aim to present the draft specification at the IETF.

This work is generously funded by NLnet as part of their NGI Assure fund.

12 June, 2022 10:00PM

GNUnet 0.17.1

GNUnet 0.17.1

This is a bugfix release for gnunet 0.17.0.

Download links

The GPG key used to sign is: 3D11063C10F98D14BD24D1470B0998EF86F59B6A

Note that due to mirror synchronization, not all links may be functional early after the release. For direct access try http://ftp.gnu.org/gnu/gnunet/

Noteworthy changes in 0.17.1 (since 0.17.0)

  • DHT : Bugfix in HELLO message format. LSD0004 compliance.
  • RECLAIM : OpenID Connect plugin now needs (optional) jose dependency.

A detailed list of changes can be found in the ChangeLog and the bugtracker.

12 June, 2022 10:00PM

June 09, 2022

Luca Saiu

GNU Hackers' Meeting 2022 proposal: İzmir, Turkey

The GNU Hackers Meetings (https://www.gnu.org/ghm) are a friendly and informal venue to discuss technical issues concerning GNU (https://www.gnu.org) and free software (https://www.gnu.org/philosophy/free-sw.html). The time we proposed for GHM 2022 is approaching but unfortunately we only received three replies expressing interest. If we are to hold the event then we need more participants; at this stage a simple informal expression of interest is enough. The event is planned for an extended weekend (with talks from Friday to Saturday) in October 2022 in İzmir, Turkey. For the time being all the infamous entry barriers or restrictions are lifted in Turkey, with the ... [Read more]

09 June, 2022 11:55PM by Luca Saiu (positron@gnu.org)

June 05, 2022

GNUnet News

GNUnet 0.17.0

GNUnet 0.17.0 released

We are pleased to announce the release of GNUnet 0.17.0.
GNUnet is an alternative network stack for building secure, decentralized and privacy-preserving distributed applications. Our goal is to replace the old insecure Internet protocol stack. Starting from an application for secure publication of files, it has grown to include all kinds of basic protocol components and applications towards the creation of a GNU internet.

This is a new major release. It breaks protocol compatibility with the 0.16.x versions. Please be aware that Git master is thus henceforth (and has been for a while) INCOMPATIBLE with the 0.16.x GNUnet network, and interactions between old and new peers will result in issues. 0.16.x peers will be able to communicate with Git master or 0.17.x peers, but some services - in particular the DHT - will not be compatible.
In terms of usability, users should be aware that there are still a number of known open issues, in particular with respect to ease of use, but also some critical privacy issues, especially for mobile users. Also, the nascent network is tiny and thus unlikely to provide good anonymity or extensive amounts of interesting information. As a result, the 0.17.0 release is still only suitable for early adopters with some reasonable pain tolerance.

Download links

The GPG key used to sign is: 3D11063C10F98D14BD24D1470B0998EF86F59B6A

Note that due to mirror synchronization, not all links might be functional early after the release. For direct access try http://ftp.gnu.org/gnu/gnunet/

Noteworthy changes in 0.17.0 (since 0.16.3)

  • GNS :
    • FCFSD: Allow configuration of relative expiration time of added records.
    • Aligned with breaking changes in specification. LSD0001
  • DHT :
    • Aligned and reordered message formats. LSD0004
    • Moved block type definitions to GANA
    • The specification has been updated to reflect the changes. LSD0004
  • UTIL :
    • Fix scheduler bug with same-priority immediately-ready tasks possibly hogging the scheduler.
    • Fix mysql/mariadb detection.

A detailed list of changes can be found in the ChangeLog and the bug tracker.

Known Issues

  • There are known major design issues in the TRANSPORT, ATS and CORE subsystems which will need to be addressed in the future to achieve acceptable usability, performance and security.
  • There are known moderate implementation limitations in CADET that negatively impact performance.
  • There are known moderate design issues in FS that also impact usability and performance.
  • There are minor implementation limitations in SET that create unnecessary attack surface for availability.
  • The RPS subsystem remains experimental.
  • Some high-level tests in the test-suite fail non-deterministically due to the low-level TRANSPORT issues.

In addition to this list, you may also want to consult our bug tracker at bugs.gnunet.org which lists about 190 more specific issues.

Thanks

This release was the work of many people. The following people contributed code and were thus easily identified: Christian Grothoff, Tristan Schwieren, Florian Dold, Thien-Thi Nguyen, t3sserakt, TheJackiMonster and Martin Schanzenbach.

05 June, 2022 10:00PM

unifont @ Savannah

Unifont 14.0.04 Released

4 June 2022 Unifont 14.0.04 is now available.  This is a minor release to fix an issue with parallel font builds.  It also contains updates to some glyphs, notably in the Runic script.

Download this release from GNU server mirrors at:

     https://ftpmirror.gnu.org/unifont/unifont-14.0.04/

or if that fails,

     https://ftp.gnu.org/gnu/unifont/unifont-14.0.04/

or, as a last resort,

     ftp://ftp.gnu.org/gnu/unifont/unifont-14.0.04/

These files are also available on the unifoundry.com website:

     https://unifoundry.com/pub/unifont/unifont-14.0.04/

Font files are in the subdirectory

     https://unifoundry.com/pub/unifont/unifont-14.0.04/font-builds/

A more detailed description of font changes is available at

      https://unifoundry.com/unifont/index.html

and of utility program changes at

      http://unifoundry.com/unifont/unifont-utilities.html

05 June, 2022 01:05AM by Paul Hardy

June 01, 2022

freedink @ Savannah

New Maintainer

Note that a new maintainer, Kjharcombe, has recently been put in place. We are hoping to be able to continue development with this package.

01 June, 2022 12:32PM by Keiran Harcombe

pspp @ Savannah

PSPP 1.6.0 has been released.

I'm very pleased to announce the release of a new version of GNU PSPP.  PSPP is a program for statistical analysis of sampled data.  It is a free replacement for the proprietary program SPSS.

Changes from 1.4.1 to 1.6.0:

  • In the Kruskal-Wallis test, a misleading result could occur if the lower bound specified by the user was in fact higher than the upper bound specified.  This has been fixed.
  • The DEFINE, MATRIX, MCONVERT, and MATRIX DATA commands are now implemented.
  • An error in the displayed significance of one-way ANOVA contrasts tests has been corrected.
  • Added Drag-N-Drop in output view.
  • The Explore GUI dialog supports the "Plots" subdialog. Boxplots, Q-Q Plots and Spreadlevel plots are now also available via the GUI.
  • The graphical user interface for importing spreadsheets has been improved.  The new interface provides the user with a preview of the data to be imported and interactive methods to select the desired ranges.
  • The user manual, in its Info and HTML versions, now includes graphical output examples and screenshots.
  • New command SHOW SYSTEM to easily print system information useful in bug reports.
  • Build changes:
    • Perl is no longer required to build.
    • Build now requires Python 3.4 or later.  (Building PSPP 1.4.0 also required Python, but it wasn't properly documented at the time.)
    • The Cairo and Pango libraries are now required.
    • gettext 0.20 or later is now required.
    • gtksourceview 4.x is now supported (3.x also remains supported).
  • Output improvements:
    • New drivers for output to TeX source files and to PNG files.
    • Table output styles may now be set with the new option --table-look and the new SET TLOOK command.
    • New driver option "trim" to remove empty space from PDF, PostScript, SVG, and PNG output files.
    • The PDF output driver now adds an outline to allow PDF viewers to display a "table of contents" for the file.
    • The HTML output driver has a new option "bare".
  • New features in pspp-output:
    • New --table-look and --nth-commands options.
    • New get-table-look and convert-table-look commands.

Please send PSPP bug reports to bug-gnu-pspp@gnu.org.

01 June, 2022 05:53AM by Ben Pfaff

May 29, 2022

hello @ Savannah

hello-2.12.1 released [stable]

This release has no code changes since 2.12, but just a minor documentation fix and updated translations.

29 May, 2022 11:08PM by Reuben Thomas

May 28, 2022

www-zh-cn @ Savannah

Happy 20th Birthday GNU CTT

20 years ago, on May the 28th, the GNU Chinese Translators Team was registered at Savannah.

I joined the project from Help GNU. My original intention was to support the project and maybe help myself understand more of GNU. At that time I was the only one actively translating the GNU web pages into Simplified Chinese. I even had to approve my own translations, which was not the correct approach. I really wanted other people, like the project creators, to join the project, and I started to really understand that being a volunteer in a Free Software project means being persistent and stubborn.

Gradually we had some newcomers joining the project, like hagb, hahawang, psiace, shankangke, shi, wind, etc. I am very excited whenever there is a newcomer, because I know I am the person to introduce them to the project and to encourage them to contribute their time and talent to it.

Today, the translation project is going smoothly. Our progress is often near the top of the list of all language teams. Thanks to all the translators. Still, there is a long way to go, because our goal is not only to translate the articles, but also to promote the idea of Free Software and its care for computer users and the community. I do hope our effort is a step forward toward a world where all software is free as in freedom.

Today, GNU CTT is just 20 years young. It is still a baby who needs care from all of us. Let's work together to grow it stronger.

Dear Translators, dear free software lovers, dear friends, please light your candles, please wish your wish, please make your contribution to let the world be brighter.

Happy 20th Birthday, GNU CTT.
wxie

28 May, 2022 02:56AM by Wensheng XIE

May 27, 2022

Trisquel GNU/Linux

Trisquel 10.0.1 LTS "Nabia" incremental update

Today we publish a new set of live and installation media for the 10.0 series, which applies all package upgrades and security fixes to date and corrects bugs in the installer applications and package managers. If you are already using Trisquel 10 you can upgrade without reinstalling, simply by using your package manager or update application of choice, or by running these two commands in a terminal:

sudo apt update
sudo apt dist-upgrade

The new release iso images are available in the downloads page.

27 May, 2022 04:53PM by quidam

May 24, 2022

GNU Taler news

Who comes after us? The correct mindset for designing a Central Bank Digital Currency

The title of the paper refers to the former DIRNSA, who claimed that "nobody comes after us" just before the NSA lost control of its data on Afghanistan collaborators to the Taliban. The paper urges for this cautionary tale to be considered when central banks are creating digital currencies.

24 May, 2022 10:00PM

May 22, 2022

parallel @ Savannah

GNU Parallel 20220522 ('NATO') released

GNU Parallel 20220522 ('NATO') has been released. It is available for download at: lbry://@GnuParallel:4

Quote of the month:

  It's amazing how fast you can get with bash pipelines and GNU Parallel.
    -- Eric Pauley @EricPauley_

New in this release:

  • --latest-line shows only the latest line of running jobs.
  • --color colors output in different colors per job (this obsoletes --ctag).
  • xargs compatibility: --process-slot-var foo sets $foo to jobslot-1.
  • xargs compatibility: --open-tty opens the terminal on stdin (standard input).
  • Bug fixes and man page updates.

News about GNU Parallel:

Get the book: GNU Parallel 2018 http://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html

GNU Parallel - For people who live life in the parallel lane.

If you like GNU Parallel record a video testimonial: Say who you are, what you use GNU Parallel for, how it helps you, and what you like most about it. Include a command that uses GNU Parallel if you feel like it.

About GNU Parallel

GNU Parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU Parallel can then split the input and pipe it into commands in parallel.

If you use xargs and tee today you will find GNU Parallel very easy to use as GNU Parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU Parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. GNU Parallel can even replace nested loops.

GNU Parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU Parallel as input for other programs.

For example you can run this to convert all jpeg files into png and gif files and have a progress bar:

  parallel --bar convert {1} {1.}.{2} ::: *.jpg ::: png gif

Or you can generate big, medium, and small thumbnails of all jpeg files in sub dirs:

  find . -name '*.jpg' |
    parallel convert -geometry {2} {1} {1//}/thumb{2}_{1/} :::: - ::: 50 100 200

You can find more about GNU Parallel at: http://www.gnu.org/s/parallel/

You can install GNU Parallel in just 10 seconds with:

    $ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
       fetch -o - http://pi.dk/3 ) > install.sh
    $ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
    12345678 883c667e 01eed62f 975ad28b 6d50e22a
    $ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
    cc21b4c9 43fd03e9 3ae1ae49 e28573c0
    $ sha512sum install.sh | grep ec113b49a54e705f86d51e784ebced224fdff3f52
    79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
    fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
    $ bash install.sh

Watch the intro video on http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.

When using programs that use GNU Parallel to process data for publication please cite:

O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014.

If you like GNU Parallel:

  • Give a demo at your local user group/team/colleagues
  • Post the intro videos on Reddit/Diaspora*/forums/blogs/ Identi.ca/Google+/Twitter/Facebook/Linkedin/mailing lists
  • Get the merchandise https://gnuparallel.threadless.com/designs/gnu-parallel
  • Request or write a review for your favourite blog or magazine
  • Request or build a package for your favourite distribution (if it is not already there)
  • Invite me for your next conference

If you use programs that use GNU Parallel for research:

  • Please cite GNU Parallel in your publications (use --citation)

If GNU Parallel saves you money:

About GNU SQL

GNU sql aims to give a simple, unified interface for accessing databases through all the different databases' command line clients. So far the focus has been on giving a common way to specify login information (protocol, username, password, hostname, and port number), size (database and table size), and running queries.

The database is addressed using a DBURL. If commands are left out you will get that database's interactive shell.

When using GNU SQL for a publication please cite:

O. Tange (2011): GNU SQL - A Command Line Tool for Accessing Different Databases Using DBURLs, ;login: The USENIX Magazine, April 2011:29-32.

About GNU Niceload

GNU niceload slows down a program when the computer load average (or other system activity) is above a certain limit. When the limit is reached the program will be suspended for some time. If the limit is a soft limit the program will be allowed to run for short amounts of time before being suspended again. If the limit is a hard limit the program will only be allowed to run when the system is below the limit.

22 May, 2022 01:25PM by Ole Tange

May 18, 2022

www @ Savannah

The History of GNU

Richard Stallman at the First Hackers Conference

The first Hackers Conference was held in Sausalito, California, in November 1984. The makers of the documentary Hackers: Wizards of the Electronic Age interviewed Richard Stallman at the event. They included only parts of the interviews in the film, but made some other footage available. Stallman's statements at the conference went beyond what he had written in the initial announcement of GNU.

It was at this conference that Richard Stallman first publicly and explicitly stated the idea that all software should be free, and made it clear that "free" refers to freedom, not price, by saying that software should be freely accessible to everyone. This was probably the first time he made that distinction to the public.

Stallman continues by explaining why it is wrong to agree to accept a program on condition of not sharing it with others. So what can one say about a business based on developing nonfree software and luring others into accepting that condition? Such things are bad for society and shouldn't be done at all. (In later years he used stronger condemnation.)

Here are the things he said:

Video
Video
Video

18 May, 2022 08:50AM by Dora Scilipoti

May 16, 2022

Luca Saiu

Hackers getting married

On May 14th E. and I got married, here in Zürich. I do not normally share very personal information here; but people who knew me before January 2021 will remember me before and since that time. How she changed me for the better. E. is my joy. [Hugging photo] E. and I hugging under the cloister next to the Stadthaus. Photo by Gloria Bressan (http://www.byphotoz.com). For the occasion we invited our friends and relatives, most of whom live as émigrés in one country or another, like us. We had several of our old-time friends from the GNU Project, and some ... [Read more]

16 May, 2022 08:05PM by Luca Saiu (positron@gnu.org)

May 15, 2022

libiconv @ Savannah

libiconv 1.17 released

GNU libiconv 1.17 is released.

New in this release:

  • The libiconv library is now licensed under the LGPL version 2.1, instead of the LGPL version 2.0. The iconv program continues to be licensed under GPL version 3.
  • Added converters for many single-byte EBCDIC encodings: IBM-{037,273,277,278,280,282,284,285,297,423,424,425,500,838,870,871,875}, IBM-{880,905,924,1025,1026,1047,1097,1112,1122,1123,1130,1132,1137,1140}, IBM-{1141,1142,1143,1144,1145,1146,1147,1148,1149,1153,1154,1155,1156,1157}, IBM-{1158,1160,1164,1165,1166,4971,12712,16804}. They are available through the configure option '--enable-extra-encodings'.

15 May, 2022 03:31PM by Bruno Haible

May 10, 2022

FSF News

May 03, 2022

education @ Savannah

Along: an app to collect students' data for marketing purposes

The nonfree app Along, developed by a company controlled by Zuckerberg, leads students to reveal to their teacher personal information about themselves and their families. Conversations are recorded and the collected data sent to the company, which grants itself the right to sell it.

03 May, 2022 08:08PM by Dora Scilipoti