Planet GNU

Aggregation of development blogs from the GNU Project

July 24, 2014

FSF Blogs

Friday Free Software Directory IRC meetup: July 25

Join the FSF and friends on Friday, July 25, from 2pm to 5pm EDT (18:00 to 21:00 UTC) to help improve the Free Software Directory by adding new entries and updating existing ones. We will be on IRC in the #fsf channel on freenode.

Tens of thousands of people visit each month to discover free software. Each entry in the Directory contains a wealth of useful information, from basic categories and descriptions to detailed information about version control, IRC channels, documentation, and licensing, all of which has been carefully checked by FSF staff and trained volunteers.

While the Free Software Directory has been a great resource to the world over the past decade and continues to be one, it has the potential to be a resource of even greater value. But it needs your help!

If you are eager to help and you can't wait or are simply unable to make it onto IRC on Friday, our participation guide will provide you with all the information you need to get started on helping the Directory today!

July 24, 2014 04:52 AM

July 23, 2014

FSF Events

Richard Stallman to speak in Socorro, NM

Richard Stallman's speech will be nontechnical, admission is gratis, and the public is encouraged to attend.

Time and detailed location to be determined.

Please fill out our contact form, so that we can contact you about future events in and around Socorro.

July 23, 2014 02:42 PM

Richard Stallman to speak in Bogota, Colombia

Richard Stallman's speech will be nontechnical and open to the public; everyone is invited to attend.

Please fill out this form, so that we can contact you about future events in the Bogotá area.

The title, exact location, and time of the talk are to be determined.

July 23, 2014 12:30 PM


GnuTLS 3.3.6 and 3.2.16

GnuTLS 3.3.6 and GnuTLS 3.2.16 have been released. They are bug-fix releases on the next and current stable branches, respectively.

by Nikos Mavrogiannopoulos at July 23, 2014 12:00 AM

July 22, 2014

parallel @ Savannah

GNU Parallel 20140722 ('MH17') released

GNU Parallel 20140722 ('MH17') has been released. It is available for download at:

This release contains a major change in central parts of the code and should be considered beta quality. As always it passes the testsuite, so most functionality clearly works.

Thanks to Malcolm Cook for the brilliant idea to use a general perl expression as a replacement string.

Haiku of the month:

Are you tired of
inflexible replacements?
Use Perl expressions.
-- Ole Tange

New in this release:

  • {= perl expression =} can be used as replacement string. The expression should modify $_. E.g. {= s/\.gz$// =} to remove .gz from the string. This makes replacement strings extremely flexible.
  • Positional perl expressions (similar to {2}) are given as {=2 perl expression=} where 2 is the position.
  • One small backwards incompatibility: {1}_{2} will replace {2} with the empty string if there is only one argument. Previously {2} would have been left untouched.
  • Replacement strings can be defined using --rpl. E.g. parallel --rpl '{.gz} s/\.gz$//' echo {.gz} ::: *.gz
  • The parenthesis around {= perl expression =} can be changed with --parens.
  • --tmux will direct the output to a tmux session instead of files. Each running job will be in its own window.
  • --halt 10% will stop spawning new jobs if 10% failed so far.
  • Bug fixes and man page updates.

GNU Parallel - For people who live life in the parallel lane.

About GNU Parallel

GNU Parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU Parallel can then split the input and pipe it into commands in parallel.

If you use xargs and tee today you will find GNU Parallel very easy to use as GNU Parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU Parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. GNU Parallel can even replace nested loops.

GNU Parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU Parallel as input for other programs.
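Because GNU Parallel mirrors the option style of xargs, a plain shell loop and its xargs equivalent give a feel for the substitution involved. The sketch below runs only standard tools; the GNU Parallel form appears in a comment, since parallel may not be installed:

```shell
# A shell loop...
for f in one two; do echo "job $f"; done

# ...and the equivalent xargs pipeline ({} is the replacement token).
# With GNU Parallel the same jobs could be written roughly as
# "parallel echo job ::: one two" (sketch; requires GNU Parallel).
printf '%s\n' one two | xargs -I{} echo "job {}"
```

Both forms print "job one" and "job two", one per line.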

You can find more about GNU Parallel at:

You can install GNU Parallel in just 10 seconds with: (wget -O - || curl) | bash

Watch the intro video on

Walk through the tutorial (man parallel_tutorial). Your commandline will love you for it.

When using programs that use GNU Parallel to process data for publication please cite:

O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.


About GNU SQL

GNU sql aims to give a simple, unified interface for accessing databases through all the different databases' command line clients. So far the focus has been on giving a common way to specify login information (protocol, username, password, hostname, and port number), size (database and table size), and running queries.

The database is addressed using a DBURL. If commands are left out you will get that database's interactive shell.
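For illustration, here is how a hypothetical DBURL decomposes into the login pieces listed above. The parsing below is plain shell parameter expansion, not GNU sql's own code, and the URL itself is made up:

```shell
# A hypothetical DBURL, pulled apart into protocol, credentials,
# host:port, and database name:
dburl='mysql://user:secret@db.example.com:3306/inventory'
proto=${dburl%%://*}                           # mysql
rest=${dburl#*://}                             # user:secret@db.example.com:3306/inventory
creds=${rest%%@*}                              # user:secret
hostport=${rest#*@}; hostport=${hostport%%/*}  # db.example.com:3306
database=${rest##*/}                           # inventory
echo "$proto $creds $hostport $database"
```

GNU sql accepts such a DBURL and maps each piece onto the right flags of the underlying database's command line client.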

When using GNU SQL for a publication please cite:

O. Tange (2011): GNU SQL - A Command Line Tool for Accessing Different Databases Using DBURLs, ;login: The USENIX Magazine, April 2011:29-32.

About GNU Niceload

GNU niceload slows down a program when the computer load average (or other system activity) is above a certain limit. When the limit is reached the program will be suspended for some time. If the limit is a soft limit the program will be allowed to run for short amounts of time before being suspended again. If the limit is a hard limit the program will only be allowed to run when the system is below the limit.
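The soft-limit cycle described above can be sketched in a few lines of shell. This is an illustration, not niceload's actual implementation: the load readings are canned rather than taken from the system, and in the real tool the suspend/resume steps would act on the controlled process (e.g. via SIGSTOP/SIGCONT):

```shell
# Illustrative sketch of the suspend/resume decision, driven by a
# canned sequence of load readings instead of the live load average:
limit=1.0
for load in 2.3 1.8 0.7; do
  if awk "BEGIN { exit !($load > $limit) }"; then
    echo "load $load: above limit, suspend job"
  else
    echo "load $load: below limit, let job run"
  fi
done
```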

by Ole Tange at July 22, 2014 04:53 AM

July 21, 2014

Luca Saiu

Happy pi Approximation Day 2014

It’s 22/7 again. Last year on pi Approximation Day I published a simple Forth program based on an intuitive geometrical idea: see Happy pi approximation day 2013. I’ve been thinking about what to do in 2014 for some time, without finding anything as nice from the programming point of view. Sure, you can find series and continued fractions converging to pi, even rapidly; these methods work, but the corresponding programs are trivial to code and don’t provide any insight. So I chose another route: a practical experiment to approximate pi by cutting and weighing metal. The result turned out ... [Read more]

by Luca Saiu at July 21, 2014 10:10 PM

FSF Blogs

Introducing Tyler Livingston, a summer Licensing Team intern

Tyler Livingston

Hello. I am a rising third-year law student at SMU Dedman School of Law in Dallas, TX. I am working hard to master the technical aspects of law, electronics, and software. My current interests involve protecting individuals and investigating new technology, particularly in the communications field, by utilizing licenses for authorship, art, and inventions. Prior to law school, I earned a bachelor's degree in History at the University of Texas at Dallas.

Licensing is where I began to be involved with free software; the FSF in particular utilizes a great strategy of working within the current licensing jurisprudence by using copyleft to support freedom and empowerment for users over their computers and software. My computer science skills are lacking, but I have worked with UNIX systems in the past and am now finally feeling comfortable enough to make a permanent switch to enjoy software on my own terms. Other interests include electronics and travel (with a trip planned to Eastern Europe later this year).

Over the summer I will be working with the licensing department on various projects, including the Free Software Directory, publications, and hopefully a bit of the nitty-gritty licensing terms and compliance issues. The number one priority for my time at the FSF is to learn. Collaboration brings together society and carries with it several other natural positive externalities. I hope to integrate into the free software community because it is an integral cog of the free software movement. I am extremely excited to reach out to others and soak up as much as I can from this enthralling environment.

I encourage all to contact me at

More information about the FSF's internship program is available at

July 21, 2014 04:45 PM

July 18, 2014

coreutils @ Savannah

coreutils-8.23 released [stable]

by Pádraig Brady at July 18, 2014 11:35 PM

July 17, 2014

denemo @ Savannah

Release 1.1.8 is imminent

New Features:

  • Dynamics such as cresc. poco a poco
  • Text such as rall …
  • LilyPond takes care of the extents

Lyrics Enhancements

  • Lyric stanza numbers can be inserted
  • Available on lyrics pane menu
  • Lyric font style control
  • Melismata Insert

Text and Graphics

  • Graphic Title Pages
  • Multiple columns of text
  • Edit in external vector editor
  • Use for music books, verses …
  • Multi-line text, with embedded music snippets

Custom Ornaments/Symbols

  • Edit the Ornament Shape and Size
  • Re-define existing ornaments
  • Attach to notes or stand-alone

Custom Barlines

  • Define new barlines
  • Re-define existing barlines
  • Control over how/if they print in all positions

Chord Charts

  • Chord Symbols
  • With barlines, repeats
  • Text markings, pauses
  • Use for songs, jazz …
  • Display on smartphone while busking…

by Richard Shann at July 17, 2014 07:37 PM

gnatsweb @ Savannah

Status update on upcoming changes

Quite frankly, I have found that the current gnatsweb lacks the structure needed to make anything more useful out of it. Managing it as a monolithic Perl CGI script just is not working out.

So, I've decided to:

1. Move the repository to GitHub, which gives me some better tools to manage updates
2. Move the functionality directly to Mojolicious without making a 'cleanup' release
3. Release the next version as a 'like for like' replacement

by Richard Elberger at July 17, 2014 02:55 AM

July 16, 2014

FSF Blogs

Friday Free Software Directory IRC meetup: July 18

Join the FSF and friends on Friday, July 18, from 2pm to 5pm EDT (18:00 to 21:00 UTC) to help improve the Free Software Directory by adding new entries and updating existing ones. We will be on IRC in the #fsf channel on freenode.

Tens of thousands of people visit each month to discover free software. Each entry in the Directory contains a wealth of useful information, from basic categories and descriptions to detailed information about version control, IRC channels, documentation, and licensing, all of which has been carefully checked by FSF staff and trained volunteers.

While the Free Software Directory has been a great resource to the world over the past decade and continues to be one, it has the potential to be a resource of even greater value. But it needs your help!

If you are eager to help and you can't wait or are simply unable to make it onto IRC on Friday, our participation guide will provide you with all the information you need to get started on helping the Directory today!

July 16, 2014 07:22 PM

German Arias

First steps with Liberty-Eiffel


Well, after the release of Liberty-Eiffel as part of GNU, I'm taking my first steps with the Eiffel language. Some years ago I tried this language, but unfortunately I couldn't find documentation in Spanish. Now that I can read English, I will give this language a try. And, of course, I will also write Spanish tutorials for upcoming users :)

The first thing I noticed is the different terminology. An executable is called a system, and a set of classes is referred to as a universe. Classes can be grouped into clusters within the universe. The routines (operations) of a class, together with its attributes, are called features. Routines are divided into functions, or queries (which return a value), and procedures (which do not return a value). Unlike the C language, where we need a function named main, in Eiffel we can designate any procedure to start the execution.

But let's start by looking at the "Hello World!" example that comes in the source tarball. First install Liberty-Eiffel; in my case I used the deb packages from Snapshots, because the stable packages for i386 are broken. In the example, everything from -- to the end of the line is a comment (I removed the original comments, but added others for explanation):
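The listing itself did not survive aggregation. The following writes a plausible reconstruction of the classic Liberty-Eiffel hello world (matching the {ANY}, io and put_string description below) to hello_world.e; the exact comments and spacing of the original post are unknown:

```shell
# Reconstruction (not the original listing) of the Liberty-Eiffel
# hello world, saved to hello_world.e as the post instructs:
cat > hello_world.e <<'EOF'
class HELLO_WORLD

create {ANY}
   make

feature {ANY}
   make
      do
         io.put_string("Hello World.%N")  -- %N is Eiffel's newline escape
      end

end
EOF
```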


{ANY} means that any other class can call this routine; in Eiffel we can restrict calls to certain classes. io is a class for input/output, and the routine put_string writes a message to the default output. The source tarball, at work/eiffel.el, provides a major mode for Emacs. Save this program in a file named hello_world.e and compile it with:

se compile hello_world.e -o hello

In Eiffel it is recommended to have one class per file, with the class name capitalized and the name of the corresponding file in lowercase. Now run the executable:

Hello World.

Clap clap, our first program in Eiffel.

OK, something magic happened here. We didn't designate any routine as the one that starts the execution. Liberty-Eiffel doesn't implement 100% of the ECMA standard, in which, if the initial routine isn't specified, it is assumed to have the name make. So I suppose that in Liberty-Eiffel it is assumed to have the name main. But moving on: Liberty uses ACE files for the same purpose as makefiles in other languages. This is our ACE file for this example:


Again, everything from -- to the end of the line is a comment. In the cluster section there are two lines: the current directory, where our file hello_world.e is, and the paths of the Liberty libraries. Save this file in the same directory as our source file, with the extension ace, for example hello.ace. Now you can compile the program with:

se compile hello.ace

To clean up the compilation products, except the executable, use:

se clean *.e

And that is all for now. In the next posts I will share other examples as I learn this interesting language. Regards :)

by Germán Arias at July 16, 2014 07:32 AM

July 14, 2014

FSF Blogs

Tell the FCC: Net Neutrality is crucial to free software

UPDATE: The FCC has extended the comment deadline to Friday, July 18, 2014 at midnight. Why the extension? The FCC's servers are crashing--they can't seem to handle the number of public comments coming in. Keep 'em coming!

The agency has asked members of the public, along with industry leaders and entrepreneurs, to tell it why Internet Service Providers should be banned from traffic discrimination. This comment window is one of the best opportunities we've had to make an impact. Comments are due July 18, 2014. Submit your statement in support of Net Neutrality right away using the Electronic Frontier Foundation's free software commenting tool.

Net neutrality, the principle that all traffic on the Internet should be treated equally, should be a basic right for Internet users. It's also crucial for free software's continued growth and success. Here's why:

Media distribution giants that use Digital Restrictions Management and proprietary software to control what's on your computer have also been fighting for years to control the network. Without Net Neutrality, DRM-laden materials could be easier to access, while DRM-free competitors could be stuck in the slow lane. Web-based free software projects like GNU MediaGoblin could also suffer the slow treatment while competitors like YouTube shell out big bucks for speedier service. The bottom line--an Internet where the most powerful interests can pay for huge speed advantages could push smaller free software projects right off the map and make it harder for decentralized projects to flourish. That's not good for free software, and it's not good for other innovative voices for change in the digital world.

Tell the FCC: Net Neutrality will help free software flourish

Activists have worked for years to get to this moment. Over the last several months, things have really heated up--with Internet freedom lovers camping outside the FCC and serenading FCC Chairman Tom Wheeler with a special version of "Which Side Are You On?" The comments flooding in to the agency have jammed the phones and crashed the FCC's email servers. And yet, Chairman Wheeler still thinks he can get away with ignoring overwhelming public outrage and wrecking the free Internet. We have to keep up our historic momentum in order to convince a cable-industry sympathizer like Chairman Wheeler to listen to the public and protect Net Neutrality.

The deadline for comments is July 18, 2014. Don't delay--comment now!

July 14, 2014 10:20 PM


Nick Clifton

July 2014 GNU Toolchain Update

Hi Guys,

  More toolchain goodness this month, with several new features making their way into the sources:

  * GCC can now produce compressed DWARF debug information, if it is supported by the rest of the toolchain:


    The type parameter is optional, and is used to specify the sort of compression to be used.  Accepted values are:

       * none     (don't compress debug sections)
       * zlib     (use zlib compression in ELF gABI format)
       * zlib-gnu (use zlib compression in traditional GNU format)

  * GCC has a new option to improve the optimization of ELF DSOs:


    Without this option the compiler has to assume that global symbols might be replaced by the dynamic linker, and so it cannot perform any inter-procedural propagation, inlining or other optimizations on them.  While this feature is useful, for example to replace memory allocation functions with a debugging implementation, it is expensive in terms of code quality.

    With the new option enabled the compiler assumes that if interposition happens for functions, the overriding function will have precisely the same semantics (and side effects).  Similarly, if interposition happens for variables, the constructor of the variable will be the same.

  * GCC has a couple of new options to disable warning messages.

    One disables warnings about conversions between pointers that have incompatible types.  It affects warnings that are not disabled by the -Wno-pointer-sign option (which only suppresses warnings about signed versus unsigned pointers).


    The other stops warnings about implicit incompatible integer-to-pointer and pointer-to-integer conversions.

    There is also a new option to enable a new warning:


    which warns when the sizeof operator is applied to a parameter that is declared as an array in a function definition.

  * The GOLD linker now has support for the AArch64 and MIPS architectures.

  * The NEWLIB C library can now be configured with a very small footprint I/O library (--enable-newlib-nano-formatted-io).  Whilst a lot smaller in size than the default I/O support, it does have a few restrictions:

      + Long doubles are not supported.
      + Floating point support has to be explicitly requested by linking the program against either or both of _printf_float and _scanf_float.

      + Wide character I/O functions are not affected by this configure option.
  * The STRINGS program supports a new command line option:

    which makes it include \n and \r characters in the strings that it displays.

  * The LD linker for COFF and PE based targets now supports a command line option:


    which makes it insert a real timestamp into the image, rather than the default value of zero.  This means that two identical binaries built at different times will compare differently, but it also means that the binaries will work with proprietary tools which cannot cope with zero timestamps.

  * The BINUTILS now have support for the AVR Tiny microcontrollers.

  * GCOV-TOOL is a new tool for manipulating profiling data.  With it you can:

      + Merge two sets of profiling data.
      + Scale or normalize the data.
      + Remove a subset of the data.

   This tool is part of the GCC sources.

July 14, 2014 08:57 AM

July 13, 2014

guix @ Savannah

Guix at OpenBio Codefest 2014

On Wednesday, July 9th, David Thompson gave a brief introduction to GNU Guix at the Open Bioinformatics Codefest 2014 hackathon. The objective of the Codefest is to give developers of bioinformatics software a chance to be fully focused on their projects for a few days and work in person. These developers are concerned with the reliability and reproducibility of their operating systems, and the limitations of their package management utilities.

See the slides on-line.

by David Thompson at July 13, 2014 08:26 PM

July 11, 2014

GNUnet News

Talk @ TUM: Peter Schaar on "Gibt es einen Schutz vor Totalüberwachung?" ("Is there any protection against total surveillance?")

On July 7th, 2014, Peter Schaar (Head of the European Academy for Freedom of Information and Data Protection, former Bundesdatenschutzbeauftragter, Germany's Federal Commissioner for Data Protection) gave a talk, in German, about technology, law and surveillance. You can find the video below.

by Christian Grothoff at July 11, 2014 11:21 AM

July 09, 2014

FSF Blogs

Friday Free Software Directory IRC meetup: July 11

Join the FSF and friends on Friday, July 11, from 2pm to 5pm EDT (18:00 to 21:00 UTC) to help improve the Free Software Directory by adding new entries and updating existing ones. We will be on IRC in the #fsf channel on freenode.

Tens of thousands of people visit each month to discover free software. Each entry in the Directory contains a wealth of useful information, from basic categories and descriptions to detailed information about version control, IRC channels, documentation, and licensing, all of which has been carefully checked by FSF staff and trained volunteers.

While the Free Software Directory has been a great resource to the world over the past decade and continues to be one, it has the potential to be a resource of even greater value. But it needs your help!

If you are eager to help and you can't wait or are simply unable to make it onto IRC on Friday, our participation guide will provide you with all the information you need to get started on helping the Directory today!

July 09, 2014 03:56 PM

July 08, 2014

gsrc @ Savannah

Release of GSRC 2014.07.06

I'm happy to announce the 2014.07.06 release of GSRC, the GNU Source
Release Collection. GSRC is a convenient means to fetch, build and
install the latest GNU software from source via a BSD Ports-like system.
Installing a package is as simple as

$ make -C gnu/hello install

You can find more information and the documentation at the GSRC website:

This release is a snapshot of the state of released GNU software at this
time. You can download it from the GNU FTP server or from the nearest
mirror.

Of course, to stay up-to-date with the latest package releases in
between releases of GSRC, you may choose instead to check out the bzr
repository:

$ bzr checkout bzr:// gsrc

And keep up-to-date with the latest releases:

$ bzr update

If you encounter any problems with a build script, please let me know.


  • Changes to the GSRC system
    • Using GARPROFILE correctly modifies the package .gar directory
  • Changes in GSRC packages
    • 4 packages have been added to GSRC since the last release
    • 61 packages have been updated since the last release
    • Coverage statistics
      • GNU packages [422/471] [89%]
      • GNOME packages
        • Core [15/116] [12%]
        • Apps [1/44] [2%]
      • GNUstep packages [22/28] [78%]
    • New packages
      • easejs (0.2.1)
      • gnudos (1.0)
      • guile-opengl (0.1.0)
      • guile-rpc (0.4)
    • Updated packages
      • anubis (4.2)
      • autogen (5.18.3)
      • bash (4.3-18)
      • ccaudio (2.1.3)
      • ccrtp (2.0.9)
      • ccscript (5.0.2)
      • cgicc (3.2.15)
      • cim (5.1)
      • cursynth (1.5)
      • dap (3.10)
      • ddrescue (1.18.1)
      • denemo (1.1.6)
      • easejs (0.2.2)
      • electric (9.05)
      • freedink (108.2)
      • freeipmi (1.4.4)
      • fribidi (0.19.6)
      • gawk (4.1.1)
      • gcc (4.9.0)
      • gcompris (14.05)
      • gdb (7.7.1)
      • gettext (0.19.1)
      • global (6.3)
      • gnubatch (1.11)
      • gnun (0.9)
      • gnunet (0.10.1)
      • gnunet-gtk (0.10.1)
      • gnupg (2.0.25)
      • gnustandards (2014)
      • gnutls (3.3.4)
      • grep (2.20)
      • gtick (0.5.3)
      • guile-dbi (2.1.5)
      • libmicrohttpd (0.9.37)
      • libtasn1 (4.0)
      • lightning (2.0.4)
      • lilypond (2.19.1)
      • linux-libre (3.15)
      • mailman (2.1.18)
      • mcron (1.0.8)
      • mediagoblin (0.6.1)
      • melting (5.1.2)
      • mifluz (0.26.0)
      • mit-scheme (9.2)
      • moe (1.6)
      • nano (2.3.4)
      • nettle (3.0)
      • parallel (20140622)
      • pspp (0.8.3)
      • pyconfigure (0.2.2)
      • r (3.1.0)
      • screen (4.2.1)
      • sipwitch (1.9.1)
      • sqltutor (0.7)
      • texmacs (
      • ucommon (6.1.5)
      • units (2.11)
      • wdiff (1.2.2)
      • xaos (3.6)
      • xnee (3.19)
      • xorriso (1.3.8)

by Brandon Invergo at July 08, 2014 07:01 PM

GNU Remotecontrol

Newsletter – July 2014


The stuff going on in the big picture now…..

United States Electricity Price per kWh
Current and Past

April     May       Trend      % Change
$0.131    $0.136    Increase   3.82%

Year   May Price   Trend      % Change   % Since 2004   Difference
2004   $0.093      Same        0.00%       0.00%          0.00%
2005   $0.097      Increase    4.30%       4.30%          4.30%
2006   $0.110      Increase   13.40%      18.28%         13.98%
2007   $0.115      Increase    4.55%      23.66%          5.38%
2008   $0.120      Increase    4.35%      29.03%          5.38%
2009   $0.126      Increase    5.00%      35.48%          6.45%
2010   $0.127      Increase    0.79%      36.56%          1.08%
2011   $0.129      Increase    1.57%      38.71%          2.15%
2012   $0.129      Same        0.00%      38.71%          0.00%
2013   $0.131      Increase    1.55%      40.86%          2.15%
2014   $0.136      Increase    3.82%      46.24%          5.38%
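For reference, the "% Since 2004" column is the cumulative change relative to the 2004 price; checking the 2014 row with a one-line awk computation reproduces the table's figure:

```shell
# Cumulative change from the 2004 price ($0.093) to the May 2014
# price ($0.136), rounded to two decimals:
awk 'BEGIN { printf "%.2f%%\n", (0.136 - 0.093) / 0.093 * 100 }'   # prints: 46.24%
```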


United Kingdom Utility Prices
Current and Past

(OK, we are looking for a pretty picture to put here. Our other picture went away.)

The stuff that has caught our eye…..

Demand Response

  • An article, discussing a court ruling about a software patent. This ruling will most likely play into the Smart Grid, as the ruling is relevant to the collection of Smart Grid technologies.
  • An op-ed, echoing the success of recent OpenADR deployments.
  • An article, describing the desire to overturn a recent court ruling defining Demand Response.
  • An article, considering the benefits and costs of residential Demand Response.
  • An article, describing the impact of an environmental regulation on Demand Response.

Smart Grid – Consumer

  • An article, describing a US Senate bill in progress. The nature of this legislation is a clear indication that a Federal mandate for Demand Response will soon arrive in the United States.
  • A commentary, asserting the existing electric rate structure encourages customers to make undesired decisions.
  • A white paper, describing the three pillars of an effective connected thermostat management plan.

Smart Grid – Producer

  • A commentary, asserting the utility of the future will sell less electricity.
  • An article, describing the capability levels of a Smart Grid operator.
  • An article, reporting the new electric industry business model in Japan.
  • An article, depicting what the electric customer of today wants to hear from their provider.
  • An audio recording, discussing the most probable builders of the Smart Grid.

Smart Grid – Security

  • An article, reporting Nest Labs is looking to interconnect their thermostat to garage doors and other gadgets, opening up considerable security concerns. This is in direct contrast to their newly launched Nest Developer Program, asserting the intent to make all things safer.
  • An article, considering the extreme risks often embraced by the Internet of Things.
  • An article, discussing the new standardization strategy from ANSI.
  • An article, explaining why having a smart home is not always the best choice.

Status Update of our 2014 Plan…..

Demand Response

  • Further discussions with members of the electronics industry.
  • No other work since the April newsletter.

Unattended Server Side Automation

  • No other work since the April newsletter.

Power Line Communication

  • Further discussions with the members of the electronics industry.
  • No other work since the January newsletter.

Talk to us with your comments and suggestions on our plan for this year.

The stuff we are talking about now…..

The nature of the GNU remotecontrol software project, in light of the developing Smart Grid, requires us to understand a set of maturing technologies. The most unstable aspect of the coming Smart Grid is the economic side. We addressed the economic aspect of the Smart Grid in our June 2014 newsletter. The technology necessary for the Smart Grid to exist is essentially developed and mostly deployed, through the availability of both trusted and untrusted networks. The trusted networks are comprised of Advanced Metering Infrastructure, along with Advanced Meter Reading. The untrusted networks are comprised of the World Wide Web. The prevalent Smart Meter deployment efforts position public utilities for advancing to the next logical step, which is connecting Smart Meters to devices inside of the residential premises. The scale of Smart Meter deployment is different for each geographical region of each public utility, continuously growing larger.

We are convinced there is abundant credible evidence that a national Smart Grid will be up and running across the majority of each developed country within the next two decades. The cost-effectiveness of this technology will incentivize people to join a country-specific Smart Grid. Those who do not willingly join will eventually be forced in through some form of domicile mandate. The reason for this conclusion is that it will become more costly to leave people unconnected to the Smart Grid. This conclusion is supported by a relevant trend: the elimination of analog telephone service as the technology of today provides more effective options. A 2009 report shows almost one in four homes in the United States has only a cellular phone. The end of analog telephone service is expected to occur in less than a decade, with some advocating for it to end now.

The specific action holding back the Smart Grid from advancing now is determining how the necessary user specific information will be handled. We also covered this aspect in our June 2014 newsletter. This information handling decision, made by both the end user and the public utility, will launch construction of the Smart Grid. Forcing the decision on the customer will also launch construction of the Smart Grid, but it will take longer to see construction begin. Regardless of who will make this information handling decision, this decision is coming to any culture wanting a Smart Grid.

A credible, dependable, and trustworthy software project must responsibly address the associated data privacy involved with such a large scale transformation to any culture.

Would you ever want anything less?

The success of the Smart Grid rests upon effectively predicting demand for energy consumption and executing dynamic Demand Response. Successfully predicting demand involves recording historical energy consumption in order to identify trends of probable future energy demand. This historical usage is where the data privacy matter is centered. The reality is that historical user data will be stored by the public utility; it is only a matter of how the residential customer wants their specific user data handled.

Many people have asked us about adding other types of thermostats to GNU remotecontrol. There are three questions that need to be answered before we can offer GNU remotecontrol support for any IP thermostat. These questions are:

  • How to CONNECT to it (NETWORK).
  • How to READ from it (CODE).
  • How to WRITE to it (CODE).

It is our hope to have dozens and dozens of thermostat types that work with GNU remotecontrol. Let us know if you designed or manufactured a device and you would like to test it with GNU remotecontrol.
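The three questions above can be pictured as an adapter interface a new device would have to implement. This is a hypothetical sketch only (GNU remotecontrol itself is a PHP web application, and every name below is invented for illustration):

```python
class IPThermostatAdapter:
    """Illustrative base class: one method per question above."""
    def connect(self, host, port):      # NETWORK: how to reach the device
        raise NotImplementedError
    def read(self, register):           # CODE: how to read a value from it
        raise NotImplementedError
    def write(self, register, value):   # CODE: how to write a value to it
        raise NotImplementedError

class FakeThermostat(IPThermostatAdapter):
    """An in-memory stand-in, handy for exercising a new adapter."""
    def __init__(self):
        self.registers = {}
    def connect(self, host, port):
        return True                     # a real adapter would open a socket
    def read(self, register):
        return self.registers.get(register)
    def write(self, register, value):
        self.registers[register] = value
```

Once a device vendor can answer the three questions, filling in the real `connect`, `read`, and `write` bodies is the bulk of the integration work.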

The stuff you may want to consider…..

We have 0 new bugs and 0 fixed bugs since our last Blog posting. Please review these changes and apply them to your GNU remotecontrol installation, as appropriate.

We have 0 new tasks and 0 completed tasks since our last Blog posting. Please review these changes and apply them to your GNU remotecontrol installation, as appropriate.

The stuff you REALLY want to consider…..

The Federal Energy Regulatory Commission is charged with defining and regulating cyber security of energy facilities in the United States. There is an increasing trend of attempts to penetrate the systems associated with energy production, transmission, and distribution. The US Department of Homeland Security Industrial Control Systems Cyber Emergency Response Team is part of this collective protection effort by the Federal government. ICS-CERT and Symantec have each issued reports on the threat activity. We addressed the inadequacy of US energy firms' cyber defenses in the March 2014 newsletter, and the financial downgrade of the entire US electric utility sector in the June 2014 newsletter.

It is abundantly clear that the only way energy-related activities will ever be secure is to use a systems approach to design, build, and operate them. GNU remotecontrol is designed and built for integration with existing Information Technology systems, eliminating the need to develop a unique security policy for network-connected HVAC thermostats. Any thermostat not operating under a comprehensive security plan is nothing more than a security threat that will most likely bring a security compromise to your organization.

GNU remotecontrol relies on OS file access restrictions, Apache authentication, MySQL authentication, and SSL encryption to secure your data. Talk to us if you want to find out how you can further strengthen the security of your system, or if you have suggestions for improving the security of our current system architecture.

Whatever you do…..don’t get beat up over your Energy Management strategy. GNU remotecontrol is here to help simplify your life, not make it more complicated. Talk to us if you are stuck or cannot figure out the best option for your GNU remotecontrol framework. The chances are the answer you need is something we have already worked through. We would be happy to help by discussing your situation with you.


by gnuremotecontrol at July 08, 2014 05:08 PM

FSF Events

Richard Stallman - «El software libre en la ética y en la práctica» (Ciudad Ojeda, Venezuela)

Richard Stallman will speak about the goals and philosophy of the Free Software Movement, and the status and history of the GNU operating system, which in combination with the kernel Linux is now used by tens of millions of users world-wide.

Richard Stallman's speech will be nontechnical and open to the public; everyone is invited to attend.

Please fill out this form, so that we can contact you about future events in and around Maracaibo.

July 08, 2014 12:15 PM

Richard Stallman - "Free Software and Your Freedom" (Trapani, Italy)

Richard Stallman will speak about the goals and philosophy of the Free Software Movement, and the status and history of the GNU operating system, which in combination with the kernel Linux is now used by tens of millions of users world-wide.

Richard Stallman's speech will be nontechnical, admission is gratis, and the public is encouraged to attend.

Please fill out our contact form, so that we can contact you about future events in and around Trapani.

July 08, 2014 11:31 AM

July 07, 2014

guix @ Savannah

GNU dmd 0.2 released

GNU dmd 0.2 has been released. It provides new features such as the ability to load new definitions for existing services, as well as bug fixes and an improved manual.

GNU dmd is a dependency-based service manager meant to be used as the init system in GNU. It is written in Guile Scheme.

by Ludovic Courtès at July 07, 2014 10:04 PM

July 04, 2014

FSF Events

Richard Stallman - "Free Software and Your Freedom" (New York, NY)

The Free Software Movement campaigns for computer users' freedom to cooperate and control their own computing. The Free Software Movement developed the GNU operating system, typically used together with the kernel Linux, specifically to make these freedoms possible.

Richard Stallman's speech will be nontechnical, admission is gratis, and the public is encouraged to attend.

Please fill out our contact form, so that we can contact you about future events in and around New York City.

July 04, 2014 12:45 PM

July 02, 2014

German Arias

Configuring GNUstep

I made a small video to show how to configure GNUstep in desktops other than WindowMaker. However, the official website of GNUstep changed the same day I uploaded this video, so the instructions are slightly different. You can find the themes at the GNUstep Wiki, but the link to the wiki is now under the External option in the top menu, not at the right side as shown in the video. In the video I use the application SystemPreferences, but all these changes can be done in a terminal using the tool defaults. For example, I use this tool to change the configuration automatically when I switch from WindowMaker to GNOME, or vice versa. With this you will see how easy it is to configure GNUstep in other desktops. Click the image to see the video:

Configuring GNUstep

by Germán Arias at July 02, 2014 01:14 AM

July 01, 2014

FSF Blogs

Seattle free software event this fall: Call for Participation now open

The Seattle GNU/Linux Conference (SeaGL) has just announced Karen Sandler, executive director of Software Freedom Conservancy, as a keynote speaker and opened its Call for Participation. SeaGL is a free software conference in downtown Seattle, taking place on October 24 and 25, 2014. The Seattle Central Community College is hosting the event, so expect plenty of opportunities for students, as well as professionals and free software enthusiasts.

The SeaGL Call for Participation is now open! Get involved by submitting a proposal and sharing this call widely. Giving a talk at a free software event is a great way to meet new people and learn new things. SeaGL is especially interested in hearing from new speakers.

You can find out more about SeaGL, October 24-25 2014, by emailing or visiting them on IRC on Freenode in #seagl.

July 01, 2014 09:00 PM

GNU Spotlight with Karl Berry: 23 new GNU releases!

To get announcements of most new GNU releases, subscribe to the info-gnu mailing list: Nearly all GNU software is available from, or preferably one of its mirrors ( You can use the url to be automatically redirected to a (hopefully) nearby and up-to-date mirror.

This month, we welcome Bernd Paysan as the new maintainer (though the long-time author and developer) of gforth and vmgen, and Ruben Rodriguez (long-time developer of Trisquel) as the new maintainer of IceCat. Thanks to all.

Also, please consider attending the GNU Hackers' Meeting in Munich this year, August 15-17; attendance is free of charge, but pre-registration is essential.

A number of GNU packages, as well as the GNU operating system as a whole, are looking for maintainers and other assistance: please see if you'd like to help. The general page on how to help GNU is at To submit new packages to the GNU operating system, see

As always, please feel free to write to me,, with any GNUish questions or suggestions for future installments.

July 01, 2014 08:39 PM

denemo @ Savannah

Release 1.1.6 is out.

New features in this version:

Lyrics Interface
Cursor on music moves to match cursor in lyrics
Jump to any point in the lyrics by mouse or keyboard
The inactive pane (lyrics or music) clearly indicated by graying-out
Margins Control
Set top, bottom, left and right margins in mm.
Tweaking Typesetting
Individual notes, chords and rests can be tweaked
Accidentals and markings move with the note
Fingerings can be offset
For use when LilyPond is sub-optimal
Fret Diagrams
Chords typeset as fret diagrams
Defaults to Guitar
Can be used with Chord Names too
Comprehensive Musical Symbol Support
Menu access to the complete set of LilyPond music symbols
Over 600 types of music glyphs can be placed and dragged on the score
More Ornamentation Support
Editing Timer
Displays time spent creating a score
Accumulates over all editing sessions

by Richard Shann at July 01, 2014 01:50 PM

Andy Wingo

flow analysis in guile

Greets, and welcome back to the solipsism! I've been wandering the wilderness with my Guile hackings lately, but I'm finally ready to come back to civilization. Hopefully you will enjoy my harvest of forest fruit.

Today's article is about flow analysis and data structures. Ready? Let's rock!

flow analysis

Many things that a compiler would like to know can be phrased as a question of the form, "What do I know about the data flowing through this particular program point?" Some things you might want to know are:

  1. The set of variables that must be live.

  2. The set of variables that happen to be live. This is the same as (1) except it includes variables that aren't needed but haven't been clobbered by anything.

  3. The set of expressions whose results are still valid (i.e., haven't been clobbered by anything else).

  4. An upper and lower bound on the range of numeric variables.

Et cetera. I'll talk about specific instances of flow analysis problems in the future, but today's article is a bit more general.

The first thing to note about these questions is that they don't necessarily need or have unique answers. If GCC decides that it can't prove anything about the ranges of integers in your program, it's not the end of the world -- it just won't be able to do some optimizations that it would like to do.

At the same time, there are answers that are better and worse than others, and answers that are just invalid. Consider a function of the form:

int f():
  int a = 1
  int b = 2
  int c = a + b
  int d = b + c
  int e = c + d
  ...
  int z = x + y
  return z

In this function, there are 27 different expressions, including the return, and 27 different program points. (You can think of a program point as a labelled sub-expression. In this example none of the expressions have sub-expressions.) If we number the program points in order from 0 to 26, we will have a program that first executes expression 0 (int a = 1), then 1, and so on to the end.

Let's plot some possible solutions to the live variable flow-analysis problem for this program.

Here we see two solutions to the problem (in light and dark blue), along with a space of invalid solutions (in red). The Y axis corresponds to the variables of the program, starting with a on the bottom and finishing with z on the top.

For example, consider position 4 in the program, corresponding to int e = c + d. It is marked in the graph with a vertical bar. After position 4, the only values that are used in the rest of the program are d and e. These are the variables that are contained within the light-blue area. It wouldn't be invalid to consider a, b, and c to be live also, but it also wouldn't be as efficient to allocate space and reason about values that won't contribute to the answer. The dark blue space holds those values that may harmlessly be considered to be live, but which actually aren't live.

It would, however, be invalid to consider the variable f to be live after position 4, because it hasn't been defined yet. This area of the variable space is represented in red on the graph.
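As a sketch of what the smallest valid solution looks like, liveness for a straight-line program can be computed in one backward pass. This is illustrative Python, not Guile's implementation, and the statement list is a compressed version of the example (z is taken to use d and e directly, standing in for the elided middle of the function):

```python
def live_sets(stmts):
    """live[i] = the set of variables live just after statement i.

    Each statement is (defined_var, used_vars); a None definition
    models a bare `return`.
    """
    live = [set() for _ in stmts]
    after = set()
    for i in range(len(stmts) - 1, -1, -1):      # walk backwards
        live[i] = set(after)
        defined, used = stmts[i]
        after = (after - {defined}) | set(used)  # kill the def, add the uses
    return live

stmts = [("a", []), ("b", []), ("c", ["a", "b"]), ("d", ["b", "c"]),
         ("e", ["c", "d"]), ("z", ["d", "e"]), (None, ["z"])]
live = live_sets(stmts)
```

After position 4 (`int e = c + d`), only d and e come out live, matching the light-blue region described above.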

Of course, the space of all possible solutions isn't possible to represent nicely on a two-dimensional graph; we're only able to show two with colors, and that not very well as they overlap. This difficulty cuts close to the heart of the data-flow problem: that it ultimately requires computing a two-dimensional answer, which necessarily takes time and space O(n²) in program size.

Or does it?

classical flow analysis frameworks

The classical way to do flow analysis is to iterate a set of data-flow equations over a finite lattice until you reach a fixed point.

That's a pithy sentence that deserves some unpacking. If you're already comfortable with what it means, you can skip a couple sections.

Still here? Cool, me too. Let's take a simple example of sign analysis. The problem is to determine, for the integer variables of a program, at every point in the program, which ones may be negative (-), which ones may be zero (0), and which may be positive (+). All of these are conceptually bit-flags.

For example, in this program:

int f(int x):
 L0:  while (x >= 0)
 L1:    int y = x - 1
 L2:    x = y
 L3:  return x

We can assign the flags -0+ to the argument x as the initial state that flows into L0, because we don't know what it is ahead of time, and it is the only variable in scope. We start by representing the initial state of the solution as a set of sets of state values:

state := {L0: {x: -0+}, L1: Ø, L2: Ø, L3: Ø}

In this notation, Ø indicates a program point that hasn't been visited yet.

Now we iterate through all labels in the program, propagating state to their successors. Here is where the specific problem being solved "hooks in" to the generic classical flow analysis framework: before propagating to a successor, a flow equation transforms the state that flows into a program point to a state that flows out, to the particular successor. In this case we could imagine equations like this:

visit_test(expr, in, true_successor, false_successor):
  if expr matches "if var >= 0":
    # On the true branch, var is not negative.
    propagate(in + {var: in[var] - -}, true_successor)
    # On the false branch, var is not zero and not positive.
    propagate(in + {var: in[var] - 0+}, false_successor)
  else if ...

visit_expr(expr, in, successor):
  if expr matches "left = right - 1":
    if in[right] has +:
      if in[right] has 0:
        # Subtracting one from a non-negative arg may be negative.
        propagate(in + {left: in[right] + -}, successor)
        # Subtracting one from a positive arg may be 0.
        propagate(in + {left: in[right] + 0}, successor)
      # Subtracting one from a nonpositive arg will be negative.
      propagate(in + {left: -}, successor)
  else if expr matches "left = right":
    propagate(in + {left: in[right]}, successor)

The meat of classical data-flow analysis is the meet operation:

propagate(out, successor):
  if state[successor] is Ø:
    state[successor] = out
  else:
    state[successor] = meet(out, state[successor])

# A version of meet for sign analysis
meet(out, in):
  return intersect_vars_and_union_values(out, in)
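In Python, an illustrative transcription of that meet (not Guile code), with states as dicts from variable name to a set of sign flags, might look like:

```python
def intersect_vars_and_union_values(out, inn):
    # Keep only variables known on both incoming paths (intersect the
    # keys), and combine what we know about each (union the flag sets).
    return {var: out[var] | inn[var] for var in out.keys() & inn.keys()}

merged = intersect_vars_and_union_values(
    {"x": {"-", "0", "+"}, "y": {"-", "0", "+"}},   # state on a back edge
    {"x": {"-", "0", "+"}})                          # initial state at L0
# y is dropped because it is not known on both paths; x keeps all flags
```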

Let's run this algorithm by hand over the example program. Starting from the initial state, we propagate the L0→L1 and L0→L3 edges:

visit_test("if x >= 0", {x: -0+}, L1, L3)
→ propagate({x: 0+}, L1)
→ state[L1] = {x: 0+}
→ propagate({x: -}, L3)
→ state[L3] = {x: -}

Neat. Let's keep going. The successor of L1 is L2:

visit_expr("y = x - 1", {x: 0+}, L2)
→ propagate({x: 0+, y: -0+}, L2)
→ state[L2] = {x: 0+, y: -0+}

L2→L0 is a back-edge, returning to the top of the loop:

visit_expr("x = y", {x: 0+, y: -0+}, L0)
→ propagate({x: -0+, y: -0+}, L0)
→ state[L0] = meet({x: -0+, y: -0+}, state[L0])
→ state[L0] = meet({x: -0+, y: -0+}, {x: -0+})
→ state[L0] = {x: -0+}

Finally, L3 has no successors, so we're done with this iteration. The final state is:

{L0: {x: -0+},
 L1: {x: 0+},
 L2: {x: 0+, y: -0+},
 L3: {x: -}}

which indeed corresponds with what we would know intuitively.
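The whole walk-through can be reproduced mechanically with a small fixed-point loop. This is a sketch in Python under the same transfer functions as above (not Guile's code; the subtract-one transfer is spelled out explicitly):

```python
NEG, ZERO, POS = frozenset("-"), frozenset("0"), frozenset("+")

def meet(out, inn):
    # intersect_vars_and_union_values from the text
    return {v: out[v] | inn[v] for v in out.keys() & inn.keys()}

def subtract_one(x):
    r = frozenset()
    if POS & x:  r |= ZERO | POS   # x >= 1  ->  x - 1 >= 0
    if ZERO & x: r |= NEG          # x == 0  ->  x - 1 < 0
    if NEG & x:  r |= NEG          # x < 0   ->  x - 1 < 0
    return r

def analyze():
    state = {0: {"x": NEG | ZERO | POS}, 1: None, 2: None, 3: None}
    changed = [True]
    def propagate(out, succ):
        new = out if state[succ] is None else meet(out, state[succ])
        if new != state[succ]:
            state[succ], changed[0] = new, True
    while changed[0]:                          # iterate to a fixed point
        changed[0] = False
        s = state[0]                           # L0: while (x >= 0)
        propagate({**s, "x": s["x"] - NEG}, 1)          # true edge
        propagate({**s, "x": s["x"] - ZERO - POS}, 3)   # false edge
        if state[1]:                           # L1: y = x - 1
            propagate({**state[1], "y": subtract_one(state[1]["x"])}, 2)
        if state[2]:                           # L2: x = y (the back edge)
            propagate({**state[2], "x": state[2]["y"]}, 0)
    return state                               # L3 has no successors
```

Running `analyze()` converges in two passes and reaches the same final state as the hand calculation above.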

fixed points and lattices

Each of the steps in our example flow analysis was deterministic: the result was calculated from the inputs and nothing else. However the backwards branch in the loop, L2→L0, could have changed inputs that were used by the previous L0→L1 and L0→L3 forward edges. So what we really should do is iterate the calculation to a fixed point: start it over again, and run it until the state doesn't change any more.

It's easy to see in this case that running it again won't end up modifying the state. But do we know that in all cases? How do we know that iteration would terminate at all? It turns out that a few simple conditions are sufficient.

The first thing to ensure is that the state space being explored is finite. Here we can see this is the case, because there are only so many ways you can combine -, 0, and +. Each one may be present or not, and so we have 2^n = 2^3 = 8 possible states. The elements of the state array will be a set with at most one entry for each variable, so the whole state space is finite. That at least ensures that an answer exists.

Next, the "meet" operation has to be commutative, associative, and idempotent. The above example used intersect_vars_and_union_values. We intersect vars because it only makes sense to talk about a variable at a program point if the variable dominates the program point. It didn't make sense to propagate y on the L2→L0 branch, for example. It's usually a good idea to model a data-flow problem using sets, as set union and intersection operations fulfill these commutativity, associativity, and idempotency requirements.

Finally, the state being modelled should have a partial order, and functions that add information along control-flow edges -- above, visit_test and visit_expr -- should preserve this partial ordering. That is to say, visit_test and visit_expr should be monotonic. This means that no matter on what control paths data propagates, we keep building towards an answer with more information, making forward progress. This condition is also easily fulfilled with sets, or more generally with any lattice. (A lattice is nothing more than a data type that fulfills these conditions.)

Iterating the data-flow equations until the state stops changing will find a fixed point of the lattice. Whether you find the greatest or least fixed point is another question; I can't help linking to Paul Khuong's old article on Québécois student union strikes for a lovely discussion.

Another question is, how many iterations are necessary to reach a fixed point? I would first note that although in our walk-through we iterated in forward order (L0, L1, L2, L3), we could have visited nodes in any order and the answer would be the same. I'll cut to the chase and say that if:

  1. you represent your state with bitvectors

  2. the control-flow graph is reducible (has only natural loops)

  3. the meet operation on values is bitvector union or intersection

  4. you visit the program points in topologically sorted order

If these conditions are fulfilled, then you will reach a fixed point after LC + 2 iterations, where LC is the "loop-connectness number" of your graph. You can ensure (1), (3), and (4) by construction. (Reverse post-order numbering is an easy way to fulfill (4).) (2) can be ensured by using programming languages without goto (a for loop is always a natural loop) but can be violated by optimizing compilers (for example, via contification).

Loop connectedness is roughly equivalent to the maximum nesting level of loops in the program, which has experimentally been determined to rarely exceed 3. Therefore in practice, data-flow analysis requires a number of steps that is O(n * 5) = O(n) in program size.
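Condition (4) is cheap to ensure: reverse post-order numbering is just a depth-first search that records each node on the way out, then reverses the record. A sketch (illustrative, not Guile's code):

```python
def reverse_post_order(entry, succs):
    seen, post = set(), []
    def dfs(n):
        seen.add(n)
        for s in succs.get(n, ()):
            if s not in seen:
                dfs(s)
        post.append(n)      # record the node once its subtree is finished
    dfs(entry)
    return post[::-1]       # reversing post-order gives the visit order

# The CFG of the sign-analysis example: L0 -> {L1, L3}, L1 -> L2, L2 -> L0.
order = reverse_post_order(0, {0: [1, 3], 1: [2], 2: [0]})
# Ignoring the back edge L2 -> L0, every node precedes its successors.
```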

For more information on data-flow analysis, including full proofs and references, see Carl Offner's excellent, excellent manuscript "Notes on Graph Algorithms used in Optimizing Compilers". I don't know of any better free resource than that. Thanks, Carl!

an aside: the kCFA algorithms

I just finished describing what I called "classical" data-flow analysis. By that I mean to say that people have been doing it since the 1970s, which is classical enough as far as our industry goes. However with the rise of functional languages in the 1980s, it became unclear how to apply classical data-flow analysis on a language like Scheme. Let's hear it from the horse's mouth:

This brings us to the summer of 1984. The mission was to build the world's most highly-optimising Scheme compiler. We wanted to compete with C and Fortran. The new system was T3, and the compiler was to be called Orbit. We all arrived at WRL and split up responsibility for the compiler. Norman was going to do the assembler. Philbin was going to handle the runtime (as I recall). Jonathan was project leader and (I think) wrote the linker. Kranz was to do the back end. Kelsey, the front end. I had passed the previous semester at CMU becoming an expert on data-flow analysis, a topic on which I completely grooved. All hot compilers do DFA. It is necessary for all the really cool optimisations, like loop-invariant hoisting, global register allocation, global common subexpression elimination, copy propagation, induction-variable elimination. I knew that no Scheme or Lisp compiler had ever provided these hot optimisations. I was burning to make it happen. I had been writing 3D graphics code in T, and really wanted my floating-point matrix multiplies to get the full suite of DFA optimisation. Build a DFA module for T, and we would certainly distinguish ourselves from the pack. So when we divided up the compiler, I told everyone else to back off and loudly claimed DFA for my own. Fine, everyone said. You do the DFA module. Lamping signed up to do it with me.

Lamping and I spent the rest of the summer failing. Taking trips to the Stanford library to look up papers. Hashing things out on white boards. Staring into space. Writing little bits of experimental code. Failing. Finding out *why* no one had ever provided DFA optimisation for Scheme. In short, the fundamental item the classical data-flow analysis algorithms need to operate is not available in a Scheme program. It was really depressing. I was making more money than I'd ever made in my life ($600/week). I was working with *great* guys on a cool project. I had never been to California before, so I was discovering San Francisco, my favorite city in the US and second-favorite city in the world. Silicon Valley in 1984 was beautiful, not like the crowded strip-mall/highway hell hole it is today. Every day was perfect and beautiful when I biked into work. I got involved with a gorgeous redhead. And every day, I went in to WRL, failed for 8 hours, then went home.

It was not a good summer.

At the end of the summer, I slunk back to CMU with my tail between my legs, having contributed not one line of code to Orbit.

Olin Shivers, A history of T

It took him another 7 years, but Shivers stuck with it, and in the end came out with the family of algorithms known as k-CFA. Instead of focusing on loops, which Scheme doesn't have syntactically, Shivers used continuation-passing style to ruthlessly simplify Scheme into a dialect consisting of not much more than function calls, and focused his attention on function calls. The resulting family of flow algorithms can solve flow equations even in the presence of higher-order functions -- a contribution to computer science born out of necessity, failure, and stubbornness.

With all those words, you'd think that I'd be itching to use k-CFA in Guile, and I'm not. Unfortunately even the simplest, least expressive version (0-CFA) is O(n²); 1-CFA is exponential. I don't have time for that. Instead, Guile is able to use classical DFA because it syntactically distinguishes labelled continuations and functions, and contifies functions to continuations where possible, which makes the Scheme DFA problem exactly the same as in any other language.

n times what?

Now that we have established that the number of visit operations is O(n), it remains to be seen what the individual complexity of a visit operation is in order to determine the total complexity. The naïve thing is just to use bitvectors, with each of the bitvectors having as many entries as the program has variables, times however many bits we are using.

This leads to O(|L|*|V|) space and time complexity, where |L| is the number of program points (labels) and |V| is the number of variables. As the number of variables is generally proportional to the size of the program, we can approximate this as O(n²).

In practice, this means that we can use data-flow analysis on programs up to about 10000 labels in size. Sign analysis on a 10000-label function would require 10000² * 3 / 8 = 37.5 MB of memory, which is already a bit hefty. It gets worse if you need to represent more information. I was recently doing some flow-sensitive type and range inference, storing 12 bytes per variable per program point; for a 10000-label function, that's more than a gigabyte of memory. Badness.

shared tails

Although it was the type inference case that motivated this investigation, sign inference is similar and simpler, so let's go with that. The visit_expr and visit_test functions above are only ever going to add information about the variables that are used in or defined by an expression; in practice this is a small finite number. What if we chose a representation of state that could exploit this fact by only adding O(1) amounts of data, sharing a common tail with preceding expressions?

If we draw a control-flow graph for the sign analysis program, we get something like:

The goal is to create a data structure that looks like the dominator tree. For "normal" control-flow edges -- those whose destination only have one predecessor -- we can avoid the "meet" operations, and just copy the predecessor's out set to the successor's in set. We then define "meet" as an adjoin operation that effectively conses the new information onto a shared tail, if it wasn't there already. The first iteration through the CFG will initialize the shared tail of a given control-flow join to the set of variables flowing into the join's dominator. Subsequent information will adjoin (cons) on new incoming values. In this case the resulting data structure ends up looking like:

Here the italic references like L1 indicate shared structure, and the tuples annotating the edges represent additional information flow, beyond that information that was already present in the successor's dominator.

Of course, you can implement this with linked lists and it will work fine. The problem there will be lookup speed -- when your visit operation (visit_expr or visit_test) goes to look up the sign of a variable, or the same happens via the meet operation, you get O(n) lookup penalties. Anticipating this, I implemented this with a version of Phil Bagwell's vhashes, which promise O(log n) variable lookup. See Guile's documentation, or Bagwell's excellent paper.

Note that you can't remove items from sets once they have been added in a shared-tail flow analysis; to keep the meet function monotonic, you have to instead insert tombstone entries. Not so nice, but it is what it is.
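The cons-and-tombstone idea can be sketched with plain tuples (illustrative Python, not the vhash code):

```python
TOMBSTONE = object()

def adjoin(tail, var, value):
    return (var, value, tail)            # O(1): conses onto a shared tail

def lookup(env, var):
    while env is not None:               # O(n) worst case -- the drawback
        head_var, head_val, env = env
        if head_var == var:
            return None if head_val is TOMBSTONE else head_val
    return None

base = adjoin(None, "x", "-0+")          # state at the dominator
left = adjoin(base, "y", "0+")           # one branch adds y...
right = adjoin(base, "x", TOMBSTONE)     # ...the other "removes" x,
                                         # without ever mutating `base`
```

Both branches share `base` unchanged; the tombstone hides x on one path while the structure only ever grows, which is what keeps the meet monotonic.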

A shared-tail flow analysis consumes only O(1) additional memory per node, leading to O(n) space complexity. I have some measured space and time graphs below that show this experimentally as well.

space and time

Unfortunately, lookup time on branchy vhashes is really terrible: O(log n) in the best case, and O(n) at worst. This is especially acute because there is no easy way to do unions or intersections on vhashes -- you end up having to compute the unshared heads of the two vhashes you are merging, and looking up elements in one in the other... I could go on, but I think you believe me when I say it gets complicated and slow. It's possible to beat a bitvector approach in time for relatively "big" problems like type analysis, but for common subexpression elimination where I was just storing a bit per expression, it was tough to beat the speed of bitvectors.

I started looking for another solution, and in the end came on a compromise that I am much happier with, and again it's Phil Bagwell to the rescue. Instead of relying on vhashes that explicitly share state, I use Clojure-style persistent sparse bit-sets and bit-maps that share state opportunistically.

Guile's intset module implements a bitvector as a functional tree whose branches are vectors and whose leaves are fixnums. Each leaf represents one range of 32 integers, and each branch on top of it increases the range by a factor of 8. Branches can be sparse, so not all integers in the range of an intset need leaves.

As you would expect, adjoining an element onto such a tree is O(log n). Intersecting is much faster than vhashes though, as intsets partition the key space into power-of-two blocks. Intsets try hard to share state, so that if your adjoin would return the same value, the result is the same object, at the same address. This allows sub-trees to be compared for equality via pointer comparison, which is a great fast-path for intersection and union.
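A toy, flattened version of the idea -- a dict from 32-entry block number to a leaf bitmask, rather than Guile's actual vector tree -- shows the two properties that matter: adjoin returns the identical object when nothing changes, and union can short-circuit on pointer equality:

```python
LEAF_BITS = 32

def adjoin(s, k):
    block, bit = divmod(k, LEAF_BITS)
    leaf = s.get(block, 0)
    if leaf & (1 << bit):
        return s                     # already present: share the whole set
    t = dict(s)
    t[block] = leaf | (1 << bit)
    return t

def union(a, b):
    if a is b:
        return a                     # the pointer-comparison fast path
    t = dict(a)
    for block, leaf in b.items():
        t[block] = t.get(block, 0) | leaf
    return t

s = adjoin(adjoin({}, 1), 40)        # bits 1 and 40 land in different leaves
```

In the real intset the sharing happens per sub-tree, so two large sets that differ in one leaf still compare equal by pointer almost everywhere.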

Likewise, Guile's new intmap module allows the association of larger values with integer keys.

science! fetch your white coats and lab books!

I had the chance to actually test the system with all three of these data structures, so I compiled one of Guile's bigger files and recorded the memory used and time taken when solving a 1-bit flow analysis problem. This file has around 600 functions, many of them small nested functions, many of them macro-generated, some of them quite loopy, and one big loopless one (6000 labels) to do the initialization.

First, a plot of how many bytes are consumed per label while solving this 1-bit DFA.

Note that the X axis is on a log scale.

The first thing that pops out at me from these graphs is that the per-label overhead of vhashes is indeed constant. This is a somewhat surprising result for me; I thought that iterated convergence would make this overhead depend on the size of the program being compiled.

Secondly we see that the bitvector approach, while quadratic in overall program size, is still smaller until we get to about 1000 labels. It's hard to beat the constant factor for packed bitvectors! Note that I restricted the Y range, and the sizes for the bitvector approach are off the charts for N > 1024.

The intset size is, as we expected, asymptotically worse than vhashes, but overall not bad. It stays on the chart at least. Surprisingly, intsets are often better than vhashes for small functions, where we can avoid allocating branches at all -- note the "shelves" in the intset memory usage, at 32 and 256 entries, respectively, corresponding to the sizes that require additional levels in the tree. Things keep on rising with n, but sublinearly (again, recall that the X axis is on a log scale).

Next, a plot of how many nanoseconds it takes per label to solve the DFA equation above.

Here we see, as expected, intsets roundly beating vhashes for all n greater than about 200 or so, while showing sublinear dependence on program size.

The good results for vhashes for the largest program are because the largest program in this file doesn't have any loops, and hardly any branching either. It's the best case for vhashes: all appends and no branches. Unfortunately, it's not the normal case.

It's not quite fair to compare intsets to bitvectors, as Guile's bitvectors are implemented in C and intsets are implemented in Scheme, which runs on a bytecode VM right now. But still, the results aren't bad, with intsets even managing to beat bitvectors for the biggest function. The gains there probably pay for the earlier losses.

This is a good result, considering that the goal was to reduce the space complexity of the algorithm. The 1-bit case is also the hardest case; when the state size grows, as in type inference, the gains of using structure-sharing trees grow accordingly.


Let's wrap up this word-slog. A few things to note.

Before all this DFA work in Guile, I had very little appreciation of the dangers of N-squared complexity. I mean, sometimes I had to think about it, but not often, especially if your constant factors are low, or so I thought. But I got burned by it; hopefully the next time, if any, will be a long time coming.

I was happily, pleasantly surprised at the expressiveness and power of Bagwell/Clojure-style persistent data structures when applied to the kinds of problems that I work on. Space-sharing can make a fundamental difference to the characteristics of an algorithm, and Bagwell's data structures can do that well. Intsets simplified my implementations because I didn't have to reason much about space-sharing on my own -- finding the right shared tail for vhashes is, as I said, an unmitigated mess.

Finally I would close by saying that I was happy to fail in such interesting (to me) ways. It has been a pleasant investigation and I hope I have been able to convey some of the feeling of it. If you want to see what the resulting algorithm looks like in practice, see compute-truthy-expressions.

Until next time, happy hacking!

by Andy Wingo at July 01, 2014 08:00 AM

June 30, 2014

Riccardo Mottola

DataBasin: more powerful CSV writing for Select and Select-Identify

DataBasin, the clean-room implementation of a data extraction tool for salesforce.com, available for GNUstep and Mac, just made a big leap forward!

If you have ever done a SELECT in DataBasin, you might have noticed that the results are not ordered and that semi-joined fields ("." notation) are handled strangely, because a whole object is returned.

DataBasin extracts the columns of the CSV file by checking the first row of the dataset and recursing on the field names of salesforce's response. The effects in DataBasin are the following:
  • The field order is not preserved
  • If a sub-object is queried, all those fields are grouped together in that object
  • If more than one field is queried on a sub-object and that object is missing in a certain record, the number of columns becomes inconsistent between records: the response doesn't contain the sub-object at all, so instead of iterating record after record through its fields, DataBasin writes only one empty column, not as many as needed
  • Capitalization of the column names doesn't get preserved (compared to the above issues, something really minor)
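To see why the first-row scan goes wrong, here is a toy version of that column extraction (hypothetical records, not DataBasin's real data structures):

```python
# Sketch of deriving CSV columns from a record by recursing into
# sub-objects, the way the first-row scan does. When a sub-object is
# missing from a record, the column count changes between records.

def columns_from_record(record, prefix=""):
    """Recurse into nested dicts; order follows the response, not the query."""
    cols = []
    for key, value in record.items():
        if isinstance(value, dict):
            cols.extend(columns_from_record(value, prefix + key + "."))
        else:
            cols.append(prefix + key)
    return cols

rec_with_owner = {"Id": "001", "Owner": {"Name": "A", "Alias": "a"}}
rec_without    = {"Id": "002"}   # the sub-object is simply missing

print(columns_from_record(rec_with_owner))  # 3 columns
print(columns_from_record(rec_without))     # only 1 column: inconsistent
```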

I added a new option to write out fields in the order of the query.

This feature implies more complex processing under the hood. For each record in the response, the field names are reconstructed as before (e.g. by recursing inside sub-objects) and put in a Dictionary.

The original SOQL query is parsed and the field names extracted; these names are then used as keys into each record's dictionary and written out. As a result:
  • Order preserved, also with sub-objects
  • Fields preserved and empty values are written even if whole sub-objects are missing
  • Case is preserved
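A sketch of how such a query-ordered writer could work (the SOQL field-list splitting here is deliberately naive and the field names are made up; DataBasin's real parsing is more involved, as the next paragraph explains):

```python
# Sketch: take the field list from the SOQL query itself, then look each
# name up per record, writing an empty cell when a whole sub-object is
# missing. Query-order and case of the header come straight from the query.

import csv, io

def fields_from_soql(query):
    """Very naive field-list extraction; real SOQL parsing is subtler."""
    inner = query.split("SELECT", 1)[1].split("FROM", 1)[0]
    return [f.strip() for f in inner.split(",")]

def flatten(record, prefix=""):
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            flat.update(flatten(value, prefix + key + "."))
        else:
            flat[prefix + key] = value
    return flat

def write_csv(query, records):
    fields = fields_from_soql(query)
    out = io.StringIO()
    w = csv.writer(out)
    w.writerow(fields)                    # header in query order, case kept
    for rec in records:
        flat = flatten(rec)
        w.writerow([flat.get(f, "") for f in fields])  # empty cell if absent
    return out.getvalue()

q = "SELECT Id, Owner.Name, Owner.Alias FROM Account"
rows = [{"Id": "001", "Owner": {"Name": "A", "Alias": "a"}},
        {"Id": "002"}]
print(write_csv(q, rows))
```

With this approach the second record gets two empty cells for the missing Owner sub-object instead of silently dropping columns.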
This approach looks fine and comes with a relatively small performance penalty; however, it is delicate, because during parsing of the query many naming subtleties of salesforce need to be inferred. That is, DataBasin tries to infer what the name in the response will be for:
  • field aliases
  • aggregate names with no alias, which are progressively named Expr0, Expr1, and so on
  • idiosyncrasies with aggregate functions: a field in a sub-object is called Object__c.Field__c in a regular query, but in an aggregate query only Field__c is used, disregarding the join depth
  • COUNT() vs. COUNT(Id)
I hope I have not missed any! In any case, enjoy!
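As a rough illustration of those rules, here is how such an inference could look (this is my guess at the behaviour from the description above, not DataBasin's actual code):

```python
# Toy inference of response column names from SOQL select-list entries:
# explicit aliases win, unaliased aggregates become Expr0, Expr1, ...,
# and in aggregate queries the relationship prefix is dropped.

def response_names(select_fields, aggregate_query=False):
    names, n = [], 0
    for f in select_fields:
        parts = f.split()
        if len(parts) == 2:
            names.append(parts[1])                # explicit alias
        elif "(" in f:
            names.append("Expr%d" % n)            # unaliased aggregate
            n += 1
        elif aggregate_query:
            names.append(f.split(".")[-1])        # prefix dropped
        else:
            names.append(f)
    return names

print(response_names(["Name", "MAX(Obj__r.Field__c) biggest", "COUNT(Id)"]))
# -> ['Name', 'biggest', 'Expr0']
```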

This feature is available now in both Select and Select-Identify, since the change lives in the CSV write backend; it greatly enhances the usability of DataBasin's operations, bringing it on par with DataLoader.
Since DataLoader performs similar transformations and sometimes produces wrong results, I left this feature optional, behind the flag "Write fields in query order".

by Riccardo ( at June 30, 2014 11:24 PM