Tuesday, January 29, 2008

Sage 2.10.1 release plan and an outlook for 2.10.2

After several release candidates (Sage 2.10.1.rc0 through rc2) we are finally getting close to the final release. The release cycle has been longer than usual since we still have a number of blockers to resolve, namely the ATLAS tuning issue as well as fallout from the GNUTLS update, which causes segfaults when starting the notebook. It will be the release for Sage Days 7, but we might still push another bug-fix-only release if it turns out that we have some glaring, must-fix bugs in Sage 2.10.1. Sage 2.10.2 (or .3) will probably merge David Roe's unramified and Eisenstein extensions of Qp and Zp as well as John Voight's code for enumerating totally real fields with improvements by Craig Citro. We also expect more fixes for the 64-bit OSX port as well as for the Solaris port, but it is uncertain whether either one of them will become officially stable in that release.

Cheers,

Michael

Friday, January 25, 2008

Worst SPKG ever!

Well, in case you don't get the Simpsons reference to Comic Book Guy's catch phrase, you might want to read those two links. But to get back to the point of this post: I spent about six hours or so on ticket #1852. It looked very harmless, i.e. just enable ATLAS during the configure phase and fix some other odd issues, but it turned out to be the worst spkg I ever had to fix. No need to name names here, since the subject of this post is mostly tongue in cheek by now. While I was working on the spkg I felt differently, but those things have a tendency to fade into the background once I have moved on to other things.

Do I blame the person who originally created the spkg in question? Not really, since we are all busy people and I also had to read the R build manual to fix some of the issues I saw. It was just surprising that this particular spkg was quite bad compared to all the other ones; for example, we installed about 60 MB of unneeded object files somewhere in SAGE_LOCAL, and those do add up in the final download size. I am just glad it is done, and now I could start bitching about gsl 1.10, see #1623. Since that one is done, too, it is now on to #1631 and then who knows what. So, sooner or later there will be another "Worst SPKG ever!" and this one will be a distant memory ;)

Cheers,

Michael

Tuesday, January 22, 2008

Sage 2.10.1 Progress

Progress on Sage 2.10.1 has been good so far. We closed 71 tickets as of alpha2 and have about another 30 patches or so under review. The initial load of patches was merged during Bug Day 9, which turned out to be an excellent debugging session. The wiki page is somewhat misleading since the IRC log isn't up yet and many people who participated did not add themselves to the wiki page.

But there is some trouble in paradise, since the upgrade of GNUTLS and its related components broke the notebook in secure mode. This led to revisiting in #sage-devel the question of whether linking a GPLed program like Sage against OpenSSL is legal or not. While opinions diverged on this, it remains a fact that we should fix the GNUTLS issue (or is it in Twisted? I am not sure yet), since we can't ship OpenSSL libraries with the binary, and relying on a properly working system OpenSSL has led to plenty of trouble in the past.

Besides the above issue, what else can we expect of 2.10.1? Better ATLAS build times on Pentium Ms and 32-bit Athlons (not yet merged as of alpha1), a lot of work toward building a 64-bit OSX version of Sage, and maybe even a universal binary for OSX (32-bit PPC and x86), so stay tuned.

Cheers,

Michael

Friday, January 18, 2008

Sage 2.10 released!

Sage 2.10 is out. It wasn't planned this way, since we had originally aimed for a release early next week, ideally Monday. So what happened? 2.10.alpha4 built perfectly on all of our test platforms, and the patches created during Doc Day 1 would potentially have destabilized the build. All was set in motion to release at this point - after all, we had gone nearly twelve days since the last release, which is longer than the usual time between releases - but the release was held up by a tricky, long-standing but extremely hard to reproduce bug in the Maxima interface of Sage, triggered by a last-minute merge of an essential fix. William finally tracked it down and fixed it, which delayed the release by about 24 hours. I used that time to catch up on sleep and to prepare myself for Bug Day 9.

The main changes for Sage 2.10 are:
  • Python is now built with ucs4
  • FLINT was updated to the 1.0.5 release
  • Many bug fixes and also a couple of significant memory leak fixes
  • Integrated a fix to the MPFR library so that we no longer smash the stack at high precision. This caused a performance regression when multiplying
  • Fixed a long-standing, hard-to-hit bug in the Maxima interface
  • When launching external executables, LD_LIBRARY_PATH is now restored for external binaries, i.e. those which do not reside in SAGE_LOCAL/bin. Before the patch many external executables failed due to library conflicts, which was reported many times on Debian-based systems (a minimal sketch of the pattern follows below)
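The idea behind the LD_LIBRARY_PATH change is roughly the following. This is a minimal Python sketch of the general pattern, not the actual Sage patch; the SAGE_ORIG_LD_LIBRARY_PATH variable and the run_external helper are assumptions made for illustration:

    import os
    import subprocess

    def run_external(command):
        """Launch a binary that does not live in SAGE_LOCAL/bin with the
        LD_LIBRARY_PATH the user had before Sage prepended its own libraries.
        Assumes the original value was saved, e.g. in SAGE_ORIG_LD_LIBRARY_PATH."""
        env = dict(os.environ)
        original = env.get("SAGE_ORIG_LD_LIBRARY_PATH", "")
        if original:
            env["LD_LIBRARY_PATH"] = original
        else:
            env.pop("LD_LIBRARY_PATH", None)
        return subprocess.call(command, env=env)

    # Example: the system's latex should see the system libraries, not Sage's.
    # run_external(["latex", "--version"])

This way a system binary is no longer forced to load Sage's copies of common libraries, which is what caused the conflicts seen on Debian-based systems.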
What does the future hold? Sage 2.10.1 is already getting merges on the way to alpha0. So stay tuned :)

Cheers,

Michael

Wednesday, January 16, 2008

Sage 2.10.alpha4, Doc Day 1 & Bug Day 9

We are making progress toward Sage 2.10. With 2.10.alpha4 we have closed 72 tickets, and we are working very hard to make this release an excellent one. There are two events coming up this week:

  • Doc Day 1 will be held on Thursday, January 17, 2008, i.e. tomorrow, starting at 9am PST. We will write doctests and documentation. Since it is the first of its kind we will play it by ear and see how it goes.
  • Bug Day 9 will be held on Saturday, January 19, 2008, starting at 9am PST. It will be business as usual, and the plan is to put the finishing touches on the 2.10 release. From experience, we will sort out the fallout from all the merges we do at the bug day on Sunday and release either Sunday or Monday night, depending on how many issues and last-minute problems creep up.
It does look like we will miss most of the goals we set ourselves for 2.10, but we made progress on various fronts and will just shift any issues left open to the bug-fix releases 2.10.x, which are planned weekly until at least 2.10.3.

Cheers,

Michael

Sunday, January 13, 2008

Sage 2.10 vs. MacIntel 64 bit

Well, after talking about it for a couple of weeks [or is it months by now?] I finally bit the bullet and started porting the current Sage 2.10.alpha2 to 64-bit MacIntel. I started somewhere around 7 pm local time and now, roughly 12 hours later, I have struck out. For a first attempt it wasn't too bad. My build notes are a little rough, but surprisingly most issues come down to fiddling with CFLAGS, CXXFLAGS or CPPFLAGS. That wasn't too much of a surprise, since most of the code already runs on MacOSX, but in the end I failed with various spkgs:
  • numpy: odd issues with distutils, caused by the fact that I had to be tricky with Python and add some flags after the fact for distutils. Consequently scipy and some other spkgs couldn't be built. Josh Kantor is investigating, so hopefully it will be solved by the time I wake up.
  • PolyBoRi: SCons issues, mostly ENV-related. Here I ran out of time.
  • twisted: it seems to depend on MacOSX-specific Python extensions; I didn't investigate, but it should be fixable.
After working around various issues I got all but five extension modules to work (four of them numpy-related, plus the PolyBoRi extension), but in the end Sage did not start up to do the ceremonial 1+1; it failed in libSingular with:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000018
0x00000001045f0bfb in omInsertBinPage [inlined] () at om_Alloc.c:109
109 after->next->prev = page;
(gdb) bt
Function omInsertBinPage was inlined into function omAllocBinFromFullPage at line 145.
#1 0x00000001045f0bfb in omAllocBinFromFullPage (bin=0x10495aef0) at om_Alloc.c:145
#2 0x00000001045eebde in __omDebugAlloc (size_bin=0x10495aef0, flags=, track=, f=, l=) at om_Alloc.c:145

Since I need to catch some sleep before a meeting in roughly seven hours, I will sign off now.

The plan for the next 24 hours: merge the easy changes back into 2.10.alpha3 or alpha4, update a bunch of spkgs and get Sage 2.10 toward release candidate status. There is still much to do for 2.10, and we had only planned to officially support 64-bit MacIntel with 2.10.1. I am not sure if we will hit that target, but it isn't looking too bad. Either way, William is excited and I guess there are quite a large number of people out there who would really like to use it. So feel free to help out.

Cheers,

Michael

Friday, January 11, 2008

The ongoing struggle for Sage 2.10

So far I have released two alphas for Sage 2.10, which can be found in the usual place, i.e. here. Note that there is already an alpha2 directory which contains all patches merged so far and updated spkgs for the next milestone. That way you can easily catch up to my current state. It also enables you to upgrade by downloading the spkgs and dropping them in, then downloading and applying the patches and bundles. Sometimes the order matters, so keeping an eye on the trac timeline helps.

I am not very happy with the progress on 2.10 so far, because I ended up auditing the whole test suite with valgrind again and found three issues that ought to be fixed sooner rather than later. One of them (#1739) has already been fixed; the other two haven't even been entered into trac yet. I also built one of the alphas on an Itanium with the tool chain from hell in order to track down a PolyBoRi issue, but I struck out on that one after about a day and a half.

After all this frustration and hair pulling I am finally back to merging and fixing issues I can get done in a reasonable amount of time. But while I do that I will rebuild all of Sage 2.10.alpha2 (once it is out in a little while) with a ucs4-enabled Python. The complete rebuild is needed since all Python components need to be recompiled, and that is only possible if we artificially bump the version numbers of all the Python-related spkgs. We need to do that for the upgrade path anyway, but for experimenting a complete rebuild is the easier way. If a ucs4-enabled Python works and we have upgraded all the components we plan to upgrade for Sage 2.10, I will end up scanning all the packages and artificially bumping all non-upgraded Python-dependent spkgs.
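As a quick aside, one easy way to check whether a given Python build uses ucs4 internally is to look at sys.maxunicode; this is a small generic sketch, nothing Sage-specific:

    import sys

    # UCS-4 ("wide") builds can represent code points up to U+10FFFF,
    # UCS-2 ("narrow") builds only up to U+FFFF.
    if sys.maxunicode > 0xFFFF:
        print "this Python was built with ucs4"
    else:
        print "this Python was built with ucs2"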

You should have gotten the impression by now that 2.10 is more than three days away from being released. My current guesstimate is about a week from now, which, taking into consideration that many people were at the AMS meeting and are still catching up after the holidays, isn't too bad for a new major release with this many changes compared to 2.9.

Cheers,

Michael

Tuesday, January 8, 2008

Sage 2.9.3 released, Sage 2.10 release cycle opened

Well, Sage 2.9.3 was actually released two days ago, but since there was never an official release announcement I figured it might be worth mentioning it here. We only merged three tickets, but one of those was a long-standing NTL leak which I complained about previously. The issue was fixed by Willem Jan Palenstijn. It turned out to be missing __dealloc__ methods in two of the NTL wrapper classes (a schematic sketch of that kind of fix follows after the list below). I never thought that the memory leak would be caused by missing deallocation in general. I foolishly assumed that since the people who wrote the code used __dealloc__ in all the other classes, they would do so in those classes, too. Interestingly enough, this leak wasn't caught by doctesting the NTL wrapper, but by padics.py. But I am glad Willem caught it. So why all this focus on memory leaks, even the small ones? There are various reasons:
  • Code that leaks makes you look unprofessional. While there may be occasional small leaks (the Python interpreter does leak a couple of bytes over the run of most Sage sessions), something as massive as the NTL wrapper leak forces you to quit and restart Sage. And that is totally unacceptable.
  • Even small memory leaks add up to large leaks if you hit them often enough. In the past we had some leaks in linear algebra which caused a total leak of about one gigabyte an hour with certain code. We fixed those in the linear algebra code, and afterwards the code still leaked about one gigabyte a day. We investigated, and it turned out that the one-gigabyte-a-day leak was caused by a single 8-byte leak in a code path hit very often in that computation (at eight bytes per hit, a gigabyte a day works out to well over a hundred million hits, on the order of 1,400 leaked allocations per second). The discovery happened live during my Sage Days 5 presentation, when I analyzed some of Ifti's example code at the end of the talk. A video of that ought to exist somewhere on Google Video.
  • Many programs run for a long time and usually need all the available physical memory. Once you hit swap because your working set is too large, performance goes down the drain. In borderline cases not having memory leaks may make the difference. It is also quite bad if you run a lot of independent computations back to back and have to restart Sage periodically because of memory leaks. We have had a bunch of reports about this issue in the past.
  • And now the most important reason for some of us: well, you may say, I have 16, 32 or 64 GB of RAM, so why would I care about the little leaks? The answer is clear: performance. If Python's garbage collector has to keep track of a billion leaked eight-byte segments, that cannot be good. In some cases the performance difference for long-running code was in excess of ten percent after fixing a small eight-byte memory leak.
  • EDIT: I forgot to mention one important experience. There was a memory leak when computing a cusp, and to our amazement the fix (freeing one mpz in a premature return path) broke a doctest. Upon investigation it turned out that the result prior to the memory leak fix was incorrect. I must admit that this was the first and only time I ever had a memory leak affect the correctness of a program. To this day I still cannot understand why it happened.
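To make the __dealloc__ point above concrete, here is a schematic Cython sketch of the kind of fix involved. It is not the actual NTL wrapper code; thing_t, thing_new and thing_free are made-up names standing in for the wrapped NTL objects:

    cdef extern from "thing.h":
        ctypedef struct thing_t:
            pass
        thing_t* thing_new()
        void thing_free(thing_t* t)

    cdef class ThingWrapper:
        cdef thing_t* x

        def __cinit__(self):
            # allocate the underlying C object
            self.x = thing_new()

        def __dealloc__(self):
            # without this method the C object is never freed,
            # so every ThingWrapper instance leaks
            if self.x != NULL:
                thing_free(self.x)

The Python-level object is collected just fine either way; it is the C object hanging off it that silently accumulates when __dealloc__ is missing.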

I am also release manager for the 2.10 release of Sage and 2.10.alpha0 was uploaded earlier today. The goals for the 2.10 release are ambitious:
  • Update a lot of spkgs to the current release
  • Solaris 10 support in 32 bit mode on Opteron/x86
  • FreeBSD support out of the box
The currently planned release date is about four days from now, but the way it looks right now, that might get pushed back a little. Alternatively, we might postpone some of the features to 2.10.1 - life goes on.

Cheers,

Michael

Saturday, January 5, 2008

Sage 2.9.2 Released

Sage 2.9.2 has been released. We merged relatively few tickets this round because this release is targeted at the AMS meeting in San Diego that starts this weekend. The big change since 2.9.1.1 is the much improved 3D support via jmol. Robert Bradshaw and William Stein put a lot of work into this right up to the release. Let's hope everything works out with that code. Additionally we fixed some long-standing build and relocation issues. Please report any problems as usual.

Sage 2.10 will happen in about a week, and we plan to update a lot of spkgs as well as add full support for FreeBSD and limited support for Solaris. We will see how that works out; the release date certainly isn't written in stone, and depending on how the merging goes we might push it back. If you look at the tracker, we have about 272 tickets open against 2.10. I am hoping that the coding sprint this weekend will help reduce this number substantially. There are also a bunch of patches ready to go in that we did not merge, since we were very risk averse and also had a hard deadline which we cannot move.

Cheers,

Michael

Thursday, January 3, 2008

My first year with Sage and an outlook for 2008

To my own surprise I have been involved with Sage for a little over a year now. I initially thought that my first message to sage-devel was in March or April of 2007, but checking today I saw that it was in December 2006. The first couple of months were mostly about fixes to the Cygwin build and general discussion. Only later did I start to work on the build system and spkg portability, and toward the end of the year I started to manage releases. I also maintain various ports like the Linux/PPC port and am actively working on the Solaris and FreeBSD ports and the Cygwin re-port.

So what will the future bring for Sage in 2008? While predictions are notoriously hard and often hilariously wrong I will venture some guesses. I will limit myself to the areas I plan to work on:
  • Full Solaris support within three months: We are very close on this, and in the next week I will hopefully have time to push all the changes I have accumulated into the official spkgs. There are still some known bugs that seem to be triggered only on Solaris (some libSingular issues), but I am confident we will fix those - and by "we" in this case I hopefully mean Martin Albrecht.
  • Full FreeBSD support in one month: This one seems like a no-brainer. The port went fairly smoothly, and aside from the odd issue with Singular and C++ headers it all seems easily mergeable. Note that even now you can run the Linux binary on FreeBSD via the Linux emulation.
  • Full resurrection of the Cygwin port: I will not give any estimate on this, but I don't think it will take too long to get it to compile. Passing doctests will be another issue.
  • Massive progress toward a native MSVC port: This one will be the big question mark. There is certainly a lot of potential interest, but so far few people have committed to doing any of the work. There is also no funding in sight that would help get this massive undertaking under way. But I still maintain that it is certainly doable.
  • Sage will be part of distributions: It looks like Pardus might be the first to integrate Sage, but now it seems Debian might still beat it. There was a lot of activity today in the Google group debian-sage, so I am bullish on that, especially since I will be part of the effort.
So, I will be curious to look back toward the end of 2008 to see what actually happened and where I was totally off.

Cheers,

Michael

Wednesday, January 2, 2008

Hunting Memory Leaks in the NTL Wrapper

Sage's NTL wrapper is more or less famous for leaking memory. During Bug Day 5 at the Clay in Boston I actually found the last known leak in a live session. The leak was caused by a str() method. As it turned out, that mistake had to be corrected three or four times in the NTL wrapper and a couple of times outside of it. Since then the NTL wrapper has been rewritten to be faster and more feature complete, but during the rewrite new memory leaks were introduced. While I found those before the merge, they seemed minor and insignificant. Unfortunately those leaks turned out to cause massive problems in the padics code.

I do occasionally run the whole Sage test suite under valgrind's memcheck, and it takes about a day to finish when skipping memcheck on all external executables. In the 2.9.0->2.9.1 release cycle I fixed three memory leaks, one of them in the graph code when converting arbitrary precision numbers to their string representation in base 2. Since that function is called quite often when reading or writing adjacency matrices, it can be quite a pain. The other two were relatively minor.

The main reason to run memcheck on the test suite, though, was to find and fix all known memory leak issues before the 2.9.2 release. But I have been unable to fix two issues:
  • LinBox leaks via Givaro in certain situations, for example when computing characteristic polynomials over certain fields. Clement Pernet and I investigated the issue during Sage Days 6, but so far we haven't found a solution. Since it is relatively minor, especially compared to the memory leaks we already fixed in LinBox, it has been somewhat on the back burner.
  • The padic doctests leak tremendous amounts of memory. One doctest leaks more than 50 MB of memory. That is clearly unacceptable and would make long-term computations involving padics nearly impossible. But so far I have been unable to figure out why the code, especially the __pow__ methods, is leaking. I attacked the problem with memcheck as well as omega, and omega points to some auto-generated Cython code, which is always a bad sign. I guess I need to bother Robert Bradshaw about this :) (A crude Python-level way of confirming such a leak is sketched below.)
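Independent of memcheck and omega, a crude way to confirm at the Python level that a specific operation leaks is to call it in a loop and watch the process's resident set size grow. This is only a rough sketch of that idea, not the tooling I actually used; leak_check and the padics example in the comment are made up for illustration:

    import resource

    def rss_kb():
        # maximum resident set size so far (kilobytes on Linux)
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    def leak_check(func, iterations=100000):
        """Call func() repeatedly and report how much the process grew.
        Steady growth across repeated runs hints at a leak in func."""
        before = rss_kb()
        for _ in xrange(iterations):
            func()
        after = rss_kb()
        print "grew by roughly %s KB over %s calls" % (after - before, iterations)

    # Hypothetical usage: does repeated exponentiation of a padic element leak?
    # leak_check(lambda: some_padic_element ** 5)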
So, after having attacked the problem repeatedly with zero progress, I have given up on fixing it for the 2.9.2 release. Maybe with a little distance I will finally figure out what is wrong.

Oh well, sometimes it just takes time to figure it all out. Then one has to wonder why it took so long to see the obvious.

Cheers,

Michael