Releases are numbered X.Y.Z. An odd value for Y means that the release is a development version, may be unstable, and may change from day to day. Stable releases have even numbers for Y. Typically, only source code tar files are available for development releases. We intend to provide Windows, OS X, and Linux RPM installers for stable releases, in addition to the source code tar file.
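The numbering rule above can be checked mechanically. The following is a minimal sketch, not part of UpLib; the helper name `is_stable_release` is made up for illustration.

```python
def is_stable_release(version):
    """Return True if an X.Y.Z version string has an even Y (a stable release).

    Hypothetical helper for illustration; UpLib does not ship this function.
    """
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError("expected a version of the form X.Y.Z, got %r" % (version,))
    minor = int(parts[1])  # Y: odd means development, even means stable
    return minor % 2 == 0


print(is_stable_release("1.7.9"))  # False: 7 is odd, so this is a development release
```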
1.7.9 · 12 December 2009
- sources, gzipped tar file, 7.0 MB
(md5: d744970e0b0a39343e4710e590f3cadb)
- OS X 10.5 (Leopard) Intel Installer, 64 MB
(md5: bbf152a886efacfbb9f31c4b9eabe9cb)
- Changes from 1.7.8:
- The Lucene `SnowballAnalyzer` can now be used to add stemming to
the indexing and search, for some languages (those with Snowball
parsers included in PyLucene). This is enabled by setting the
configuration property `use-snowball-stemmer-with-this-language`
to the name of the language, e.g. "English".
- More of the Lucene functionality is now brought out through
the `uplib.indexing.LuceneContext` class. In particular, the
method `term_frequencies` lets one examine the frequencies in
the repository of a set of terms; the method `idf_factors`
gives the Lucene idf factors for a set of terms (a measure of
rarity); and the `search` method now takes a keyword argument
`explain`, which when specified as `True` will cause an explanation
thunk (a zero-parameter function) to be returned as the third
element of each hit -- this thunk, when called, returns the
string explanation generated by Lucene of why that hit is scored
the way it is.
- Added an extension with a ripper that automatically extracts
embedded metadata from Washington Post articles, similar to the
existing New York Times extension.
- Added RSSReader extension to automatically pull RSS feeds into
a repository. Define a whitespace-separated list of RSS feeds in
.uplibrc as the value of the config option "rss-sites";
define how often, in seconds, to scan with the config option "rss-scan-period".
- Fixed some bugs with use of subclasses of the Collection class.
- A number of quoting issues with the Web-based metadata editor
were fixed. In particular, titles with quotation marks in them
should now survive the round trip.
- Various JavaScript issues with the email thread Web display
have been fixed. Search results that include emails now provide
the option to look at the threads included in the results.
- The "wordbboxes" format has been revised to a version 2; this
version includes baseline information for the words, along with
coarse-grained (every 90 degrees) rotation information. A
couple of bugs in the xpdf patch, and in the figureimages
program, have been fixed. All of this provides for more
accurate page layout analysis.
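To illustrate the "explanation thunk" idea from the `search` change in 1.7.9, here is a minimal sketch that does not use UpLib at all. `fake_search` is a stand-in written for this example, and the first two elements of each hit (a document id and a score) are assumptions; the notes only state that the thunk is the third element when `explain` is `True`.

```python
def make_explanation_thunk(doc_id, score):
    """Return a zero-parameter function that builds the explanation on demand."""
    def explain():
        # In the real system this text would come from Lucene; here it is a placeholder.
        return "doc %s scored %.2f (placeholder explanation)" % (doc_id, score)
    return explain


def fake_search(query, explain=False):
    """Stand-in for a search call. NOT the uplib.indexing.LuceneContext API."""
    raw_hits = [("doc-a", 0.9), ("doc-b", 0.4)]  # assumed (doc_id, score) layout
    hits = []
    for doc_id, score in raw_hits:
        if explain:
            hits.append((doc_id, score, make_explanation_thunk(doc_id, score)))
        else:
            hits.append((doc_id, score))
    return hits


for doc_id, score, thunk in fake_search("example", explain=True):
    if score > 0.5:
        # The explanation string is only generated for hits we actually ask about.
        print(thunk())
```

Returning a thunk rather than a string means the cost of generating an explanation is only paid for the hits a caller chooses to inspect.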
1.7.8 · 8 October 2009
- sources, gzipped tar file, 7.1 MB
- OS X 10.5 (Leopard) Intel Installer, 63 MB
- Changes from 1.7.7:
- Updated for use with Python 2.6 and tested on Mac OS X 10.6.
- Fixed a bug in the video parser to allow use with WMV files.
Previously, the "Music" parser grabbed those before "Video".
Now, the "Video" parser sorts before "Music", and you can also
force the use of "Video" directly with the "--format=" option to
"uplib-add-document". In addition, the video parser now gathers
by default 5 frames from the video; this may be adjusted with
a user option, "number-of-video-sample-frames".
- Email recipients are now indexed, as "email-recipients" (the email
addresses) and "email-recipient-names" (the names, if any).
In addition, "email-recipient-names" is now on the default list of
headers to search for on a query.
- The new command-line program "uplib-topdf" can be used to start
the ToPDF server on UNIX systems -- note that on a Mac, there's
an application, "/Application/Utilities/UpLibToPDF", to do this,
and the Mac installer will install that application as a user
agent by default.
- Epydoc-generated API documentation is now included and installed automatically (it's available from the little "(i)" symbol at the top toolbar of the Web view of the repository). This makes the tar file a bit bigger.
1.7.7 · 18 August 2009
- sources, gzipped tar file, 5.7 MB
- OS X 10.5 (Leopard) Intel Installer, 61 MB
- Changes from 1.7.6:
- New code for caching Web sites. Properly caches recursive CSS pages.
- New utility, "uplib-cache-url", to invoke the URL caching code directly outside of UpLib.
- The query parser syntax is slightly expanded to handle more expressions of date ranges. The keywords "TODAY" and "YESTERDAY" existed previously, so you could look for things you'd added yesterday with the query, "uplibdate:yesterday". "NOW", "PASTWEEK", "PASTMONTH", and "PASTYEAR" have been added, so that you can look for things added in the last seven days with "uplibdate:pastweek", or use NOW as a range value; e.g., "date:[1/1/2009 TO NOW]" will find all docs published from 1/1/2009 to the present.
- Guardian Angels on OS X are now run as system launchd daemons, which means that `uplib-make-repository` now requires an admin password on OS X, to create the plist and load it into launchd's list of daemons. This is done to ensure that the bootstrap context for the Angel does not expire when the user logs out. The launchd system will always keep the daemon running, even when it's "stopped"; now, "stopped" means that the angel is running, but it's doing nothing, just waiting to be restarted.
- The interface for the standard Web function "doc_versions" has been re-worked, so that you can use it to invoke collection functions on all versions of a document.
- Added support for using "wkpdf" (on OS X) or "wkhtmltopdf" (on Linux or Windows) to render Web pages. This is essentially the same WebKit rendering engine that Safari and Google Chromium use for printing Web pages.
- Added new server, ToPDF, which supports running the wkpdf/wkhtmltopdf and OpenOffice format converters in the user's GUI environment, which is required under OS X (and possibly X11 with certain security configurations). (On OS X, this runs as a LaunchAgent in both the Aqua and LoginWindow contexts. See the user manual for more information.)
- Support for JPEG 2000 added via jasper.
- More documentation on OS X usage.
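The date-range query forms described for 1.7.7 can be assembled programmatically. This is a rough sketch only: the helper names are hypothetical, and the m/d/yyyy date format is taken from the single example in the notes rather than from a full grammar.

```python
import datetime

# Relative keywords named in the release notes for use as a single value.
# NOW is left out here because the notes only show it as a range endpoint.
RELATIVE_KEYWORDS = ("TODAY", "YESTERDAY", "PASTWEEK", "PASTMONTH", "PASTYEAR")


def added_query(keyword):
    """Build a query for documents added in a relative period, e.g. 'uplibdate:pastweek'."""
    if keyword.upper() not in RELATIVE_KEYWORDS:
        raise ValueError("unknown relative date keyword: %r" % (keyword,))
    return "uplibdate:%s" % keyword.lower()


def published_since_query(start):
    """Build a query for documents published from `start` to the present."""
    # Format follows the example in the notes: date:[1/1/2009 TO NOW]
    return "date:[%d/%d/%d TO NOW]" % (start.month, start.day, start.year)


print(added_query("PASTWEEK"))
print(published_since_query(datetime.date(2009, 1, 1)))
```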
1.7.6 · 25 March 2009
- sources, gzipped tar file, 5 MB
- OS X 10.5 (Leopard) Intel Installer, 57 MB
- Changes from 1.7.5:
- Support for video incorporation using AVbin and pyglet is added.
- The music parser for MP3 files wasn't being loaded, so it couldn't be used for importing MP3s. It is now loaded automatically, but it still doesn't support MP3 files without ID3 tags.
- Email now handles "text/plain; format=flowed" properly. Various bug-fixes in IMAP protocol handler. Support for Microsoft TNEF attachments added.
- Annotations updated to deal with span anchors, rect anchors, and paragraph anchors.
- Made OpenOffice conversion of Web pages and Office docs more robust.
- Support for client-side keys is now present, in both the Java and Python clients. The UpLib guardian angel can be set to only accept connections if the client presents a validated key.
- ReportLab use upgraded to 2.x.
- The ReadUp document reading app can now be launched via JNLP. This is a new standard function, /action/basic/doc_readup?doc_id=DOCID
- Added preliminary parser support for text/calendar. Currently, only VEVENT instances are processed.
- Some older debugging comments are made less frequent.
- Support for search abbreviations fixed.
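As a sketch of how a client might construct the address of the `doc_readup` function added in 1.7.6: the path and the `doc_id` parameter come from the notes above, while the scheme, host, port, and helper name are assumptions chosen for illustration.

```python
from urllib.parse import urlencode, urlunsplit


def readup_url(host, port, doc_id, scheme="https"):
    """Build a URL for /action/basic/doc_readup on a repository.

    Hypothetical helper; scheme, host, and port depend on how the
    repository is actually set up.
    """
    query = urlencode({"doc_id": doc_id})  # percent-encodes the id if needed
    netloc = "%s:%d" % (host, port)
    return urlunsplit((scheme, netloc, "/action/basic/doc_readup", query, ""))


print(readup_url("localhost", 8080, "DOCID"))
```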
1.7.5 · 18 June 2008
- Sources -- gzipped tar file
- Pre-packaged installers
- Mac OS X
- Windows
- Windows XP installer -- it requires that your machine have diruse.exe and Java pre-installed. It runs a number of sub-installers; just click through each of them until the process finishes. If you have OpenSSL or ABXPDF installed on your machine already, please go to "Add/Remove Programs" and remove them before running the UpLib installer.
- Linux
- Changes from 1.7.4:
- New doc-function "Categorize" added. This, if using JCC
PyLucene, will suggest tags based on past tagging history, and
give you a tag cloud of all tags used in the repository so far.
- Various fixes to indexing, particularly annotation indexing,
which was omitting some text. Also restored page search, which
had been broken in 1.7.4.