Friday, December 16, 2011

RPM Changelogs for Recent Updates 10x Faster

In my previous post I presented a generalised rpm changelog summary script. I've now tidied up the implementation and added a couple of new options.

One thing bugging me was that the script exec'ed rpm once for each package. Even though UNIX process creation is relatively inexpensive, the programs being exec'ed take time to initialise themselves: they have to open files, read configs, create internal structures, and so on. The cumulative initialisation cost can be substantial. For example, the old makewhatis script that used to ship with many Linux distros exec'ed gawk for every manual page, and a run took 30 minutes on a 486DX66. It was so annoying that I rewrote it to exec gawk less often, and the run time dropped to 1.5 minutes. The improved version is still included in man-1.6g. Given how many machines were once running this script, the reduction in carbon emissions may have been significant ;-)

By taking advantage of rpm's --queryformat option, I've changed the rpmChangelogs script to exec rpm once for every 100 package names instead of once per package. This is about 10 times faster for large runs. For example, when I generated a summary dating back to my upgrade from OpenSUSE 11.4 to 12.1, the run time dropped from about 50 seconds to 5 seconds.
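The batching itself is just a matter of slicing the package list into groups of 100 and building one rpm command per group. The sketch below is written for Python 3 (the script itself is Python 2), and the helper names `batches` and `rpm_commands` are hypothetical; it only builds the argument lists and does not run rpm.

```python
def batches(items, size=100):
    """Yield consecutive slices of at most `size` items from a sequence."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def rpm_commands(packages, query_format, size=100):
    """Build one rpm argument list per batch of package names.

    Each list has the same shape as the one the script passes to
    subprocess.Popen: a query format, --changelog, then the package names.
    """
    return [
        ["rpm", "-q", "--queryformat=" + query_format, "--changelog"] + batch
        for batch in batches(packages, size)
    ]
```

With 250 package names, `rpm_commands` returns three argument lists, so rpm would be started three times rather than 250.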

I've added an option to include each package's description, and another to take the rpm names from the command line instead of only reporting on the most recently installed ones.

Here is the syntax summary for the new version:

python rpmChangelogs.py -h
Usage: rpmChangelogs.py [options] [rpm...] 

Report change log entries for recently installed (-i) rpm's or for the rpm's
specified on the command line.

Options:
  -h, --help            show this help message and exit
  -i INSTALLDAYS, --installed-since=INSTALLDAYS
                        Include anything installed up to INSTALLDAYS days ago.
  -c CHANGEDAYS, --changed-since=CHANGEDAYS
                        Report change log entries from up to CHANGEDAYS days
                        ago.
  -d, --description     Include each rpm's description in the output.

Except for the optional description, the output is the same as that of the previous OpenSUSE-only script.
My Python is a little rusty (I've just spent months doing Java), so I've also gone back over the code and tried to tidy it up.

The code


#!/usr/bin/env python
#
# rpmChangelogs.py 
#
# Copyright (C) 2011: Michael Hamilton
# The code is GPL 3.0(GNU General Public License) ( http://www.gnu.org/copyleft/gpl.html )
# 
# Updated 2013/03/18: now uses seconds from 1970 to avoid localisation issues with the dates output by rpm.
#
import subprocess
from datetime import datetime,  timedelta
from optparse import OptionParser

maxArgsPerCommand=100

optParser = OptionParser(
            usage='Usage: %prog [options] [rpm...] ', 
            description="Report change log entries for recently installed (-i) rpm's or for the rpm's specified on the command line.")
optParser.add_option('-i',  '--installed-since',  dest='INSTALLDAYS', type='int', default=1,  help='Include anything installed up to INSTALLDAYS days ago.')
optParser.add_option('-c',  '--changed-since',  dest='CHANGEDAYS', type='int', default=60,  help='Report change log entries from up to CHANGEDAYS days ago.')
optParser.add_option('-d',  '--description',  dest='DESC', action='store_true', default=False,  help="Include each rpm's description in the output.")
(options, args) = optParser.parse_args()

installedSince = datetime.now() - timedelta(days=options.INSTALLDAYS)
changedSince = datetime.now() - timedelta(days=options.CHANGEDAYS)
showDesc = options.DESC

if len(args) > 0:
    recentPackages = args
else:
    queryProcess = subprocess.Popen(['rpm', '-q', '-a', '--last'], shell=False, stdin=None, stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True)
    recentPackages = []
    for queryLine in queryProcess.stdout:
        (name, dateStr) = queryLine.split(' ', 1)
        installDatetime = datetime.strptime(dateStr.strip(), '%a %d %b %Y %H:%M:%S %Z')
        if installDatetime < installedSince:
            break
        recentPackages.append(name)
    queryProcess.stdout.close()
    queryProcess.wait()
    if queryProcess.returncode != 0:
        print '*** ERROR (return code was ', queryProcess.returncode,  ')'
    for line in queryProcess.stderr:
        print line, 

# Use one rpm exec to query multiple packages - 10x faster than an exec for each one
marker = '+Package: '
markerLen = len(marker)
for subset in [recentPackages[i:i+maxArgsPerCommand] for i in range(0, len(recentPackages), maxArgsPerCommand)]:
    format = marker + '%{INSTALLTIME} %{NAME}-%{VERSION}-%{RELEASE}\n' + ('%{DESCRIPTION}\n\n+Changelog:\n' if showDesc else '')
    rpmProcess = subprocess.Popen(['rpm', '-q', '--queryformat=' + format, '--changelog'] + subset, shell=False, stdin=None, stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True)
    tooOld = False
    for line in rpmProcess.stdout:
        if line.startswith(marker):
            installedDate = datetime.fromtimestamp(float(line[markerLen:line.rfind(' ')]))
            name = line.rsplit(' ',  1)[1]
            print '=================================================='
            print marker,  installedDate, name, 
            print '------------------------------'
            tooOld = False
        else:
            if line.startswith('* ') and len(line) > 17:
                try:
                    # Parse only the fixed-width '* Www Mmm dd yyyy' prefix so author names containing spaces don't break it
                    changeDate = datetime.strptime(line[:17], '* %a %b %d %Y')
                    tooOld = changeDate < changedSince
                except ValueError:
                    pass # not a date - move on
            if not tooOld: 
                print line, 
    rpmProcess.stdout.close()
    rpmProcess.wait()
    if rpmProcess.returncode != 0:
        print '*** ERROR (return code was ', rpmProcess.returncode,  ')'
    for line in rpmProcess.stderr:
        print line, 
    rpmProcess.stderr.close()
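The loop above relies on each package's output beginning with a `+Package: ` header line that carries the INSTALLTIME in seconds since 1970. As a sketch of that parsing step on its own, here is a hypothetical `split_packages` function (Python 3, not part of the original script) that groups already-captured output into per-package records, assuming the header layout produced by the script's --queryformat.

```python
import datetime

MARKER = "+Package: "


def split_packages(lines, marker=MARKER):
    """Group rpm output lines into (install_time, name, body_lines) records.

    Assumes each package's output starts with a header line of the form
    '<marker><epoch-seconds> <name-version-release>', as produced by a
    query format such as '+Package: %{INSTALLTIME} %{NAME}-%{VERSION}-%{RELEASE}\\n'.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(marker):
            stamp, name = line[len(marker):].split(" ", 1)
            installed = datetime.datetime.fromtimestamp(int(stamp))
            records.append((installed, name, []))
        elif records:
            # Everything after a header belongs to that package.
            records[-1][2].append(line)
        # Lines before the first header are ignored.
    return records
```

Because the timestamp is a plain number, no locale-dependent date parsing is needed for the install time.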

Thursday, December 15, 2011

RPM Changelogs for Recent Updates

Note: in a more recent post I've made this code ten times faster.
In my previous post I showed you a script that could report recent changelogs for OpenSUSE packages. Overnight I realised I could generalise it to all RPM-based distros. Here is the new generalised version:
% python rpmChangeLogs.py -h
Usage: rpmChangeLogs.py [options]

Report change log entries for recent rpm installs.

Options:
  -h, --help            show this help message and exit
  -i INSTALLDAYS, --installed-since=INSTALLDAYS
                        Include anything installed up to INSTALLDAYS days ago.
  -c CHANGEDAYS, --changedSince=CHANGEDAYS
                        Report change log entries from up to CHANGEDAYS days
                        ago.


The output is the same as that of the previous OpenSUSE-only script.

I've also cleaned up the code that handles the Python subprocesses.
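The usual hazard with subprocess pipes is ordering: if wait() is called before the output has been read, a child that fills the pipe buffer blocks forever, which the Python subprocess documentation warns about. A minimal sketch of the safe ordering, assuming Python 3 and a hypothetical helper `stream_lines`, reads stdout to EOF, closes it, and only then waits.

```python
import subprocess
import sys


def stream_lines(argv, handle_line=sys.stdout.write):
    """Run argv, pass each stdout line to handle_line, and return the exit code.

    stdout is read to EOF *before* wait() is called, so a child that writes
    more than a pipe buffer's worth of output cannot block forever. stderr is
    not piped at all: it goes straight to our own stderr, which avoids a second
    pipe that could fill up unread.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, universal_newlines=True)
    try:
        for line in proc.stdout:
            handle_line(line)
    finally:
        # Close our end first, then reap the child so it does not linger.
        proc.stdout.close()
        returncode = proc.wait()
    return returncode
```

For example, `stream_lines(['rpm', '-q', '--changelog', 'glibc'])` would echo the changelog and return rpm's exit status, which the caller can then report.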

The code


#!/usr/bin/env python
#
# rpmChangelogs.py 
#
# Copyright (C) 2011: Michael Hamilton
# The code is GPL 3.0(GNU General Public License) ( http://www.gnu.org/copyleft/gpl.html )
#
import subprocess
from datetime import date, datetime,  timedelta
from optparse import OptionParser

optParser = OptionParser(description='Report change log entries for recent rpm installs.')
optParser.add_option('-i',  '--installed-since',  dest='INSTALLDAYS', type='int', default=1,  help='Include anything installed up to INSTALLDAYS days ago.')
optParser.add_option('-c',  '--changedSince',  dest='CHANGEDAYS', type='int', default=60,  help='Report change log entries from up to CHANGEDAYS days ago.')
(options, args) = optParser.parse_args()

installedSince = datetime.now() - timedelta(days=options.INSTALLDAYS)
changedSince = datetime.now() - timedelta(days=options.CHANGEDAYS)

queryProcess = subprocess.Popen(['rpm', '-q', '-a', '--last'], shell=False, stdin=None, stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True)
for queryLine in queryProcess.stdout:
    historyRec = str.split(queryLine, ' ', 1)
    installDatetime = datetime.strptime(str.strip(historyRec[1])[4:24], '%d %b %Y %H:%M:%S')
    if installDatetime < installedSince:
        break
    packageName = historyRec[0]
    print '=================================================='
    print '+Package: ',  installDatetime, packageName
    print '------------------------------'
    rpmProcess = subprocess.Popen(['rpm', '-q', '--changelog',  packageName], shell=False, stdin=None, stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True)
    for line in rpmProcess.stdout:
        try:
            if line[0] == '*' and line[1] == ' ' and len(line) > 17:
                changeDate = datetime.strptime(line[6:17], '%b %d %Y')
                if changeDate < changedSince:
                    break
        except ValueError:
            pass # not a date - move on
        print line, 
    rpmProcess.stdout.close()
    rpmProcess.wait()
    if rpmProcess.returncode != 0:
        print '*** ERROR (return code was ', rpmProcess.returncode,  ')'
    for line in rpmProcess.stderr:
        print line, 
    rpmProcess.stderr.close()
queryProcess.stdout.close()
queryProcess.wait()
if queryProcess.returncode != 0:
    print '*** ERROR (return code was ', queryProcess.returncode,  ')'
for line in queryProcess.stderr:
    print line,
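The install dates printed by `rpm -q -a --last` follow the system locale, so parsing them with a fixed strptime format can fail on non-English systems. One locale-independent alternative is to ask rpm for raw seconds, for example with `rpm -q -a --queryformat '%{INSTALLTIME} %{NAME}-%{VERSION}-%{RELEASE}\n'`, and filter and sort on the numbers directly. The sketch below assumes that output format; `recent_packages` is a hypothetical name, and the function only parses text it is given.

```python
import time


def recent_packages(query_output, days, now=None):
    """Return package names installed within `days` days, newest first.

    `query_output` is assumed to hold one '<epoch-seconds> <name>' pair per
    line. Working in epoch seconds sidesteps locale-dependent date strings.
    """
    if now is None:
        now = time.time()
    cutoff = now - days * 24 * 60 * 60
    installs = []
    for line in query_output.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        stamp, name = line.split(None, 1)
        installs.append((int(stamp), name.strip()))
    installs.sort(reverse=True)  # newest first, like --last
    return [name for stamp, name in installs if stamp >= cutoff]
```

The text could come from `subprocess.check_output([...], universal_newlines=True)`; unlike `--last`, rpm does not sort this output, hence the explicit sort.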

Wednesday, December 14, 2011

OpenSUSE Changelogs for Recent Updates

Here is a short Python script that shows the recent portions of the changelogs for recently installed packages. It is intended for extracting a summary of what has changed after updating my OS to the latest packages. Usage is as follows:
% python zyppHist.py -h
Usage: zyppHist.py [options]

Report change log entries for recent installs (zypper/rpm).

Options:
  -h, --help            show this help message and exit
  -i INSTALLDAYS, --installed-since=INSTALLDAYS
                        Include anything installed up to INSTALLDAYS days ago.
  -c CHANGEDAYS, --changedSince=CHANGEDAYS
                        Report change log entries from up to CHANGEDAYS days
                        ago.


Sample output:
python zyppHist.py -i 1 -c 30
==================================================
+Package:  2011-12-14 21:11:12 glibc
------------------------------
* Wed Nov 30 2011 aj@suse.de
- Do not install INSTALL file.

* Wed Nov 30 2011 rcoe@wi.rr.com
- fix printf with many args and printf arg specifiers (bnc#733140)

* Fri Nov 25 2011 aj@suse.de
- Updated glibc-ports-2.14.1.tar.bz2 from ftp.gnu.org.

* Fri Nov 25 2011 aj@suse.com
- Create glibc-devel-static baselibs (bnc#732349).

* Fri Nov 18 2011 aj@suse.de
- Remove duplicated locales from glibc-2.3.locales.diff.bz2

==================================================
+Package:  2011-12-14 21:11:21 splashy
------------------------------
* Thu Dec 08 2011 hmacht@suse.de
- update artwork for openSUSE 12.1 (bnc#730050)

==================================================
+Package:  2011-12-14 21:11:23 libqt4
------------------------------
* Wed Nov 23 2011 llunak@suse.com
- do not assert on QPixmap usage in non-GUI threads
  if XInitThreads() has been called (bnc#731455)

==================================================
+Package:  2011-12-14 21:11:23 libcolord1
------------------------------
* Wed Dec 07 2011 vuntz@opensuse.org
- Update to version 0.1.15:
  + This release fixes an important security bug: CVE-2011-4349.
  + New Features:
  - Add a native driver for the Hughski ColorHug hardware
  - Export cd-math as three projects are now using it
  + Bugfixes:
  - Documentation fixes and improvements
  - Do not crash the daemon if adding the device to the db failed
  - Do not match any sensor device with a kernel driver
  - Don't be obscure when the user passes a device-id to colormgr
  - Fix a memory leak when getting properties from a device
  - Fix colormgr device-get-default-profile

...

The script produces a summary by extracting a list of recent installs from /var/log/zypp/history. For each recent install, it runs an rpm changelog query (rpm -q --changelog) to obtain the package's changelog. Each changelog is written out line by line, and the output is truncated at the first entry dated more than the specified number of days in the past.
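The truncation rule can be expressed as a small generator. This is a sketch rather than the script itself: `recent_changelog` is a hypothetical name, and it assumes rpm's changelog entry headers start with the fixed-width '* Www Mmm dd yyyy' date, reading only that 17-character prefix so that author names containing spaces do not matter.

```python
import datetime


def recent_changelog(lines, since):
    """Yield changelog lines up to the first entry dated before `since`.

    Entry headers are assumed to look like '* Wed Nov 30 2011 author...';
    the date is taken from the fixed 17-character prefix only.
    """
    for line in lines:
        if line.startswith("* ") and len(line) >= 17:
            try:
                entry_date = datetime.datetime.strptime(line[:17], "* %a %b %d %Y")
            except ValueError:
                entry_date = None  # looks like a header but is not a date
            if entry_date is not None and entry_date < since:
                return  # everything from here on is older: stop
        yield line
```

Stopping at the first old entry works because rpm lists changelog entries newest first.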

The code


#!/usr/bin/env python
#
# zypphist.py 
#
# Copyright (C) 2011: Michael Hamilton
# The code is GPL 3.0(GNU General Public License) ( http://www.gnu.org/copyleft/gpl.html )
#
import csv
import subprocess
from datetime import date, datetime,  timedelta
from optparse import OptionParser

zyppHistFilename = '/var/log/zypp/history'

optParser = OptionParser(description='Report change log entries for recent installs (zypper/rpm).')
optParser.add_option('-i',  '--installed-since',  dest='INSTALLDAYS', type='int', default=1,  help='Include anything installed up to INSTALLDAYS days ago.')
optParser.add_option('-c',  '--changedSince',  dest='CHANGEDAYS', type='int', default=60,  help='Report change log entries from up to CHANGEDAYS days ago.')
(options, args) = optParser.parse_args()

installedSince = datetime.now() - timedelta(days=options.INSTALLDAYS)
changedSince = datetime.now() - timedelta(days=options.CHANGEDAYS)

zyppHistReader = csv.reader(open(zyppHistFilename, 'rb'), delimiter='|')
for historyRec in zyppHistReader:
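    # Skip blank lines and comments; only 'install' records are of interest
    if len(historyRec) > 2 and not historyRec[0].startswith('#') and historyRec[1] == 'install':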
        installDate = datetime.strptime(historyRec[0], '%Y-%m-%d %H:%M:%S')
        if installDate >= installedSince:
            packageName = historyRec[2]
            print '=================================================='
            print '+Package: ',  installDate, packageName
            print '------------------------------'
            rpmProcess = subprocess.Popen(['rpm', '-q', '--changelog',  packageName], shell=False, stdin=None, stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True)
            # Read stdout before wait(): waiting first can deadlock if a long
            # changelog fills the pipe buffer and rpm blocks on its write
            for line in rpmProcess.stdout:
                try:
                    if line[0] == '*' and line[1] == ' ' and len(line) > 17:
                        changeDate = datetime.strptime(line[6:17], '%b %d %Y')
                        if changeDate < changedSince:
                            break
                except ValueError:
                    pass # not a date - move on
                print line, 
            rpmProcess.stdout.close()
            rpmProcess.wait()
            if rpmProcess.returncode != 0:
                print '*** ERROR (return code was ', rpmProcess.returncode,  ')'
            for line in rpmProcess.stderr:
                print line, 
            rpmProcess.stderr.close()
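The history-file filtering can likewise be separated from the rpm calls, so it can be checked without a zypper system. This sketch assumes only the three leading '|'-separated fields that the script above uses (timestamp, action, package name); `recent_installs` is a hypothetical helper written for Python 3.

```python
import datetime


def recent_installs(history_lines, since):
    """Return (install_time, package_name) pairs for installs at or after `since`.

    Records are '|'-separated: a '%Y-%m-%d %H:%M:%S' timestamp, the action,
    then the package name. Comment lines (starting with '#') and blank lines
    are skipped, as are records for any action other than 'install'.
    """
    installs = []
    for line in history_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("|")
        if len(fields) < 3 or fields[1] != "install":
            continue
        when = datetime.datetime.strptime(fields[0], "%Y-%m-%d %H:%M:%S")
        if when >= since:
            installs.append((when, fields[2]))
    return installs
```

On a real system you would pass `open('/var/log/zypp/history')` and a cutoff such as `datetime.datetime.now() - datetime.timedelta(days=1)`.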