tag:blogger.com,1999:blog-70082651278310574842023-05-04T00:58:02.443-07:00Code includedCaution: may contain code.digitaltrailshttp://www.blogger.com/profile/16633409983665910035noreply@blogger.comBlogger14125tag:blogger.com,1999:blog-7008265127831057484.post-35849282551976735342013-12-08T15:19:00.000-08:002013-12-17T18:08:20.774-08:00FlickrSync - Backup Flickr Images and Data<blockquote class="tr_bq" style="text-align: left;">
<span style="background-color: #cccccc;">I've since discovered that there is already a FlickrSync in the Windows world, so I will have to come up with a new name at some point. </span></blockquote>
<br />
I've been using flickr for a while (see <a href="http://www.flickr.com/photos/digitaltrails/">digitaltrails</a>). Last year I became concerned about Yahoo's commitment to flickr and began investigating how to routinely back up all of my flickr images and metadata. I also wanted to be able to create a slideshow that I could carry on a USB stick, phone, tablet, or CD-ROM. I found Colm MacCarthaigh's <a href="https://github.com/dan/hivelogic-flickrtouchr">flickrtouchr</a>, a python script that backs up all of a user's images (I think he was targeting an iPod Touch - hence the name). Colm's script just gets the images; I wanted some of the metadata as well.<br />
<br />
The flickr REST API used by flickrtouchr is very well documented at <a href="http://www.flickr.com/services/api/">http://www.flickr.com/services/api/</a>. Initially I thought I'd just add some code to flickrtouchr to make some additional API calls. But while playing around with the script I wound up changing it beyond recognition. In the end I created a new script called flickrsync (flickr-sync). It has the following functionality (all new features unless noted):<br />
<ul>
<li>Uses authentication/authorisation/signed-messages to interact with the flickr API using REST (from flickrtouchr).</li>
<li>Download all images and favorites (from flickrtouchr).</li>
<li>Download videos and determine video type from http response header.</li>
<li>Download recent updates only - n days or since last update. This allows the script to update a previous download rather than grabbing everything all over again.</li>
<li>Download specific sets only (specified by flickr numeric ID).</li>
<li>Download photo and comment metadata (default, but optional).</li>
<li>Download set and collections metadata.</li>
<li>Optionally use copy instead of UNIX links for images that appear in more than one set (flickrtouchr always used UNIX links). Automatically uses copy on non-UNIX systems.</li>
<li>Optionally download favourites (this was non-optional in flickrtouchr).</li>
</ul>
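The link-or-copy option in the list above can be sketched roughly as follows. This is a minimal Python 3 illustration, not the code from flickrsync: the function name <tt>link_or_copy</tt> and its <tt>use_copy</tt> flag are hypothetical, and it assumes "UNIX links" means symbolic links.<br />

```python
import os
import shutil


def link_or_copy(source, dest, use_copy=False):
    """Place source at dest as a symbolic link, falling back to a copy.

    Hypothetical helper: a copy is made when the caller asks for one,
    or when the platform is not POSIX and symlinks may be unavailable.
    """
    if os.path.lexists(dest):
        os.remove(dest)  # replace any earlier link or copy
    can_link = hasattr(os, "symlink") and os.name == "posix"
    if use_copy or not can_link:
        shutil.copy2(source, dest)  # copy2 also preserves timestamps
        return "copy"
    os.symlink(os.path.abspath(source), dest)
    return "link"
```

A photo that appears in several sets is then stored once and linked (or copied) into each additional set directory.<br />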
Making a REST call to the flickr API is simply a matter of making an HTTP request and decoding an XML response. This is made slightly more complex by the authentication, authorisation, and signing of requests.<br />
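As a rough sketch of what signing and decoding involve (Python 3, standard library only, and not the code from flickrsync): this assumes the older flickr signing scheme, where the signature is the MD5 of the shared secret followed by the parameter names and values in sorted order. The endpoint constant, the function names, and the sample response are illustrative assumptions.<br />

```python
import hashlib
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed REST endpoint; check the API documentation before relying on it.
REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def sign_params(shared_secret, params):
    """MD5 of the secret followed by name+value pairs in sorted name order."""
    ordered = "".join(name + str(params[name]) for name in sorted(params))
    return hashlib.md5((shared_secret + ordered).encode("utf-8")).hexdigest()


def signed_url(shared_secret, params):
    """Build a request URL carrying the signature as api_sig."""
    signed = dict(params, api_sig=sign_params(shared_secret, params))
    return REST_ENDPOINT + "?" + urlencode(sorted(signed.items()))


def photo_ids(xml_text):
    """Pull photo ids out of a response, raising if the call failed."""
    rsp = ET.fromstring(xml_text)
    if rsp.get("stat") != "ok":
        raise ValueError("API call failed")
    return [photo.get("id") for photo in rsp.iter("photo")]
```

Fetching the URL (for example with <tt>urllib.request.urlopen</tt>) and passing the body to <tt>photo_ids</tt> completes the round trip.<br />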
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-VYDlTFEsfXE/UqT9br8cMAI/AAAAAAAABOU/XJlobR41NsI/s1600/codeinc-rig-eg.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="406" src="http://4.bp.blogspot.com/-VYDlTFEsfXE/UqT9br8cMAI/AAAAAAAABOU/XJlobR41NsI/s640/codeinc-rig-eg.jpeg" width="640" /></a></div>
<br />
Having achieved a way of routinely mirroring my data, I moved on to the task of creating a standalone slideshow. I found <a href="http://tympanus.net/codrops/author/crnacura/" target="_blank">Mary Lou's</a> <a href="http://tympanus.net/codrops/2011/09/20/responsive-image-gallery/" target="_blank"><i>Responsive Image Gallery with Thumbnail Carousel</i></a> and created a python script to build a slideshow by processing a modified templated version of the Gallery's HTML file. Mary Lou's JavaScript can be easily extended to include automated slideshows with transitions.<br />
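The templating step can be sketched along these lines. This is a minimal Python 3 illustration using <tt>string.Template</tt>; the placeholder names and the list markup are hypothetical and much simpler than the Gallery's real HTML.<br />

```python
from html import escape
from string import Template

# Hypothetical page template; the real gallery markup is more involved.
PAGE = Template("""<html>
<body>
<ul class="thumbnails">
$items
</ul>
</body>
</html>
""")

# One list item per photo: thumbnail, full-size image, and title.
ITEM = Template('<li><img src="$thumb" data-large="$large" alt="$title" /></li>')


def build_slideshow(photos):
    """Render a static page from a list of dicts with thumb/large/title keys."""
    items = "\n".join(
        ITEM.substitute(
            thumb=escape(photo["thumb"], quote=True),
            large=escape(photo["large"], quote=True),
            title=escape(photo["title"], quote=True),
        )
        for photo in photos
    )
    return PAGE.substitute(items=items)
```

Because the output is a single static HTML file, it can sit next to the mirrored images on a USB stick or CD-ROM with no server involved.<br />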
<br />
I've recently had some time to tidy up my scripts and I've now made them available at <a href="https://github.com/digitaltrails/FlickrSync" target="_blank">https://github.com/digitaltrails/FlickrSync</a>. The <a href="https://github.com/digitaltrails/FlickrSync/blob/master/README.md" target="_blank">README.md</a> file contains full usage details and an example.digitaltrailshttp://www.blogger.com/profile/16633409983665910035noreply@blogger.com0tag:blogger.com,1999:blog-7008265127831057484.post-16571961552876608822013-11-26T14:33:00.002-08:002014-01-12T16:18:38.841-08:00PiKam - Kivy Raspberry Pi camera interface.<div style="text-align: center;">
<span style="background-color: #fce5cd;">Update 2014/01/12 - now with limited live view capability.<span style="color: #ffe599;"> </span></span></div>
<br />
Who could resist a RaspberryPi? Certainly not me. But what to do with it? I still have no firm ideas, but I thought I'd better do something. I used the Kivy cross-platform framework and the python-twisted networking framework to create a remote interface to the RaspberryPi Camera Module. Nothing fancy, no live view, but I'm able to take pictures from my Linux desktop, my Android phone, and any other platform supported by Kivy+twisted. Here's a screenshot where I used my Android phone to remotely command my RaspberryPi to take a selfie of the phone:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://raw.github.com/digitaltrails/piKam/master/screenshot.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://raw.github.com/digitaltrails/piKam/master/screenshot.jpg" /></a></div>
A few more details can be found at the <a href="https://github.com/digitaltrails/piKam/wiki">PiKam github wiki</a>. You can browse the code directly on <a href="https://github.com/digitaltrails/piKam">github</a>.<br />
<br />
It was interesting to try out Kivy. It's very easy to get started, but going a bit deeper can sometimes exhaust what is in the documentation - it's best to then grep through the examples. You can do a lot quite quickly, but there are definitely limitations - for example, I don't think a PiKam live view via a video stream would be possible. If you just want to do a simple cross-platform mobile app it's worth a look.<br />
<br />digitaltrailshttp://www.blogger.com/profile/16633409983665910035noreply@blogger.com2tag:blogger.com,1999:blog-7008265127831057484.post-5020439291620356092013-09-20T16:16:00.001-07:002014-01-12T12:31:19.383-08:00Raspberry Pi Camera Case<div class="" style="clear: both; text-align: left;">
<a href="http://1.bp.blogspot.com/-i20NPriAtIQ/UjzM1bh0mDI/AAAAAAAABNo/f32Keo4O2-U/s1600/P9210199_v1.JPG" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="http://1.bp.blogspot.com/-i20NPriAtIQ/UjzM1bh0mDI/AAAAAAAABNo/f32Keo4O2-U/s320/P9210199_v1.JPG" height="320" width="240" /></a>Midweek my Raspberry Pi Camera Module turned up. The camera came with big scary warnings about protecting the camera module from static. I was in a hurry to test it out but wanted to keep my fingers off it as much as possible.
<br />
<br />
Looking around for something that could protect the camera module, I spotted an old SDCard case. It only required a little trimming to accommodate the PCB, add a slot for the ribbon-cable, and a square hole for the lens/camera unit. I was in a hurry - it isn't my tidiest work - I've subsequently added a bit of electrical tape to hide the rather hacked hole I made for the lens.
<br />
<br />
The case is quite airtight - cooling might be an issue. When summer comes I suppose I could drill an array of small holes.
The nice thing about this case is I can imagine using Velcro, rubber bands, etc, to easily position it for any task.
</div>
<div class="" style="clear: both; text-align: left;">
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-W74-m7QFlIs/UjzWaUM2eII/AAAAAAAABOA/_UonKrYUWoA/s1600/P9210203_v1.JPG" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="http://2.bp.blogspot.com/-W74-m7QFlIs/UjzWaUM2eII/AAAAAAAABOA/_UonKrYUWoA/s320/P9210203_v1.JPG" height="240" width="320" /></a><a href="http://4.bp.blogspot.com/-n2DU6VyBBtE/UjzM1-LWabI/AAAAAAAABNs/5YDJ0V95a3g/s1600/P9210200_v1.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-n2DU6VyBBtE/UjzM1-LWabI/AAAAAAAABNs/5YDJ0V95a3g/s200/P9210200_v1.JPG" height="150" width="200" /></a></div>
digitaltrailshttp://www.blogger.com/profile/16633409983665910035noreply@blogger.com0tag:blogger.com,1999:blog-7008265127831057484.post-91360063705643966452012-08-28T15:26:00.000-07:002012-08-29T01:10:15.164-07:00jQuery/Raphael Virtual Card PunchI've been brushing up on javascript and jQuery. I'm not really a web developer. These days I mostly work in the Java/Linux space deep inside the server, but sometimes a little web development is in the mix. I was working on a future post about backing up Flickr images and data to a static HTML slideshow. While setting up the slideshow, it occurred to me that I could use jQuery to create a Virtual Card Punch that would run entirely in the browser. So back to punch cards one final time.
<br /><br />
<div style="box-shadow: 0 0 5px 2px #999;-moz-box-shadow: 0 0 5px 2px #999;-webkit-box-shadow: 0 0 5px 2px #999; border: 2px outset #ddd; display:inline-block; text-align:center;">
<a href="http://www.blogger.com/blogger.g?blogID=7008265127831057484#carddiv" id="cardButton" title=""><b> Punch a card now! </b></a>
</div>
<br /><br />A minimal version of the page is available <a href="http://users.actrix.co.nz/michael/cardpunch.html">here</a> for anyone who wishes to crawl around the source and find out exactly how it was coded.
<br />
<div style="border: 2px solid; padding: 10px; margin: 10px; background-color: #eee">
Programming using a card-punch was a noisy affair. You can hear what it was like at <a href="http://ibm-1401.info/Movies-n-Sounds.html">http://ibm-1401.info</a>; listen to <a href="http://ibm-1401.info/IBM026KeyPunch.mp3">http://ibm-1401.info/IBM026KeyPunch.mp3</a>. <p/>The card-punch could also be programmed to do things like duplicate a deck. This was achieved by punching instructions onto a card and installing the card on the card-punch's program-cylinder. The cylinder was installed in the card-punch, and as it turned little cogs engaged with the holes and bumped little levers, signalling instructions to the card-punch.
Duplicating a deck took the noise to a whole new level.</div>
<br />
The following javascript libraries were used:
<br />
<ul>
<li><a href="http://raphaeljs.com/" target="_blank">Raphael</a> - SVG (Scalable Vector Graphics) javascript library.</li>
<li><a href="http://jquery.com/" target="_blank">jQuery</a> - the write less, do more, javascript library.</li>
<li><a href="http://fancybox.net/" target="_blank">Fancybox</a> - floating lightbox.</li>
</ul>
The jQuery library grabs and edits the input text as it is typed. The Raphael library dynamically adds SVG elements to the page. The Fancybox library creates a popup window large enough to produce a scannable image of a card. I've run the code in Chrome, Firefox and IE8. IE8 seems a little buggy, but it's OK once you start typing.
I used Raphael and SVG to learn a bit about something new. For true portability it would probably be better to use a table of precomputed images, one per character - that would also be very easy to code - jQuery could be used to dynamically update the visible images. Yet another approach can be seen at <a href="http://www.kloth.net/services/cardpunch.php">www.kloth.net</a> - that site uses an HTTP server to generate JPGs or PNGs for a wide variety of card encodings.
<br />
<div style="display: none;">
<div id="carddiv" style="height: 650px; overflow: hidden; width: 1200px;">
<form>
<tt>&-0123456789ABCDEFGHIJKLMNOPQR/STUVWXYZ:#@'="`.<(+|!$*);^~,%_>?</tt><br />
<input id="cardText" size="80" width="80" />
</form>
<svg id="cardsvg" style="height: 550px; width: 1200px;" version="1.1" xmlns="http://www.w3.org/2000/svg">
</svg><br />
<br />
<div style="text-align: right;">
<a href="http://codeincluded.blogspot.com/"><small>http://codeincluded.blogspot.com</small></a></div>
</div>
</div>
<script src="http://users.actrix.co.nz/michael/js/jquery.min.js" type="text/javascript"></script>
<script src="http://users.actrix.co.nz/michael/js/raphael-min.js" type="text/javascript"></script>
<script src="http://users.actrix.co.nz/michael/js/cardpunch.js" type="text/javascript"></script>
<script src="http://users.actrix.co.nz/michael/js/fancybox/lib/jquery.mousewheel-3.0.6.pack.js" type="text/javascript"></script>
<!-- Add fancyBox -->
<link href="http://users.actrix.co.nz/michael/js/fancybox/source/jquery.fancybox.css?v=2.0.6" media="screen" rel="stylesheet" type="text/css"></link>
<script src="http://users.actrix.co.nz/michael/js/fancybox/source/jquery.fancybox.pack.js?v=2.0.6" type="text/javascript"></script>
<script type="text/javascript">
$(document).ready(function() {
    $(".fancybox").fancybox({
        helpers: {
            title : {
                type : 'float'
            }
        }
    });
    $("#cardButton").fancybox({
        'titleShow' : true,
        'transitionIn' : 'elastic',
        'transitionOut' : 'elastic'
    });
    cardpunchInit();
});
</script>digitaltrailshttp://www.blogger.com/profile/16633409983665910035noreply@blogger.com0tag:blogger.com,1999:blog-7008265127831057484.post-5255937910997652172012-08-01T02:07:00.000-07:002012-09-08T19:54:24.072-07:00Punch Card Reader - the FAQ<br />
<div style="margin-bottom: 0in;">
<b>Can I have a card to scan?</b><br />
You could use a screen grab from my
<a href="http://codeincluded.blogspot.co.nz/2012/08/jqueryrapheal-virtual-card-punch.html">Javascript Virtual Cardpunch</a>. Or you could use the following python punchcardgen.py script, which generates card images from text read from stdin (how about a t-shirt with a message punched into it):<br />
<pre class="brush: python; collapse:false; wrap-lines:false; ">#!/usr/bin/env python
#
# punchcardgen.py
#
# Copyright (C) 2011: Michael Hamilton
# The code is GPL 3.0(GNU General Public License) ( http://www.gnu.org/copyleft/gpl.html )
#
import Image
import sys

CARD_COLUMNS = 80
CARD_ROWS = 12

# found measurements at http://www.quadibloc.com/comp/cardint.htm
CARD_WIDTH = 7.0 + 3.0/8.0 # Inches
CARD_HEIGHT = 3.25 # Inches
CARD_COL_WIDTH = 0.087 # Inches
CARD_HOLE_WIDTH = 0.055 # Inches IBM, 0.056 Control Data
CARD_ROW_HEIGHT = 0.25 # Inches
CARD_HOLE_HEIGHT = 0.125 # Inches
CARD_TOPBOT_MARGIN = 3.0/16.0 # Inches at top and bottom
CARD_SIDE_MARGIN = 0.2235 # Inches on each side

DARK = (0,0,0)
BRIGHT = (255,255,255) # pixel brightness value (i.e. (R+G+B)/3)

REDUCE_IN_SIZE=8

IBM_MODEL_029_KEYPUNCH = """
    /&-0123456789ABCDEFGHIJKLMNOPQR/STUVWXYZ:#@'="`.<(+|!$*);^~,%_>? |
12 / O           OOOOOOOOO                        OOOOOO             |
11|   O                   OOOOOOOOO                     OOOOOO       |
 0|    O                           OOOOOOOOO                  OOOOOO |
 1|     O        O        O        O                                 |
 2|      O        O        O        O       O     O     O     O      |
 3|       O        O        O        O       O     O     O     O     |
 4|        O        O        O        O       O     O     O     O    |
 5|         O        O        O        O       O     O     O     O   |
 6|          O        O        O        O       O     O     O     O  |
 7|           O        O        O        O       O     O     O     O |
 8|            O        O        O        O OOOOOOOOOOOOOOOOOOOOOOOO |
 9|             O        O        O        O                         |
  |__________________________________________________________________|"""

translate = None
if translate == None:
    translate = {}
    # Turn the ASCII art sideways and build a hash look up for
    # column values, for example:
    # A:(O, , ,O, , , , , , , , )
    # B:(O, , , ,O, , , , , , , )
    # C:(O, , , , ,O, , , , , , )
    rows = IBM_MODEL_029_KEYPUNCH[1:].split('\n');
    rotated = [[ r[i] for r in rows[0:13]] for i in range(5, len(rows[0]) - 1)]
    for v in rotated:
        translate[v[0]] = tuple(v[1:])

if __name__ == '__main__':
    scale = 1000
    margin = 200
    card_x_pixels = int(CARD_WIDTH * scale)
    card_y_pixels = int(CARD_HEIGHT * scale)
    img_size = (2 * margin + card_x_pixels, 2 * margin + card_y_pixels)
    side_margin_pixels = int(CARD_SIDE_MARGIN * scale)
    col_width_pixels = int(CARD_COL_WIDTH * scale)
    top_bot_margin = int(CARD_TOPBOT_MARGIN * scale)
    row_height_pixels = int(CARD_ROW_HEIGHT * scale)
    hole_width = int(CARD_HOLE_WIDTH * scale)
    hole_height = int(CARD_HOLE_HEIGHT * scale)
    card_area = (margin, margin, margin + card_x_pixels, margin + card_y_pixels)
    proto_img = Image.new('RGB', img_size, BRIGHT)
    proto_pix = proto_img.load()
    proto_img.paste(DARK, card_area)
    # Remove the top left corner (don't know the standard for this - guess)
    i = 0
    for x in xrange(margin, margin + side_margin_pixels):
        for y in xrange(margin, margin + top_bot_margin + hole_height - i):
            proto_pix[x,y] = BRIGHT
        i += 2
    card_number = 1
    for line in sys.stdin:
        img = proto_img.copy()
        x = margin + side_margin_pixels
        for char in line:
            if char in translate:
                values = translate[char]
                y = margin + top_bot_margin
                for row in xrange(0, CARD_ROWS):
                    if values[row] == 'O':
                        img.paste(BRIGHT, (x, y, x + hole_width, y + hole_height))
                    y += row_height_pixels
                x += col_width_pixels
                if x > margin + card_x_pixels:
                    break
        img = img.resize((img_size[0]/REDUCE_IN_SIZE, img_size[1]/REDUCE_IN_SIZE))
        filename = "%010.10d.jpg" % ( card_number )
        print filename, line
        img.save(filename)
        card_number += 1
</pre>
The script has no command line options, just feed it uppercase text, for example:
<br />
<pre class="brush: plain; light:true; highlight:1">
% python punchcardgen.py
PROGRAM FORTRAN; WRITE(*,*)'HELLO WORLD'; END PROGRAM
</pre>
In this case the script produces a single image:
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-AtuQ5HHpCPE/UDvfJCAO0UI/AAAAAAAABDk/-7SIB1dKJus/s1600/0000000001.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="187" src="http://4.bp.blogspot.com/-AtuQ5HHpCPE/UDvfJCAO0UI/AAAAAAAABDk/-7SIB1dKJus/s400/0000000001.jpg" width="400" /></a></div>
The full-sized image can be rescanned to text by using my original punchcard script, for example:
<br />
<pre class="brush: plain; light:true; highlight:1">% python punchcard.py 0000000001.jpg > prog.f90
% gfortran prog.f90
% ./a.out
HELLO WORLD
</pre>
<br />
If you have your own cards, you can just hold them up to an even light and take their picture. For example, you might use a monitor displaying white, or a cloudy sky. Just make sure the resulting image background is smooth and the picture is straight and square, and hold the card by the bottom corner, for example:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-5xD0ig3nGfs/UBjs9JaHVZI/AAAAAAAABC0/r7qJPeTIOSU/s1600/P8017365s.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="208" src="http://2.bp.blogspot.com/-5xD0ig3nGfs/UBjs9JaHVZI/AAAAAAAABC0/r7qJPeTIOSU/s400/P8017365s.JPG" width="400" /></a></div>
<br />
To get the best scan, try the python script with the -d or -i options for debug info, use -b N to change the threshold light levels. Use a full sized image - if the image is too small the calculations introduce errors. <br />
<br />
<b>Why
not use an auto-feed scanner with a straight through paper path?</b><br />
<div style="margin-bottom: 0in;">
That would be the way to go if
I<span style="font-family: Arial;"> </span>wanted to spend money on it – and
didn't want to learn a little electronics. </div>
<div style="margin-bottom: 0in;">
<br /></div>
<div style="margin-bottom: 0in;">
<b>Why not add a motor and automate the
feed?</b></div>
<div style="margin-bottom: 0in;">
I was worried about jams on older decks of cards. I think I
could build a better feed by copying my photo printer's paper feed in Lego. (possible patent violation?) </div>
<div style="margin-bottom: 0in;">
<br /></div>
<div style="margin-bottom: 0in;">
I did consider using my
photo-printer as a feeder, that would probably have worked quite well. I would have to figure out how to collect cards as they exit the printer. <br />
<br />
Some way along I figured I could get the job done with stuff at hand without buying anything. Once I set that constraint, options narrowed considerably and decisions were easier to make.</div>
<div style="margin-bottom: 0in;">
<br /></div>
<div style="margin-bottom: 0in;">
There was also the case of the minicomputer
with a crank fitted over the instruction single-step toggle-switch –
variable speed debugging – consider my approach a homage to that
earlier clever hardware hack.</div>
<div style="margin-bottom: 0in;">
<br /></div>
<div style="margin-bottom: 0in;">
<b>Why not use an array of detectors
connected to the Arduino and eliminate the camera?</b></div>
<div style="margin-bottom: 0in;">
That would be cool but - this is my
first attempt at electronics. Advancing each column past a single
column scan would seem hard to calibrate correctly. I imagine that
could be solved with a grid/wheel of calibration holes moving with
the card or<span style="font-family: Arial;"> </span>moving with the scanner. </div>
<div style="margin-bottom: 0in;">
<br /></div>
<div style="margin-bottom: 0in;">
If
the card moved at a constant speed it might be possible to detect start and end of card, and from the timing figure
out what went past and what row it belonged to. </div>
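A minimal sketch of that timing idea, in Python 3. It assumes the card passes a single detector at a perfectly constant speed, and it reuses the card dimensions from punchcardgen.py above; the function name and its arguments are hypothetical, and nothing like this was built or tested on the reader.<br />

```python
import math

# Card dimensions in inches, as used by punchcardgen.py.
CARD_WIDTH = 7.0 + 3.0 / 8.0
CARD_SIDE_MARGIN = 0.2235
CARD_COL_WIDTH = 0.087
CARD_COLUMNS = 80


def column_for_time(t, t_start, t_end):
    """Map a detection time to a card column (0-79), or None if off the grid.

    t_start and t_end are when the card's leading and trailing edges
    passed the detector; constant speed between them is assumed.
    """
    if not t_start <= t <= t_end:
        return None
    # Distance travelled past the detector when the event was seen.
    position = (t - t_start) / (t_end - t_start) * CARD_WIDTH
    column = int(math.floor((position - CARD_SIDE_MARGIN) / CARD_COL_WIDTH))
    if 0 <= column < CARD_COLUMNS:
        return column
    return None
```

With one detector per row, the row is known from which detector fired and only the column has to be recovered from the timing. Any variation in speed shifts the estimate, which is why a strip of calibration holes would be the safer design.<br />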
<div style="margin-bottom: 0in;">
<br /></div>
<div style="margin-bottom: 0in;">
<b>Why not just read the text printed
at the top of the card?</b></div>
<div style="margin-bottom: 0in;">
I didn't think the text would be good
enough for OCR. Some cards were quite worn. I did not want to
manually read and enter each card. OCR seems a tough problem compared
to reading the holes.<br />
<br />
<b>What is the Arduino for exactly?</b><br />
The Arduino stops the card, detects that a card has stopped, signals the camera to focus and shoot, then opens the servo to let the card go. It plays a key role in keeping the cards in order, both the order of the images in the camera, and the order of the physical cards in the output bin. It could do more, such as run a feed motor. But really, the Card Reader is an integration with an Arduino in the mix - it's not a pure Arduino project. </div>
<div style="margin-bottom: 0in;">
<br />
<b>Why not use a webcam or Android camera?</b><br />
The Canon S2 IS employed here is old, but produces reasonably distortion-free images - with the CHDK firmware hack it seemed a shame not to use it. It would be nice to feed directly to the PC - Android or a webcam would accomplish this. Perhaps a wireless-capable SD card might also work. <br />
<br />
<br />
<div style="margin-bottom: 0in;">
<b>Why would anyone want to go to this much time/effort?</b></div>
<div style="margin-bottom: 0in;">
<span style="background-color: white;">For me, learning</span> by doing works best. This was a well bounded problem that looked solvable. It wasn't all that much effort; I just kept the problem in the back of my mind over the last year. There was only the occasional burst of activity when ideas solidified – I was not working to a deadline.</div>
</div>
<div style="margin-bottom: 0in;">
<b><br /></b>
<b>Why not tweak-it/finish-it/enhance-it in some respect?</b></div>
<div style="margin-bottom: 0in;">
I've scanned the cards I wanted to - some MIX, some FORTRAN. In the process I achieved my goal of learning a little about Arduino,
electronics, CHDK, fritzing, and PIL. I hope to apply some of
what I've learnt to some of my other
interests, for example, nature photography:<br />
<br />
<div style="text-align: center;">
<a href="http://www.flickr.com/photos/digitaltrails/">http://www.flickr.com/photos/digitaltrails/</a>.</div>
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div style="margin-bottom: 0in;">
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-0qOuo0K1-so/UBjvkkzY4gI/AAAAAAAABDI/Yr03NZc5tbM/s1600/6794049294_66a4b317a2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="http://4.bp.blogspot.com/-0qOuo0K1-so/UBjvkkzY4gI/AAAAAAAABDI/Yr03NZc5tbM/s320/6794049294_66a4b317a2.jpg" width="320" /></a></div>
<br /></div>
<div style="margin-bottom: 0in;">
<br /></div>
<div style="margin-bottom: 0in;">
<br />
<br /></div>
</div>digitaltrailshttp://www.blogger.com/profile/16633409983665910035noreply@blogger.com0tag:blogger.com,1999:blog-7008265127831057484.post-52968030743293172432012-07-26T01:51:00.001-07:002012-08-01T03:36:05.422-07:00Punch Card Reader - The Hardware<div class="separator" style="clear: both; text-align: center;">
</div>
<div style="text-align: start;">
<div class="separator" style="clear: both; text-align: center;">
</div>
<span style="background-color: white;">Having used PIL to create software that could scan a card, I spent quite some time mulling over possible ways to make a scanner. At some point it occurred to me that I could recycle some old curtain rails and use gravity to do the card transport. This led to a final design that could be built from materials already to hand: some old curtain rails, </span><span style="background-color: white;">an old piece of shelving, </span><span style="background-color: white;">tracing paper, a desk lamp, some masking tape, and Blu-Tack. </span><span style="background-color: white;">I made some cutouts in the rails so that the software scanner could identify the side edges of each card and automatically calibrate the card's width (the power drill grabbed as it made it through, so it's best to clamp the rail down securely). I rounded off all cut edges with a file to minimise the chance of a card catching on entering the rails or while passing through the cutout.</span></div>
<div>
<span style="background-color: white;"><br /></span></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://3.bp.blogspot.com/-q8btHVi-NcU/UApDkMkuqNI/AAAAAAAABAw/xZkmw22sAUE/s1600/hardware.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="480" src="http://3.bp.blogspot.com/-q8btHVi-NcU/UApDkMkuqNI/AAAAAAAABAw/xZkmw22sAUE/s640/hardware.jpg" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both;">
The breadboard connected to the Arduino was wired up by combining aspects of the designs that came with the <a href="http://www.sparkfun.com/products/10173" target="_blank">SparkFun Inventor's Kit for Arduino</a> (I'm in New Zealand, so I ordered it from <a href="http://www.mindkits.co.nz/">mindkits.co.nz</a>). <span style="background-color: white;"> </span><span style="background-color: white;">I subsequently found the </span><a href="http://fritzing.org/" style="background-color: white;">Fritzing</a><span style="background-color: white;"> circuit design tool which I have now used to document my design (I've not had the courage to dismantle and rewire the breadboard to test the design schematics, so the schematics are untried). </span></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://1.bp.blogspot.com/-sVuvVk5owAI/UAI-1rHwcvI/AAAAAAAAA-E/FYoc0RDQyEE/s1600/fritzing-breadboard.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="500" src="http://1.bp.blogspot.com/-sVuvVk5owAI/UAI-1rHwcvI/AAAAAAAAA-E/FYoc0RDQyEE/s640/fritzing-breadboard.jpg" width="640" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-ibwK1bhUh2o/UAI-2itk70I/AAAAAAAAA-M/qNmmuqU98sA/s1600/fritzing-schematic.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="472" src="http://2.bp.blogspot.com/-ibwK1bhUh2o/UAI-2itk70I/AAAAAAAAA-M/qNmmuqU98sA/s640/fritzing-schematic.jpg" width="640" /></a></div>
<span style="background-color: white;"><br /></span><br />
<span style="background-color: white;">The card feed was initially a bit of a problem. I had originally thought about making the whole reader from Lego, but then thought why torture myself? I also considered photo printers with straight-through paper paths. I tried mine and it worked quite well, but I still needed to figure out how to get the card from the printer's output tray into the rails - probably by tipping the printer on an angle to let gravity feed the output onto a chute (perhaps with relatively empty ink cartridges). In the end I went with a crude manual Lego feeder. It has the advantage that jams can be dealt with before any cards are mangled. It's not perfect, but it's adequate.</span><br />
<span style="background-color: white;"><br /></span><br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/--lJ_yi7C0TU/UApDYtG6MwI/AAAAAAAABAo/Or7kAc1lJYg/s1600/feeder.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="480" src="http://4.bp.blogspot.com/--lJ_yi7C0TU/UApDYtG6MwI/AAAAAAAABAo/Or7kAc1lJYg/s640/feeder.jpg" width="640" /></a></div>
<span style="background-color: white;"><br /></span><br />
<span style="background-color: white;">The camera was fairly easy, I have an old Canon S2 IS, I just put </span><a href="http://chdk.wikia.com/wiki/CHDK" style="background-color: white;">CHDK enhanced firmware</a><span style="background-color: white;"> on an SD card. I cut open an old USB cable to attach the camera to the Arduino. The hardest part of dealing with the camera was experimenting with the timing sequence on the USB shutter release. The documentation is a bit technical, but I eventually settled on using a short half press delay to focus, followed by a full press delay to take the photo. Here is the Arduino-Sketch code for the controller:</span><br />
<pre class="brush: cpp; collapse:true; wrap-lines:false; ">// Shutter Controller
// by Michael Hamilton
// The code is GPL 3.0(GNU General Public License)
#include &lt;Servo.h&gt;

int SERVO_PIN = 9;
int SERVO_CLOSE = 120;
int SERVO_OPEN = 15;
int SERVO_DELAY = 250;
int SERVO_CARD_MOVE_DELAY = 1000;
int PHOTO_RESIST_PIN = 0;
int PHOTO_RESIST_DIFF = 40;
int PHOTO_RESIST_DELAY = 1000;
int CAMERA_PIN = 11;
int CAMERA_FULL_PRESS_DELAY = 1000;
int CAMERA_HALF_PRESS_DELAY = 500;
int CAMERA_FOCUS_DELAY = 400;
int CAMERA_CAPTURE_DELAY = 1500;

Servo myservo; // create servo object to control a servo
               // a maximum of eight servo objects can be created
int lastlevel = 0;

void setup()
{
  Serial.begin(9600);
  Serial.println("Card Rdr");
  pinMode(CAMERA_PIN, OUTPUT); // drive the USB shutter-release line as an output
  myservo.attach(SERVO_PIN); // attaches the servo on pin 9 to the servo object
  myservo.write(SERVO_CLOSE);
  delay(SERVO_DELAY);
  myservo.detach();
}

void loop()
{
  int lightlevel = analogRead(PHOTO_RESIST_PIN);
  if (lastlevel != lightlevel) {
    Serial.print("lightlevel=");
    Serial.println(lightlevel);
    int diff = lastlevel - lightlevel;
    if (diff >= PHOTO_RESIST_DIFF) { // light dropped: a card has arrived
      digitalWrite(CAMERA_PIN, HIGH);
      delay(CAMERA_HALF_PRESS_DELAY); // press shutter -focus
      digitalWrite(CAMERA_PIN, LOW);
      delay(CAMERA_FOCUS_DELAY);
      digitalWrite(CAMERA_PIN, HIGH);
      delay(CAMERA_FULL_PRESS_DELAY); // press shutter
      digitalWrite(CAMERA_PIN, LOW);
      delay(CAMERA_CAPTURE_DELAY); // wait for photo to be taken
      myservo.attach(SERVO_PIN);
      myservo.write(SERVO_OPEN);
      delay(SERVO_DELAY + SERVO_CARD_MOVE_DELAY); // wait for servo to open and card to move
      myservo.write(SERVO_CLOSE);
      delay(SERVO_DELAY); // wait for servo to close
      myservo.detach();
    }
    lastlevel = lightlevel;
    delay(PHOTO_RESIST_DELAY);
  }
}
</pre>digitaltrailshttp://www.blogger.com/profile/16633409983665910035noreply@blogger.com6tag:blogger.com,1999:blog-7008265127831057484.post-83057691317867480732012-07-25T18:58:00.000-07:002012-08-26T16:27:14.713-07:00Punch Card Reader - The Software<blockquote>Note: The <a href="http://codeincluded.blogspot.com/2012/08/punch-card-reader-faq.html">FAQ</a> now includes a script that can generate punch card images from text. I guess this new script could be used as a basis for a card maker - or to create punch card t-shirt logos?</blockquote>
When I originally started the Punch Card Reader project my first step was to obtain a few sample images by holding a camera in one hand, and a punch-card up to the window in the other. I then located detailed card specifications at <a href="http://www.quadibloc.com/comp/cardint.htm" style="background-color: white;" target="_blank">http://www.quadibloc.com/comp/cardint.htm</a>, a site that documents all the essential dimensions along with quite a bit of background history. Using these dimensions I was able to experiment with the sample images by using the Python Imaging Library (PIL - python 2.7). PIL makes it very easy to walk the x/y grid of an image inspecting each pixel's RGB values.<br />
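As a rough illustration of that pixel walk, here is a minimal sketch. It is written for Python 3 rather than the 2.7 used by the real script, and to stay self-contained it walks a plain nested list of RGB tuples instead of a PIL image. The names <code>brightness</code> and <code>bright_pixels</code> and the sample values are made up for this example.

```python
def brightness(pixel):
    """Average of the R, G and B values of one pixel."""
    r, g, b = pixel[:3]
    return (r + g + b) // 3


def bright_pixels(grid, threshold):
    """Walk the grid row by row, returning (x, y) for every pixel at or above threshold."""
    hits = []
    for y, row in enumerate(grid):
        for x, pixel in enumerate(row):
            if brightness(pixel) >= threshold:
                hits.append((x, y))
    return hits


# A tiny 3x2 stand-in image: dark card stock with one backlit hole at x=1, y=0.
sample = [
    [(40, 40, 40), (250, 250, 250), (35, 30, 30)],
    [(42, 38, 41), (60, 55, 50), (30, 30, 30)],
]
print(bright_pixels(sample, 200))
```

With a real PIL image the same loop would index the pixel-access object returned by <code>image.load()</code>, as the full script below does.<br />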
I tried to come up with a heuristic to recognise the card-edges and the punched-holes. Initially I accumulated brightness values across the entire surface and averaged them into discrete rows and columns. This worked reasonably well but was quite slow. I soon realised that recognising the tall horizontal rows required far less precision than the smaller and more numerous vertical columns. I was able to shortcut the vertical scan by just examining a one pixel wide line across the estimated middle of each row. You can get some feel for the tolerances required from the following debug dump:<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-1t9AEE-SbGc/UA-xBEmx6YI/AAAAAAAABBA/sW7bZzUcY0Q/s1600/punchcardAnotatedSmall.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="310" src="http://2.bp.blogspot.com/-1t9AEE-SbGc/UA-xBEmx6YI/AAAAAAAABBA/sW7bZzUcY0Q/s640/punchcardAnotatedSmall.jpg" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
The faint red marks show where the scanning algorithm has decided it has found an edge or a hole. The faint blue rectangles plot where holes were expected to be located - you can see that the vertical drift isn't going to be much of a problem so long as the image is reasonably square and flat. Notice that the red marks at the start of each horizontal row exhibit some drift from the true vertical and the <span style="background-color: white;">script has to compensate for this to maintain an accurate allocation of holes to the correct columns. On the other hand I found it adequate to calibrate the vertical height from one reading only - this is why the guide rails have holes cut in the middle - the middle reading is clearly marked on this image.</span><br />
<br />
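The single-line scan can be sketched roughly as follows. This is a simplified Python 3 illustration, not the script itself: it takes a list of brightness values sampled along the middle of one row and reports which columns contain a bright run wide enough to be a hole. The 0.75 width tolerance and the 0.25 rounding offset mirror the full script; the function name <code>find_holes</code> and the sample numbers are assumptions for this example.

```python
def find_holes(row, threshold, data_left, col_width, hole_width):
    """Return the column numbers of bright runs wide enough to be punched holes.

    row is a list of brightness values along one scan line, data_left is the
    x position where column 0 starts. Assumes threshold is greater than zero.
    """
    columns = []
    left_edge = None
    # The trailing dark value closes a run that touches the end of the line.
    for x, value in enumerate(list(row) + [0]):
        if value >= threshold:
            if left_edge is None:
                left_edge = x
        elif left_edge is not None:
            run_length = x - left_edge
            if run_length >= hole_width * 0.75:
                centre = left_edge + run_length / 2.0
                columns.append(int((centre - data_left) / col_width + 0.25))
            left_edge = None
    return columns


# 40 pixels of dark card, holes in columns 0 and 2, plus a 2 pixel speck of noise.
row = [50] * 40
for x in list(range(5, 11)) + list(range(25, 31)) + [17, 18]:
    row[x] = 230
print(find_holes(row, threshold=200, data_left=5, col_width=10, hole_width=6))
```

Because the left edge is re-detected for every row, the column allocation tolerates the horizontal drift visible in the red marks above.<br />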
The final script accepts some parameters to help it adjust to the characteristics of the scanning hardware and to enable some debugging feedback. Here is a summary of the script's parameters, as output by its help option:<br />
<blockquote class="tr_bq">
<pre class="brush: plain; light:true; highlight:1"> % python punchcard.py --help
Usage: punchcard.py [options] image [image...]
decode punch card image into ASCII.

Options:
  -h, --help            show this help message and exit
  -b BRIGHT, --bright-threshold=BRIGHT
                        Brightness (R+G+B)/3, e.g. 127.
  -s SIDE_MARGIN_RATIO, --side-margin-ratio=SIDE_MARGIN_RATIO
                        Manually set side margin ratio (sideMargin/cardWidth).
  -d, --dump            Output an ASCII-art version of the card.
  -i, --display-image   Display an anotated version of the image.
  -r, --dump-raw        Output ASCII-art with raw row/column accumulator
                        values.
  -x XSTART, --x-start=XSTART
                        Start looking for a card edge at y position (pixels)
  -X XSTOP, --x-stop=XSTOP
                        Stop looking for a card edge at y position
  -y YSTART, --y-start=YSTART
                        Start looking for a card edge at y position
  -Y YSTOP, --y-stop=YSTOP
                        Stop looking for a card edge at y position
  -a XADJUST, --adjust-x=XADJUST
                        Adjust middle edge detect location (pixels)
</pre>
</blockquote>
To assist with adjusting the scan for the best results the script can optionally display marked up images (seen above). The script can also produce an ASCII-art dump, for example:<br />
<pre class="brush: plain; light:true; highlight:1">           SLAX 1            MOVE ALL CHARS ONE LEFT
 Card Dump of Image file: mix1/img_1961.jpg Format Dump threshold= 190
 123456789-123456789-123456789-123456789-123456789-123456789-123456789-123456789-
 ________________________________________________________________________________
/           SLAX 1            MOVE ALL CHARS ONE LEFT                            |
|.............O..................O.O...OOO.....O..OO.............................|
|............O................OO....OO....O..OO..O...............................|
|...........O..O................O..........O........O............................|
|.............O..O.................O.....O.......................................|
|...........O..............................O.....................................|
|............O......................OO.O.........O..O............................|
|.............................O..................................................|
|...............................OO............OO..O..............................|
|..............................O.............O.....O.............................|
|..............O.................................................................|
|.......................................O........................................|
|.........................................O......................................|
`--------------------------------------------------------------------------------'
123456789-123456789-123456789-123456789-123456789-123456789-123456789-123456789-
</pre>
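The dump is just the decoded columns turned on their side. A minimal Python 3 sketch of that rendering step is shown here, assuming each column has already been decoded into a 12-element tuple of 'O' or ' ' (top row first), as the full script does. The helper names <code>render_rows</code> and <code>column</code> are made up for this example.

```python
CARD_ROWS = 12


def render_rows(columns):
    """Turn a list of per-column tuples into one text line per card row."""
    lines = []
    for row_num in range(CARD_ROWS):
        cells = ''.join('O' if col[row_num] == 'O' else '.' for col in columns)
        lines.append('|' + cells + '|')
    return lines


def column(*punched_rows):
    """Build a column tuple with holes at the given row indexes (0 is the top row)."""
    return tuple('O' if r in punched_rows else ' ' for r in range(CARD_ROWS))


# Three columns: an 'A' (rows 12 and 1, i.e. indexes 0 and 3), a blank, and a '0' (index 2).
columns = [column(0, 3), column(), column(2)]
for line in render_rows(columns):
    print(line)
```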
That concludes this brief overview of the recognition script. The next post will describe the hardware in more detail. Full script code follows below.
<br />
<h4>
The code (punchcard.py):</h4>
<pre class="brush: python; collapse:true; wrap-lines:false; ">#!/usr/bin/env python
#
# punchcard.py
#
# Copyright (C) 2011: Michael Hamilton
# The code is GPL 3.0(GNU General Public License) ( http://www.gnu.org/copyleft/gpl.html )
#
import Image
import sys
from optparse import OptionParser
CARD_COLUMNS = 80
CARD_ROWS = 12
# found measurements at http://www.quadibloc.com/comp/cardint.htm
CARD_WIDTH = 7.0 + 3.0/8.0 # Inches
CARD_HEIGHT = 3.25 # Inches
CARD_COL_WIDTH = 0.087 # Inches
CARD_HOLE_WIDTH = 0.055 # Inches IBM, 0.056 Control Data
CARD_ROW_HEIGHT = 0.25 # Inches
CARD_HOLE_HEIGHT = 0.125 # Inches
CARD_TOPBOT_MARGIN = 3.0/16.0 # Inches at top and bottom
CARD_SIDE_MARGIN = 0.2235 # Inches on each side
CARD_SIDE_MARGIN_RATIO = CARD_SIDE_MARGIN/CARD_WIDTH # as proportion of card width (margin/width)
CARD_TOP_MARGIN_RATIO = CARD_TOPBOT_MARGIN/CARD_HEIGHT # as proportion of card height (margin/height)
CARD_ROW_HEIGHT_RATIO = CARD_ROW_HEIGHT/CARD_HEIGHT # as proportion of card height - works
CARD_COL_WIDTH_RATIO = CARD_COL_WIDTH/CARD_WIDTH # as proportion of card width - works
CARD_HOLE_HEIGHT_RATIO = CARD_HOLE_HEIGHT/CARD_HEIGHT # as proportion of card height - works
CARD_HOLE_WIDTH_RATIO = CARD_HOLE_WIDTH/CARD_WIDTH # as a proportion of card width
BRIGHTNESS_THRESHOLD = 200 # pixel brightness value (i.e. (R+G+B)/3)
IBM_MODEL_029_KEYPUNCH = """
    /&-0123456789ABCDEFGHIJKLMNOPQR/STUVWXYZ:#@'="`.<(+|!$*);^~,%_>? |
12 / O           OOOOOOOOO                        OOOOOO             |
11|   O                   OOOOOOOOO                     OOOOOO       |
 0|    O                           OOOOOOOOO                  OOOOOO |
 1|     O        O        O        O                                 |
 2|      O        O        O        O       O     O     O     O      |
 3|       O        O        O        O       O     O     O     O     |
 4|        O        O        O        O       O     O     O     O    |
 5|         O        O        O        O       O     O     O     O   |
 6|          O        O        O        O       O     O     O     O  |
 7|           O        O        O        O       O     O     O     O |
 8|            O        O        O        O OOOOOOOOOOOOOOOOOOOOOOOO |
 9|             O        O        O        O                         |
  |__________________________________________________________________|"""
translate = None
if translate == None:
    translate = {}
    # Turn the ASCII art sideways and build a hash look up for
    # column values, for example:
    # (O, , ,O, , , , , , , , ):A
    # (O, , , ,O, , , , , , , ):B
    # (O, , , , ,O, , , , , , ):C
    rows = IBM_MODEL_029_KEYPUNCH[1:].split('\n');
    rotated = [[ r[i] for r in rows[0:13]] for i in range(5, len(rows[0]) - 1)]
    for v in rotated:
        translate[tuple(v[1:])] = v[0]
    #print translate

# generate a range of floats
def drange(start, stop, step=1.0):
    r = start
    while (step >= 0.0 and r < stop) or (step < 0.0 and r > stop):
        yield r
        r += step

# Represents a punchcard image plus scanned data
class PunchCard(object):

    def __init__(self, image, bright=-1, debug=False, xstart=0, xstop=0, ystart=0, ystop=0, xadjust=0):
        self.text = ''
        self.decoded = []
        self.surface = []
        self.debug = debug
        self.threshold = 0
        self.ymin = ystart
        self.ymax = ystop
        self.xmin = xstart
        self.xmax = xstop
        self.xadjust = xadjust
        self.image = image
        self.pix = image.load()
        self._crop()
        self._scan(bright)

    # Brightness is the average of RGB values
    def _brightness(self, pixel):
        #print max(pixel)
        return ( pixel[0] + pixel[1] + pixel[2] ) / 3

    # For highlighting on the debug dump
    def _flip(self, pixel):
        return max(pixel)

    # The search is started from the "crop" edges.
    # Either use crop boundary of the image size or the values supplied
    # by the command line args
    def _crop(self):
        # use the image held by this object rather than relying on a global
        self.xsize, self.ysize = self.image.size
        if self.xmax == 0:
            self.xmax = self.xsize
        if self.ymax == 0:
            self.ymax = self.ysize
        self.midx = self.xmin + (self.xmax - self.xmin) / 2 + self.xadjust
        self.midy = self.ymin + (self.ymax - self.ymin) / 2

    # heuristic for finding a reasonable cutoff brightness
    def _find_threshold_brightness(self):
        left = self._brightness(self.pix[self.xmin, self.midy])
        right = self._brightness(self.pix[self.xmax - 1, self.midy])
        return min(left, right, BRIGHTNESS_THRESHOLD) - 10
        # NOTE: everything below this return is unreachable. It is the
        # remains of two earlier threshold experiments, kept for reference.
        vals = []
        last = 0
        for x in xrange(self.xmin,self.xmax):
            val = self._brightness(self.pix[x, self.midy])
            if val > last:
                left = val
            else:
                break
            last = val
        for x in xrange(self.xmax,self.xmin, -1):
            val = self._brightness(self.pix[x, self.midy])
            if val > last:
                right = val
            else:
                break
            right = val
        print left, right
        return min(left, right,200)
        for x in xrange(self.xmin,self.xmax):
            val = self._brightness(self.pix[x, self.midy])
            vals.append(val)
        vals.sort()
        last_val = vals[0]
        biggest_diff = 0
        threshold = 0
        for val in vals:
            diff = val - last_val
            #print val, diff
            if val > 127 and val < 200 and diff >= 5:
                biggest_diff = diff
                threshold = val
            last_val = val
        if self.debug:
            print "Threshold diff=", biggest_diff, "brightness=", val
        return threshold - 10

    # Find the left and right edges of the data area at probe_y and from that
    # figure out the column and hole horizontal dimensions (widths) at probe_y.
    def _find_data_horiz_dimensions(self, probe_y):
        left_border, right_border = self.xmin, self.xmax - 1
        for x in xrange(self.xmin, self.midx):
            if self._brightness(self.pix[x, probe_y]) < self.threshold:
                left_border = x
                break
        for x in xrange(self.xmax-1, self.midx, -1):
            if self._brightness(self.pix[x, probe_y]) < self.threshold:
                right_border = x
                break
        width = right_border - left_border
        card_side_margin_width = int(width * CARD_SIDE_MARGIN_RATIO)
        data_left_x = left_border + card_side_margin_width
        #data_right_x = right_border - card_side_margin_width
        data_right_x = data_left_x + int((CARD_COLUMNS * width) * CARD_COL_WIDTH/CARD_WIDTH)
        col_width = width * CARD_COL_WIDTH_RATIO
        hole_width = width * CARD_HOLE_WIDTH_RATIO
        #print col_width
        if self.debug:
            # mark left and right edges on the copy
            for y in xrange(probe_y - self.ysize/100, probe_y + self.ysize/100):
                self.debug_pix[left_border if left_border > 0 else 0,y] = 255
                self.debug_pix[right_border if right_border < self.xmax else self.xmax - 1,y] = 255
            for x in xrange(1, (self.xmax - self.xmin) / 200):
                self.debug_pix[left_border + x, probe_y] = 255
                self.debug_pix[right_border - x, probe_y] = 255
        return data_left_x, data_right_x, col_width, hole_width

    # find the top and bottom of the data area and from that the
    # row and hole vertical dimensions (heights)
    def _find_data_vert_dimensions(self):
        top_border, bottom_border = self.ymin, self.ymax
        for y in xrange(self.ymin, self.midy):
            #print pix[midx, y][0]
            if self._brightness(self.pix[self.midx, y]) < self.threshold:
                top_border = y
                break
        for y in xrange(self.ymax - 1, self.midy, -1):
            if self._brightness(self.pix[self.midx, y]) < self.threshold:
                bottom_border = y
                break
        card_height = bottom_border - top_border
        card_top_margin = int(card_height * CARD_TOP_MARGIN_RATIO)
        data_begins = top_border + card_top_margin
        hole_height = int(card_height * CARD_HOLE_HEIGHT_RATIO)
        data_top_y = data_begins + hole_height / 2
        col_height = int(card_height * CARD_ROW_HEIGHT_RATIO)
        if self.debug:
            # mark up the copy with the edges
            for x in xrange(self.xmin, self.xmax-1):
                self.debug_pix[x,top_border] = 255
                self.debug_pix[x,bottom_border] = 255
        if self.debug:
            # mark search parameters
            for x in xrange(self.midx - self.xsize/20, self.midx + self.xsize/20):
                self.debug_pix[x,self.ymin] = 255
                self.debug_pix[x,self.ymax - 1] = 255
            for y in xrange(0, self.ymin):
                self.debug_pix[self.midx,y] = 255
            for y in xrange(self.ymax - 1, self.ysize-1):
                self.debug_pix[self.midx,y] = 255
        return data_top_y, data_top_y + col_height * 11, col_height, hole_height

    def _scan(self, bright=-1):
        if self.debug:
            # if debugging make a copy we can draw on
            self.debug_image = self.image.copy()
            self.debug_pix = self.debug_image.load()
        self.threshold = bright if bright > 0 else self._find_threshold_brightness()
        #x_min, x_max, col_width = self._find_data_horiz_dimensions(image, pix, self.threshold, self.ystart, self.ystop)
        y_data_pos, y_data_end, col_height, hole_height = self._find_data_vert_dimensions()
        data = {}
        # Chads are narrow so find them heuristically by accumulating pixel brightness
        # along the row. Should be forgiving if the image is slightly wonky.
        y = y_data_pos #- col_height/8
        for row_num in xrange(CARD_ROWS):
            probe_y = y + col_height if row_num == 0 else ( y - col_height if row_num == CARD_ROWS -1 else y ) # Line 0 has a corner missing
            x_data_left, x_data_right, col_width, hole_width = self._find_data_horiz_dimensions(probe_y)
            left_edge = -1 # of a punch-hole
            for x in xrange(x_data_left, x_data_right):
                # Chads are tall so we can be sure if we probe around the middle of their height
                val = self._brightness(self.pix[x, y])
                if val >= self.threshold:
                    if left_edge == -1:
                        left_edge = x
                    if self.debug:
                        self.debug_pix[x,y] = self._flip(self.pix[x,y])
                else:
                    if left_edge > -1:
                        hole_length = x - left_edge
                        if hole_length >= hole_width * 0.75:
                            col_num = int((left_edge + hole_length / 2.0 - x_data_left) / col_width + 0.25)
                            data[(col_num, row_num)] = hole_length
                        left_edge = -1
            if (self.debug):
                # Plot where holes might be on this row
                expected_top_edge = y - hole_height / 2
                expected_bottom_edge = y + hole_height / 2
                blue = 255 * 256 * 256
                for expected_left_edge in drange(x_data_left, x_data_right - 1, col_width):
                    for y_plot in drange(expected_top_edge, expected_bottom_edge, 2):
                        self.debug_pix[expected_left_edge,y_plot] = blue
                        #self.debug_pix[x + hole_width/2,yline] = 255 * 256 * 256
                        self.debug_pix[expected_left_edge + hole_width,y_plot] = blue
                    for x_plot in drange(expected_left_edge, expected_left_edge + hole_width):
                        self.debug_pix[x_plot, expected_top_edge] = blue
                        self.debug_pix[x_plot, expected_bottom_edge] = blue
            y += col_height
        if self.debug:
            self.debug_image.show()
            # prevent run-a-way debug shows causing my desktop to run out of memory
            raw_input("Press Enter to continue...")
        self.decoded = []
        # Could fold this loop into the previous one - but would it be faster?
        for col in xrange(0, CARD_COLUMNS):
            col_pattern = []
            col_surface = []
            for row in xrange(CARD_ROWS):
                key = (col, row)
                # average for 1/3 of a column is greater than the threshold
                col_pattern.append('O' if key in data else ' ')
                col_surface.append(data[key] if key in data else 0)
            tval = tuple(col_pattern)
            global translate
            self.text += translate[tval] if tval in translate else '@'
            self.decoded.append(tval)
            self.surface.append(col_surface)
        return self

    # ASCII art image of card
    def dump(self, id, raw_data=False):
        print ' Card Dump of Image file:', id, 'Format', 'Raw' if raw_data else 'Dump', 'threshold=', self.threshold
        print ' ' + '123456789-' * (CARD_COLUMNS/10)
        print ' ' + '_' * CARD_COLUMNS + ' '
        print '/' + self.text + '_' * (CARD_COLUMNS - len(self.text)) + '|'
        for rnum in xrange(len(self.decoded[0])):
            sys.stdout.write('|')
            if raw_data:
                for val in self.surface:
                    sys.stdout.write(("(%d)" % val[rnum]) if val[rnum] != 0 else '.' )
            else:
                for col in self.decoded:
                    sys.stdout.write(col[rnum] if col[rnum] == 'O' else '.')
            print '|'
        print '`' + '-' * CARD_COLUMNS + "'"
        print ' ' + '123456789-' * (CARD_COLUMNS/10)
        print ''

if __name__ == '__main__':
    usage = """usage: %prog [options] image [image...]
    decode punch card image into ASCII."""
    parser = OptionParser(usage)
    parser.add_option('-b', '--bright-threshold', type='int', dest='bright', default=-1, help='Brightness (R+G+B)/3, e.g. 127.')
    parser.add_option('-s', '--side-margin-ratio', type='float', dest='side_margin_ratio', default=CARD_SIDE_MARGIN_RATIO, help='Manually set side margin ratio (sideMargin/cardWidth).')
    parser.add_option('-d', '--dump', action='store_true', dest='dump', help='Output an ASCII-art version of the card.')
    parser.add_option('-i', '--display-image', action='store_true', dest='display', help='Display an anotated version of the image.')
    parser.add_option('-r', '--dump-raw', action='store_true', dest='dumpraw', help='Output ASCII-art with raw row/column accumulator values.')
    parser.add_option('-x', '--x-start', type='int', dest='xstart', default=0, help='Start looking for a card edge at y position (pixels)')
    parser.add_option('-X', '--x-stop', type='int', dest='xstop', default=0, help='Stop looking for a card edge at y position')
    parser.add_option('-y', '--y-start', type='int', dest='ystart', default=0, help='Start looking for a card edge at y position')
    parser.add_option('-Y', '--y-stop', type='int', dest='ystop', default=0, help='Stop looking for a card edge at y position')
    parser.add_option('-a', '--adjust-x', type='int', dest='xadjust', default=0, help='Adjust middle edge detect location (pixels)')
    (options, args) = parser.parse_args()
    for arg in args:
        image = Image.open(arg)
        card = PunchCard(image, bright=options.bright, debug=options.display, xstart=options.xstart, xstop=options.xstop, ystart=options.ystart, ystop=options.ystop, xadjust=options.xadjust)
        print card.text
        if (options.dump):
            card.dump(arg)
        if (options.dumpraw):
            card.dump(arg, raw_data=True)
</pre>digitaltrailshttp://www.blogger.com/profile/16633409983665910035noreply@blogger.com18tag:blogger.com,1999:blog-7008265127831057484.post-85337017191057979742012-07-25T03:37:00.000-07:002012-07-26T01:48:36.740-07:00Punch Card Reader - The Movie<div class="separator" style="clear: both; text-align: center;">
<br /></div>
Last year I bought an Arduino microcontroller and spent some time building the experiments that came with the kit. Having rediscovered some old punch cards, I wondered if I could combine the Arduino, the CHDK firmware for Canon cameras, and my Linux desktop to read in these old card decks. This is the result:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.youtube.com/embed/LcwxW2ne-UU?feature=player_embedded' frameborder='0'></iframe><br />
<br />
<br />
<br />digitaltrailshttp://www.blogger.com/profile/16633409983665910035noreply@blogger.com4tag:blogger.com,1999:blog-7008265127831057484.post-51384661834088800792011-12-16T17:31:00.000-08:002013-03-17T12:45:39.182-07:00RPM Changelogs for Recent Updates 10x FasterIn my previous post I presented a generalised rpm changelog summary script. I've now tidied up the implementation and added a couple of new options.
<p />
One thing bugging me was that the script exec'ed rpm for each package. Even though UNIX process creation is relatively inexpensive, the programs being exec'ed take time to initialise themselves: they have to open files, read configs, create internal structures, etc. The cumulative initialisation costs can be substantial. For example, the old makewhatis script that used to ship with many Linux distros exec'ed gawk for every manual page, which took 30 minutes on a 486DX66. It was so annoying I rewrote it to exec gawk less often, and the run time dropped to 1.5 minutes. The improved version is still included in <a href="http://freecode.com/projects/man">man-1.6g</a>. Given how many machines were once running this script, the reduction in carbon emissions may have been significant ;-)<p />
By taking advantage of rpm's --queryformat option I've changed the rpmChangelogs script to exec rpm for 100 rpm arguments at a time. This is about 10 times faster for large runs. For example, when I generated a summary dating back to my upgrade from OpenSUSE 11.4 to 12.1, the run time reduced from about 50 seconds down to 5 seconds.
<p />
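The batching idea can be sketched as follows. This is a minimal Python 3 illustration that only builds the argument lists, so it runs without rpm installed. The helper names <code>batches</code> and <code>rpm_commands</code> and the package names are made up for this example; the real script also passes a <code>--queryformat</code> argument.

```python
def batches(items, size=100):
    """Yield consecutive slices of items, each at most size long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def rpm_commands(packages, size=100):
    """Build one rpm changelog query per batch instead of one per package."""
    return [['rpm', '-q', '--changelog'] + batch for batch in batches(packages, size)]


packages = ['pkg%d' % i for i in range(250)]  # placeholder package names
commands = rpm_commands(packages)
print(len(commands), [len(c) for c in commands])
```

For 250 packages this means three rpm start-ups instead of 250, which is where the saving comes from.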
I've added an option to include the description of the package. I've also added an option to accept the rpm names from the command line instead of just reporting on the most recently installed ones.
<p />
Here is the syntax summary for the new version:
<p />
<pre class="brush: plain; light:true; highlight:1">python rpmChangelogs.py -h
Usage: rpmChangelogs.py [options] [rpm...]

Report change log entries for recently installed (-i) rpm's or for the rpm's
specified on the command line.

Options:
  -h, --help            show this help message and exit
  -i INSTALLDAYS, --installed-since=INSTALLDAYS
                        Include anything installed up to INSTALLDAYS days ago.
  -c CHANGEDAYS, --changed-since=CHANGEDAYS
                        Report change log entries from up to CHANGEDAYS days
                        ago.
  -d, --description     Include each rpm's description in the output.
</pre>
<br />
Except for the optional addition of the description, the output is the same as the <a href="http://codeincluded.blogspot.com/2011/12/opensuse-changelogs-for-recent-updates.html">previous</a> OpenSUSE only script.
<br />
My python is a little rusty - I just spent months doing Java - so I've also gone back over it and tried to tidy up the code.
<br />
<h3>
The code</h3>
<br />
(Once you've expanded the code, hover over the code area to bring up options that make it easier to copy or print - requires javascript to be enabled.)
<pre class="brush: python; collapse:false; wrap-lines:false; ">#!/usr/bin/env python
#
# rpmChangelogs.py
#
# Copyright (C) 2011: Michael Hamilton
# The code is GPL 3.0(GNU General Public License) ( http://www.gnu.org/copyleft/gpl.html )
#
# Updated 2013/03/18: now uses seconds from 1970 to avoid localisation issues with the dates output by rpm.
#
import subprocess
from datetime import datetime, timedelta
from optparse import OptionParser
maxArgsPerCommand=100
optParser = OptionParser(
usage='Usage: %prog [options] [rpm...] ',
description="Report change log entries for recently installed (-i) rpm's or for the rpm's specified on the command line.")
optParser.add_option('-i', '--installed-since', dest='INSTALLDAYS', type='int', default=1, help='Include anything installed up to INSTALLDAYS days ago.')
optParser.add_option('-c', '--changed-since', dest='CHANGEDAYS', type='int', default=60, help='Report change log entries from up to CHANGEDAYS days ago.')
optParser.add_option('-d', '--description', dest='DESC', action='store_true', default=False, help="Include each rpm's description in the output.")
(options, args) = optParser.parse_args()
installedSince = datetime.now() - timedelta(days=options.INSTALLDAYS)
changedSince = datetime.now() - timedelta(days=options.CHANGEDAYS)
showDesc = options.DESC
if len(args) > 0:
    recentPackages = args
else:
    queryProcess = subprocess.Popen(['rpm', '-q', '-a', '--last'], shell=False, stdin=None, stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True)
    recentPackages = []
    for queryLine in queryProcess.stdout:
        (name, dateStr) = queryLine.split(' ', 1)
        installDatetime = datetime.strptime(dateStr.strip(), '%a %d %b %Y %H:%M:%S %Z')
        if installDatetime < installedSince:
            break
        recentPackages.append(name)
    queryProcess.stdout.close()
    queryProcess.wait()
    if queryProcess.returncode != 0:
        print '*** ERROR (return code was ', queryProcess.returncode, ')'
        for line in queryProcess.stderr:
            print line,

# Use one rpm exec to query multiple packages - 10x faster than an exec for each one
marker = '+Package: '
markerLen = len(marker)
for subset in [recentPackages[i:i+maxArgsPerCommand] for i in range(0, len(recentPackages), maxArgsPerCommand)]:
    format = marker + '%{INSTALLTIME} %{NAME}-%{VERSION}-%{RELEASE}\n' + ('%{DESCRIPTION}\n\n+Changelog:\n' if showDesc else '')
    rpmProcess = subprocess.Popen(['rpm', '-q', '--queryformat=' + format, '--changelog'] + subset, shell=False, stdin=None, stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True)
    tooOld = False
    for line in rpmProcess.stdout:
        if line.startswith(marker):
            installedDate = datetime.fromtimestamp(float(line[markerLen:line.rfind(' ')]))
            name = line.rsplit(' ', 1)[1]
            print '=================================================='
            print marker, installedDate, name,
            print '------------------------------'
            tooOld = False
        else:
            if line.startswith('* ') and len(line) > 17:
                try:
                    changeDate = datetime.strptime(line[:line.rfind(' ')], '* %a %b %d %Y')
                    tooOld = changeDate < changedSince
                except ValueError:
                    pass # not a date - move on
            if not tooOld:
                print line,
    rpmProcess.stdout.close()
    rpmProcess.wait()
    if rpmProcess.returncode != 0:
        print '*** ERROR (return code was ', rpmProcess.returncode, ')'
        for line in rpmProcess.stderr:
            print line,
    rpmProcess.stderr.close()
</pre>digitaltrailshttp://www.blogger.com/profile/16633409983665910035noreply@blogger.com3tag:blogger.com,1999:blog-7008265127831057484.post-37195312471310807302011-12-15T11:50:00.000-08:002011-12-21T00:58:08.291-08:00RPM Changelogs for Recent Updates<blockquote>
<span style="color: #00f;">
Note, in a more recent <a href="http://codeincluded.blogspot.com/2011/12/rpm-changelogs-for-recent-updates-10x.html">post</a> I've sped up this code ten times.
</span>
</blockquote>
In my previous post I showed you a script that could report recent changelogs for OpenSUSE packages. Overnight I realised I could generalise this to all RPM based distros.
Here is a new generalised version:
<br />
<pre class="brush: plain; light:true; highlight:1">% python rpmChangeLogs.py -h
Usage: rpmChangeLogs.py [options]

Report change log entries for recent rpm installs.

Options:
  -h, --help            show this help message and exit
  -i INSTALLDAYS, --installed-since=INSTALLDAYS
                        Include anything installed up to INSTALLDAYS days ago.
  -c CHANGEDAYS, --changedSince=CHANGEDAYS
                        Report change log entries from up to CHANGEDAYS days
                        ago.
</pre>
<br />
The output is the same as the <a href="http://codeincluded.blogspot.com/2011/12/opensuse-changelogs-for-recent-updates.html">previous</a> OpenSUSE only script.
<p>
I've also cleaned up the code around python sub-processes.
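As an aside, here is a minimal sketch of the same sub-process pattern (collect stdout, wait, check the return code, report stderr) in Python 3, where <code>subprocess.run</code> does the waiting and pipe clean-up in one call. It is not the code used by the script below. The helper name <code>run_and_collect</code> is made up, and the example runs the Python interpreter itself as a stand-in for rpm so that it works on any machine.

```python
import subprocess
import sys


def run_and_collect(command):
    """Run command and return (return code, stdout lines, stderr lines)."""
    result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    return result.returncode, result.stdout.splitlines(), result.stderr.splitlines()


# Stand-in commands: one that succeeds and one that exits with an error code.
ok_code, ok_out, ok_err = run_and_collect([sys.executable, '-c', 'print("glibc")'])
bad_code, bad_out, bad_err = run_and_collect([sys.executable, '-c', 'import sys; sys.exit(3)'])

for code, err in ((ok_code, ok_err), (bad_code, bad_err)):
    if code != 0:
        print('*** ERROR (return code was', code, ')')
        for line in err:
            print(line)
```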
<h3>The code</h3>
<br />
<pre class="brush: python; collapse:true; wrap-lines:false; ">#!/usr/bin/env python
#
# rpmChangelogs.py
#
# Copyright (C) 2011: Michael Hamilton
# The code is GPL 3.0(GNU General Public License) ( http://www.gnu.org/copyleft/gpl.html )
#
import subprocess
from datetime import date, datetime, timedelta
from optparse import OptionParser
optParser = OptionParser(description='Report change log entries for recent rpm installs.')
optParser.add_option('-i', '--installed-since', dest='INSTALLDAYS', type='int', default=1, help='Include anything installed up to INSTALLDAYS days ago.')
optParser.add_option('-c', '--changedSince', dest='CHANGEDAYS', type='int', default=60, help='Report change log entries from up to CHANGEDAYS days ago.')
(options, args) = optParser.parse_args()
installedSince = datetime.now() - timedelta(days=options.INSTALLDAYS)
changedSince = datetime.now() - timedelta(days=options.CHANGEDAYS)
queryProcess = subprocess.Popen(['rpm', '-q', '-a', '--last'], shell=False, stdin=None, stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True)
for queryLine in queryProcess.stdout:
    historyRec = str.split(queryLine, ' ', 1)
    installDatetime = datetime.strptime(str.strip(historyRec[1])[4:24], '%d %b %Y %H:%M:%S')
    if installDatetime < installedSince:
        break
    packageName = historyRec[0]
    print '=================================================='
    print '+Package: ', installDatetime, packageName
    print '------------------------------'
    rpmProcess = subprocess.Popen(['rpm', '-q', '--changelog', packageName], shell=False, stdin=None, stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True)
    for line in rpmProcess.stdout:
        try:
            if line[0] == '*' and line[1] == ' ' and len(line) > 17:
                changeDate = datetime.strptime(line[6:17], '%b %d %Y')
                if changeDate < changedSince:
                    break
        except ValueError:
            pass # not a date - move on
        print line,
    rpmProcess.stdout.close()
    rpmProcess.wait()
    if rpmProcess.returncode != 0:
        print '*** ERROR (return code was ', rpmProcess.returncode, ')'
        for line in rpmProcess.stderr:
            print line,
    rpmProcess.stderr.close()
queryProcess.stdout.close()
queryProcess.wait()
if queryProcess.returncode != 0:
    print '*** ERROR (return code was ', queryProcess.returncode, ')'
    for line in queryProcess.stderr:
        print line,
</pre>digitaltrailshttp://www.blogger.com/profile/16633409983665910035noreply@blogger.com0tag:blogger.com,1999:blog-7008265127831057484.post-50737728440648358772011-12-14T16:55:00.000-08:002011-12-15T11:43:26.078-08:00OpenSUSE Changelogs for Recent UpdatesHere is a short python script that shows recent portions of the changelogs for recently installed packages. This script is intended for extracting a summary of what has changed after updating my OS to the latest packages. Usage is as follows:
<br />
<pre class="brush: plain; light:true; highlight:1">% python zyppHist.py -h
Usage: zyppHist.py [options]

Report change log entries for recent installs (zypper/rpm).

Options:
  -h, --help            show this help message and exit
  -i INSTALLDAYS, --installed-since=INSTALLDAYS
                        Include anything installed up to INSTALLDAYS days ago.
  -c CHANGEDAYS, --changedSince=CHANGEDAYS
                        Report change log entries from up to CHANGEDAYS days
                        ago.
</pre>
<br />
Sample output:
<br />
<pre class="brush: plain; light:true; highlight:1">python zyppHist.py -i 1 -c 30
==================================================
+Package: 2011-12-14 21:11:12 glibc
------------------------------
* Wed Nov 30 2011 aj@suse.de
- Do not install INSTALL file.
* Wed Nov 30 2011 rcoe@wi.rr.com
- fix printf with many args and printf arg specifiers (bnc#733140)
* Fri Nov 25 2011 aj@suse.de
- Updated glibc-ports-2.14.1.tar.bz2 from ftp.gnu.org.
* Fri Nov 25 2011 aj@suse.com
- Create glibc-devel-static baselibs (bnc#732349).
* Fri Nov 18 2011 aj@suse.de
- Remove duplicated locales from glibc-2.3.locales.diff.bz2
==================================================
+Package: 2011-12-14 21:11:21 splashy
------------------------------
* Thu Dec 08 2011 hmacht@suse.de
- update artwork for openSUSE 12.1 (bnc#730050)
==================================================
+Package: 2011-12-14 21:11:23 libqt4
------------------------------
* Wed Nov 23 2011 llunak@suse.com
- do not assert on QPixmap usage in non-GUI threads
if XInitThreads() has been called (bnc#731455)
==================================================
+Package: 2011-12-14 21:11:23 libcolord1
------------------------------
* Wed Dec 07 2011 vuntz@opensuse.org
- Update to version 0.1.15:
+ This release fixes an important security bug: CVE-2011-4349.
+ New Features:
- Add a native driver for the Hughski ColorHug hardware
- Export cd-math as three projects are now using it
+ Bugfixes:
- Documentation fixes and improvements
- Do not crash the daemon if adding the device to the db failed
- Do not match any sensor device with a kernel driver
- Don't be obscure when the user passes a device-id to colormgr
- Fix a memory leak when getting properties from a device
- Fix colormgr device-get-default-profile
...
</pre>
<br />
The script produces a summary by extracting a list of recent installs from /var/log/zypp/history. For each recent install the script issues an rpm changelog query to obtain each package's changelog. Each changelog is written out line by line and the output is truncated when a date is encountered that is more than the specified number of days in the past.
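<br />
The truncation step can be sketched on its own. This is a minimal illustration rather than the script itself: the function name <code>recent_changelog</code> and the last, older sample entry are made up for the example, and it is written so that it also runs under Python 3.<br />

```python
from datetime import datetime, timedelta

def recent_changelog(lines, changed_since):
    """Return changelog lines up to the first entry dated before changed_since."""
    kept = []
    for line in lines:
        # rpm changelog entry headers look like "* Wed Nov 30 2011 aj@suse.de"
        if line.startswith('* ') and len(line) > 17:
            try:
                if datetime.strptime(line[6:17], '%b %d %Y') < changed_since:
                    break  # everything after this point is older still
            except ValueError:
                pass  # a "* " line that is not a date header - keep it
        kept.append(line)
    return kept

sample = [
    '* Wed Nov 30 2011 aj@suse.de',
    '- Do not install INSTALL file.',
    '* Fri Nov 18 2011 aj@suse.de',
    '- Remove duplicated locales from glibc-2.3.locales.diff.bz2',
    '* Mon Oct 03 2011 someone@example.com',      # hypothetical older entry
    '- An older change that should be cut off',   # hypothetical
]
cutoff = datetime(2011, 12, 14) - timedelta(days=30)
for line in recent_changelog(sample, cutoff):
    print(line)
```

The sketch relies on rpm listing changelog entries newest first, so it can stop at the first entry that is too old.<br />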
<h3>The code</h3>
<br />
<pre class="brush: python; collapse:true; wrap-lines:false; ">#!/usr/bin/env python
#
# zypphist.py
#
# Copyright (C) 2011: Michael Hamilton
# The code is GPL 3.0(GNU General Public License) ( http://www.gnu.org/copyleft/gpl.html )
#
import csv
import subprocess
from datetime import date, datetime, timedelta
from optparse import OptionParser

zyppHistFilename = '/var/log/zypp/history'

optParser = OptionParser(description='Report change log entries for recent installs (zypper/rpm).')
optParser.add_option('-i', '--installed-since', dest='INSTALLDAYS', type='int', default=1, help='Include anything installed up to INSTALLDAYS days ago.')
optParser.add_option('-c', '--changedSince', dest='CHANGEDAYS', type='int', default=60, help='Report change log entries from up to CHANGEDAYS days ago.')
(options, args) = optParser.parse_args()

installedSince = datetime.now() - timedelta(days=options.INSTALLDAYS)
changedSince = datetime.now() - timedelta(days=options.CHANGEDAYS)

zyppHistReader = csv.reader(open(zyppHistFilename, 'rb'), delimiter='|')
for historyRec in zyppHistReader:
    # Skip blank lines, comments, and anything that is not an install record
    if len(historyRec) > 2 and not historyRec[0].startswith('#') and historyRec[1] == 'install':
        installDate = datetime.strptime(historyRec[0], '%Y-%m-%d %H:%M:%S')
        if installDate >= installedSince:
            packageName = historyRec[2]
            print '=================================================='
            print '+Package: ', installDate, packageName
            print '------------------------------'
            rpmProcess = subprocess.Popen(['rpm', '-q', '--changelog', packageName], shell=False, stdin=None, stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=True)
            # communicate() drains both pipes, so a long changelog cannot block rpm
            rpmOut, rpmErr = rpmProcess.communicate()
            if rpmProcess.returncode != 0:
                print '*** ERROR (return code was ', rpmProcess.returncode, ')'
                print rpmErr,
            for line in rpmOut.splitlines(True):
                try:
                    # Entry headers look like "* Wed Nov 30 2011 aj@suse.de"
                    if line[0] == '*' and line[1] == ' ' and len(line) > 17:
                        changeDate = datetime.strptime(line[6:17], '%b %d %Y')
                        if changeDate < changedSince:
                            break
                except ValueError:
                    pass # not a date - move on
                print line,
</pre>
<h2>Collectfs - a trash collecting userspace file system for Linux</h2>
Posted by <a href="http://www.blogger.com/profile/16633409983665910035">digitaltrails</a>, 2011-07-25 (updated 2012-02-21).<br />
<br />
<b>Collectfs</b> is a FUSE userspace filesystem that provides add-on trash collection for a directory hierarchy. The purpose of collectfs is to protect a project hierarchy by providing a fairly universal no-clobber mechanism:<br />
<ul><li>The history of changes is preserved.<br />
</li>
<li>Missteps with rm, mv, cat, etc. are not permanent.<br />
</li>
<li>It works seamlessly with standard development tools.<br />
</li>
</ul>It is not intended as a replacement for revision control or backups. The intention is to protect you during the between-times, when you're not covered by these other tools.<br />
<br />
Any file that would be removed or overwritten by remove (unlink), move, link, symlink, or open-truncate is instead relocated to a trash directory (mount-point/.trash/). Collected files are date-time stamped so that the edit history is maintained (a version number is appended if the same file is collected more than once in the same second).<br />
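<br />
The naming scheme can be sketched in a few lines of Python. Collectfs itself is not written this way, so treat this purely as an illustration: the function name <code>trash_name</code> is made up, and the exact form of the version suffix (<code>.2</code>, <code>.3</code>, ...) is an assumption of mine, not something taken from the collectfs source.<br />

```python
import os
import tempfile
from datetime import datetime

def trash_name(trash_dir, filename, when):
    """Build a unique date-time stamped name for a file being collected."""
    stamped = '%s.%s' % (filename, when.strftime('%Y-%m-%d.%H:%M:%S'))
    candidate = stamped
    version = 1
    # Same file collected again within the same second: append a version number.
    while os.path.exists(os.path.join(trash_dir, candidate)):
        version += 1
        candidate = '%s.%d' % (stamped, version)  # suffix format is an assumption
    return candidate

trash = tempfile.mkdtemp()  # stands in for mount-point/.trash/
when = datetime(2011, 7, 24, 14, 48, 20)
first = trash_name(trash, 'main.c', when)
open(os.path.join(trash, first), 'w').close()  # pretend the file was collected
second = trash_name(trash, 'main.c', when)
print(first)
print(second)
```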
<br />
Usage is quite straightforward, for example:<br />
<pre class="brush:plain;highlight:1;light:true"># Use collectfs to mount the real folder onto a mount point (any other folder)
% collectfs myProject myWorkspace
# Now the mount point mirrors the original but with trash collection
% cd myWorkspace
% vi main.c
% indent main.c
% ls .trash
main.c.2011-07-24.14:48:20
main.c.2011-07-24.14:59:39
% diff .trash/main.c.2011-07-24.14:59:39 main.c
...
% mv .trash/main.c.2011-07-24.14:59:39 main.c
% ls .trash
main.c.2011-07-24.14:48:20
main.c.2011-07-24.15:00:37
# To unmount (stop using) the virtual filesystem...
% cd ..
% fusermount -u myWorkspace
</pre>It's easy to build it from source. The source is available at: <a href="http://code.google.com/p/collectfs/">http://code.google.com/p/collectfs/</a><br />
Thanks to Thomas Spahni (vodoo on http://forums.opensuse.org/), the openSUSE Build Service now has a <a href="https://build.opensuse.org/package/show?package=collectfs&project=home%3Avodoo">collectfs package</a>. (I also used Thomas's much more concise description of collectfs to rewrite the first paragraph of this blog post.)<br />
<br />
I'm currently using collectfs to help write collectfs. It augments my development environment with a local-history feature similar to that provided by eclipse. But unlike eclipse, KDE or gnome, the protection is implemented in the filesystem layer and applies to any tool I care to employ. <br />
<br />
Thoughts on a deeper level... I don't think eclipse, KDE, or gnome should be in the business of collecting trash and providing undelete. The proper place to do this is in the filesystem. But this would be a very radical departure for a UNIX filesystem, probably too radical to feature in mainstream UNIX filesystem development. Such features do not fit well with established practice, and they raise concerns about security and privacy (data coming back to haunt you). Looking beyond Linux/KDE/gnome, I do think concepts such as trash, undelete, and file-versioning would be worth considering for a true desktop OS. I believe GNU Hurd was originally to have file-versioning similar to VMS - I considered using fuse to implement VMS-style file-versioning: <br />
<code><br />
xyz.txt;1 xyz.txt;2 xyz.txt;3 ... xyz.txt;N <br />
open(xyz.txt) == open(xyz.txt;N) <br />
</code><br />
but I never liked the clutter in my VMS directories, so I went for timestamped trash instead.
<h2>Linux as seen by matplotlib - the view from 2000 metres</h2>
Posted by <a href="http://www.blogger.com/profile/16633409983665910035">digitaltrails</a>, 2011-06-17 (updated 2011-06-18).<br />
<br />
In my <a href="http://codeincluded.blogspot.com/2011/06/linux-process-info-to-csv-via-python.html">previous post</a> I shared some python code that extracts Linux process-file-system data into python objects. In this post I will augment the code from the previous post with some new code that generates rolling plots of process file system data. <br />
<br />
I've been playing around with generating detailed overviews of my running Linux system. Existing tools, such as top or htop, do a pretty good job of displaying the processes consuming the highest resources, but I'm looking for an <i>at-a-glance</i> overview of all processes and their impact on the system. I've used this goal as a reason to learn a little more about the python plotting tool <a href="http://matplotlib.sourceforge.net/"><i>matplotlib</i></a>.<br />
<div><br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-FIfViaG4ZvI/Tfvkdhx7KUI/AAAAAAAAArY/ALMQOtP-Ovk/s1600/bubble1.jpeg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="272" src="http://2.bp.blogspot.com/-FIfViaG4ZvI/Tfvkdhx7KUI/AAAAAAAAArY/ALMQOtP-Ovk/s320/bubble1.jpeg" width="320" /></a></div>The plot to the right is generated by a matplotlib python script producing something similar to the LibreOffice plot in my <a href="http://codeincluded.blogspot.com/2011/06/linux-process-info-to-csv-via-python.html">previous blog entry</a>. The process-file-system extraction code from the previous blog posting is used to extract CPU, vsize, and RSS data. Unlike the previous example, the script refreshes the plot every few seconds. This new plot uses a hash to pick a colour for each user and also annotates each point with the PID, command, and username.<br />
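<br />
The colour-per-user idea can be sketched as follows. Note the swap: the script below uses Python's built-in <code>hash()</code>, whereas this sketch uses an MD5 digest so that the colours stay the same from run to run even on Python versions that randomise string hashes. The function name <code>user_color</code> is made up for the example, and the 1.3 divisor is borrowed from the script, where it presumably keeps the colours from getting too pale.<br />

```python
import hashlib

def user_color(username):
    """Map a username to a repeatable (r, g, b) tuple, each component in 0..1."""
    digest = hashlib.md5(username.encode('utf-8')).digest()
    # The first three digest bytes become red, green and blue.
    # Dividing by 255 * 1.3 caps each component at roughly 0.77.
    return tuple(b / (255 * 1.3) for b in bytearray(digest[:3]))

for name in ('root', 'michael', 'nobody'):
    print(name, user_color(name))
```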
<br />
Plotting every process results in a very cluttered graph. To reduce the clutter I have restricted the plot data to the "interesting" processes - those that have changing values. Interest decays with inactivity, so processes that become static eventually disappear from the graph. <br />
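<br />
A stripped-down sketch of the decay idea, tracking a single value rather than the CPU/vsize/RSS triple used in the script. The class name <code>Interest</code> is made up for the example, and unlike the script it counts the very first sample as a change.<br />

```python
INTERESTING_DEFAULT = 3  # cycles a process stays on the plot after its last change

class Interest(object):
    """Track one value; report whether it has changed recently."""

    def __init__(self):
        self.value = None
        self.interesting = 0

    def update(self, value):
        if value == self.value:
            if self.interesting > 0:
                self.interesting -= 1  # unchanged: interest decays
        else:
            self.interesting = INTERESTING_DEFAULT  # changed: fully interesting again
        self.value = value
        return self.interesting > 0  # True means "plot this process"

tracker = Interest()
visible = [tracker.update(v) for v in (10, 12, 12, 12, 12, 15)]
print(visible)
```

In this run the value sits at 12 for four cycles, drops off the plot on the fourth, and reappears as soon as it changes to 15.<br />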
<br />
I wasn't happy with the bubble plot because clutter made it hard to get an overview of all process activity, which was the whole point of the exercise. Time for a different tack.<br />
<br />
</div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-8sdXhDm69qU/TfwMph9vKfI/AAAAAAAAArg/MiBrGwXFM5k/s1600/grid1a.jpeg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="271" src="http://4.bp.blogspot.com/-8sdXhDm69qU/TfwMph9vKfI/AAAAAAAAArg/MiBrGwXFM5k/s320/grid1a.jpeg" width="320" /></a></div>The plot in the screen-capture on the right is an attempt at an at-a-glance view of every process and thread. This new script periodically extracts stat, status, and IO data from the Linux process file-system. The script calculates changes in the data between refresh cycles and plots indications of the changes. There is a point for each process and thread. Points are grouped by user and ordered by process start time. The size of each point indicates relative resident set size. The colour of each point represents the process's state during the cycle:<br />
<br />
<ol><li>Light grey: no activity</li>
<li>Red: consumed CPU</li>
<li>Green: performed IO</li>
<li>Orange: consumed CPU and performed IO</li>
<li>Yellow: exited.</li>
</ol><br />
Changes in RSS are exaggerated so that even small changes in RSS are visible as a little "throb" in the point size. The plot updates once every three seconds or whenever the space bar is pressed. Because points are grouped by username and start time, their placement remains fairly static and predictable from cycle to cycle.<br />
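<br />
The colour and "throb" rules can be sketched as two small functions. This is a simplified illustration, not the script's code: the function names are made up, the colour names are plain matplotlib colour strings chosen to match the list above, and <code>max_rss</code> is assumed to be greater than zero.<br />

```python
def point_color(delta_cpu, delta_io, exited=False):
    """Pick a colour from the change in CPU ticks and IO bytes over one cycle."""
    if exited:
        return 'yellow'
    if delta_cpu > 0 and delta_io > 0:
        return 'orange'
    if delta_cpu > 0:
        return 'red'
    if delta_io > 0:
        return 'green'
    return 'lightgrey'

def point_size(rss, previous_rss, max_rss, base=300, minimum=20):
    """Size relative to the largest RSS, nudged up or down when RSS changes."""
    size = max(minimum, base * rss / float(max_rss))
    throb = max(20, size / 4.0)  # exaggerate even small changes
    if rss > previous_rss:
        size += throb
    elif rss < previous_rss:
        size -= throb
    return size

print(point_color(5, 0), point_size(110, 100, 110))
print(point_color(0, 0), point_size(100, 100, 100))
```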
<div><br />
<div class="separator" style="clear: both; text-align: center;"></div>In matplotlib it is relatively easy to trap hover and click events. I've included code to trap hover and display popup details for the process under the mouse. In the screen-capture above, I've hovered over the Xorg X-Windows server. I also trap right-click and pin the popup so that it keeps updating even after the mouse is moved away.<br />
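<br />
The lookup behind the hover popup can be sketched without any plotting at all. In the script the coordinates come from the <code>xdata</code> and <code>ydata</code> attributes of the matplotlib event; here they are passed in directly. The function names are made up for the example, and the grid layout is a simplified version that ignores the per-user grouping.<br />

```python
def build_index(n_points, max_cols):
    """Lay points out row by row and return {(col, -row): point_number}."""
    index = {}
    for n in range(n_points):
        index[(n % max_cols, -(n // max_cols))] = n  # rows grow downwards
    return index

def point_under_mouse(index, xdata, ydata):
    """Snap mouse data coordinates to a grid cell using the script's arithmetic."""
    cell = (int(xdata + 0.5), int(ydata - 0.5))
    return index.get(cell)  # None when the mouse is not over a point

index = build_index(7, 3)
print(point_under_mouse(index, 1.2, -0.9))
print(point_under_mouse(index, 5.0, -3.0))
```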
<br />
The number of memory hogs visible on the plot is a bit exaggerated because the plot includes threads as well as processes (threads share memory). This plot is somewhat similar to the <a href="http://www.scipy.org/Cookbook/Matplotlib/HintonDiagrams">Hinton Diagram</a> seen in the matplotlib example pages, which may also be an interesting basis for viewing a collection of processes or servers. (My own effort pre-dates my discovering the Hinton example, so the implementation details differ substantially.)<br />
<div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-XikOEYtKss8/Tfw62A-4cBI/AAAAAAAAArs/d7m4T9QU244/s1600/timeline4a.jpeg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="271" src="http://4.bp.blogspot.com/-XikOEYtKss8/Tfw62A-4cBI/AAAAAAAAArs/d7m4T9QU244/s320/timeline4a.jpeg" width="320" /></a></div><br />
<br />
Finally, I thought I'd include a 3D plot. I'm not sure it's that useful, but adding an extra dimension is food for thought. The script plots cumulative CPU and RSS over the duration of the script's run time. It uses a heuristic to select "interesting" processes. In this particular screen capture I shut down my eclipse+android IDE - first set of terminated lines - and then restarted the same IDE (steep climbing lines). The CPU and memory consumed by the plotting script can be seen as the diagonal line across the bottom time-CPU axis.<br />
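<br />
Cumulative CPU is just the growth in utime plus stime since the first sample, converted from clock ticks to seconds. A minimal sketch, assuming samples arrive as (utime, stime) pairs in ticks; the function names are made up for the example, and the 100 ticks per second used in the demonstration is a common value rather than a guaranteed one (<code>os.sysconf('SC_CLK_TCK')</code> reports the real figure).<br />

```python
import os

def cpu_seconds(utime_ticks, stime_ticks, clk_tck=None):
    """Convert user + system clock ticks to seconds."""
    if clk_tck is None:
        clk_tck = os.sysconf('SC_CLK_TCK')  # ticks per second on this system
    return (utime_ticks + stime_ticks) / float(clk_tck)

def cumulative_cpu(samples, clk_tck=100):
    """CPU seconds consumed since the first (utime, stime) sample."""
    base_utime, base_stime = samples[0]
    return [cpu_seconds(u - base_utime, s - base_stime, clk_tck)
            for u, s in samples]

print(cumulative_cpu([(100, 50), (150, 50), (250, 100)]))
```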
<br />
<br />
Matplotlib is a really useful plotting library; whenever I thought I'd exhausted what was possible, a Google search would usually prove me wrong. </div><br />
<h3>The code</h3><br />
<h4>plot_bubble_cpu_vsize_rss.py</h4><br />
<pre class="brush:python; collapse:true; wrap-lines:false;">#!/usr/bin/env python
#
# Bubble Plot: cpu (x), vsize(y), rss(z)
#
# Copyright (C) 2011: Michael Hamilton
# The code is GPL 3.0(GNU General Public License) ( http://www.gnu.org/copyleft/gpl.html )
#
import matplotlib
matplotlib.use('GTkAgg')
import pylab
import gobject
import sys
import os
import math
from optparse import OptionParser
import linuxprocfs as procfs

SC_CLK_TCK=float(os.sysconf(os.sysconf_names['SC_CLK_TCK']))
ZOMBIE_LIVES=2 # Not a true UNIX zombie, just a plot zombie.
INTERESTING_DEFAULT=3 # Don't display a process if it is boring this many times in a row.
INTERVAL = 2

_user_colors = {}

def _user_color(username):
    if username in _user_colors:
        color = _user_colors[username]
    else:
        # For a username, generate pseudo-random (r,g,b) values in the 0..1.0 range
        # E.g. for root base this on hash(root), hash(oot), and hash(ot) - randomise
        # each hash further - seems to work better than the straight hash
        color = tuple([math.fabs(hash(username[i:]) * 16807 % 2147483647)/(2147483647 * 1.3) for i in (0, 2, 1)])
        #color = tuple([abs(hash(username[i:]))/(sys.maxint*1.3) for i in (2, 1, 0)])
        _user_colors[username] = color
        print username, color
    return color

class TrackingData(object):

    def __init__(self, info):
        self.previous = info
        self.current = info
        self.color = _user_color(info.username)
        self.text = '%d %s %s' % (info.pid, info.username, info.stat.comm )
        self.interesting = 0
        self.x = 0
        self.y = info.stat.vsize
        self.z = info.stat.rss / 10
        self.zombie_lives = ZOMBIE_LIVES

    def update(self, info):
        self.current = info
        oldx = self.x
        oldy = self.y
        oldz = self.z
        self.x = ((info.stat.stime + info.stat.utime) - (self.previous.stat.stime + self.previous.stat.utime)) /SC_CLK_TCK
        self.y = info.stat.vsize
        self.z = info.stat.rss / 10
        if self.x == oldx and self.y == oldy and self.z == oldz:
            if self.interesting > 0:
                self.interesting -= 1 # if interesting drops to zero, stop plotting this process
        else:
            self.interesting = INTERESTING_DEFAULT
        self.previous = self.current
        self.zombie_lives = ZOMBIE_LIVES

    def is_alive(self):
        self.zombie_lives -= 1
        return self.zombie_lives > 0

class PlotBubbles(object):

    def __init__(self, sleep=INTERVAL, include_threads=False):
        self.subplot = None
        self.datums = {}
        self.sleep = sleep * 1000
        self.include_threads = include_threads

    def make_graph(self):
        all_procs = procfs.get_all_proc_data(self.include_threads)
        y_list = []
        x_list = []
        s_list = []
        color_list = []
        anotations = []
        for proc_info in all_procs:
            if not proc_info.pid in self.datums:
                data = TrackingData(proc_info)
                self.datums[proc_info.pid] = data
            else:
                data = self.datums[proc_info.pid]
                data.update(proc_info)
            if data.interesting > 0: # Only plot active processes
                x_list.append(data.x)
                y_list.append(data.y)
                s_list.append(data.z)
                color_list.append(data.color)
                anotations.append((data.x, data.y, data.color, data.text))
            if not data.is_alive():
                del self.datums[proc_info.pid]
        if self.subplot == None:
            figure = pylab.figure()
            self.subplot = figure.add_subplot(111)
        else:
            self.subplot.cla()
        if len(x_list) == 0: # Nothing to plot - probably initial cycle
            return True
        self.subplot.scatter(x_list, y_list, s=s_list, c=color_list, marker='o', alpha=0.5)
        pylab.xlabel(r'Change in CPU (stime+utime)', fontsize=20)
        pylab.ylabel(r'Total vsize', fontsize=20)
        pylab.title(r'Process CPU, vsize, and RSS')
        pylab.grid(True)
        gca = pylab.gca()
        for x, y, color, text in anotations:
            gca.text(x, y, text, alpha=1, ha='left',va='bottom',fontsize=8, rotation=33, color=color)
        pylab.draw()
        return True

    def start(self):
        self.scatter = None
        self.make_graph()
        pylab.connect('key_press_event', self)
        gobject.timeout_add(self.sleep, self.make_graph)
        pylab.show()

    def __call__(self, event):
        self.make_graph()

if __name__ == '__main__':
    usage = """usage: %prog [options|-h]
Plot change in CPU(x), total vsize(y) and total RSS.
"""
    parser = OptionParser(usage)
    parser.add_option("-t", "--threads", action="store_true", dest="include_threads", help="Include threads as well as processes.")
    parser.add_option("-s", "--sleep", type="int", dest="interval", default=INTERVAL, help="Sleep seconds for each repetition.")
    (options, args) = parser.parse_args()
    PlotBubbles(options.interval, options.include_threads).start()
</pre><br />
<pre class="brush:plain;highlight:1;light:true">% python plot_bubble_cpu_vsize_rss.py -h
Usage: plot_bubble_cpu_vsize_rss.py [options|-h]
Plot change in CPU(x), total vsize(y) and total RSS.
Options:
-h, --help show this help message and exit
-t, --threads Include threads as well as processes.
-s INTERVAL, --sleep=INTERVAL
Sleep seconds for each repetition.
</pre><br />
<h4>plot_grid.py</h4><br />
<pre class="brush: python; collapse:true; wrap-lines:false;">#!/usr/bin/env python
#
# Copyright (C) 2011: Michael Hamilton
# The code is GPL 3.0(GNU General Public License) ( http://www.gnu.org/copyleft/gpl.html )
#
import matplotlib
matplotlib.use('GTkAgg')
import pylab
import time
import gobject
import math
import os
import linuxprocfs as procfs
from optparse import OptionParser

SC_CLK_TCK=float(os.sysconf(os.sysconf_names['SC_CLK_TCK']))
LIMIT=1000
INTERVAL=3
DEFAULT_COLS=30
BASE_POINT_SIZE=300
MIN_SIZE=20
ZOMBIE_LIVES=3 # Not a true UNIX zombie, just a plot zombie
MAX_RSS = 10
# Don't spell colour two ways - conform with pylab
DEFAULT_COLORS=['honeydew','red','lawngreen','orange','yellow','lightyellow','white']

class ProcessInfo(object):

    def __init__(self, new_procdata):
        self.data = new_procdata
        self.previous = None
        self.alive = True
        self.info = '%d %s %s' % (new_procdata.pid, new_procdata.username, new_procdata.stat.comm )
        self.username = new_procdata.username
        self.color = 'blue'
        self.zombie_lives = ZOMBIE_LIVES

    def update(self, new_procdata):
        self.previous = self.data
        self.data = new_procdata
        self.zombie_lives = ZOMBIE_LIVES

    def report(self):
        return '%s\nstate=%s\nutime=%f\nstime=%f\nrss=%d\nreads=%d\nwrites=%d' % \
            (self.info,
             self.data.status.state if not self.is_zombie() else 'exited',
             self.data.stat.utime/SC_CLK_TCK,
             self.data.stat.stime/SC_CLK_TCK,
             self.data.stat.rss,
             self.data.io.read_bytes,
             self.data.io.write_bytes)

    def is_alive(self):
        self.zombie_lives -= 1
        return self.zombie_lives > 0

    def is_zombie(self):
        return self.zombie_lives < ZOMBIE_LIVES - 1

    def sort_key(self):
        return (self.username, self.data.stat.starttime, self.data.pid)

class Activity_Diagram(object):

    def __init__(self, sleep=INTERVAL, max_cols=DEFAULT_COLS, point_size=BASE_POINT_SIZE, include_threads=True, colors=''):
        self.process_info = {}
        self.pos_index = {}
        self.subplot = self.label = self.hover_tip = self.hover_tip_data = None
        self.hover_tip_sticky = False
        self.include_threads = include_threads
        self.sleep = sleep
        self.max_cols = max_cols
        self.point_size = point_size
        # Extend colors to same length as default, merge together colors and default, choose non blank values
        self.normal_color, self.cpu_color, self.io_color, self.cpuio_color, self.exit_color, self.tip_color, self.bg_color = \
            [ (c if c != '' and c != None else d) for c,d in map(None, colors, DEFAULT_COLORS)]

    def start(self):
        self.start_time = time.time()
        self._create_new_plot()
        pylab.connect('motion_notify_event', self)
        pylab.connect('button_press_event', self)
        pylab.connect('key_press_event', self)
        gobject.timeout_add(self.sleep * 1000, self._create_new_plot)
        pylab.show()

    def _create_new_plot(self):
        if self.subplot == None:
            figure = pylab.figure()
            figure.set_frameon(True)
            self.subplot = figure.add_subplot(111, axis_bgcolor=self.bg_color)
        else:
            self.subplot.cla()
        plot = self.subplot
        x_vals, y_vals, c_vals, s_vals, namelabels = self._retrieve_data()
        plot.scatter(x_vals, y_vals, c=c_vals, s=s_vals) #, label=data.info)
        plot.axis('equal')
        plot.set_xticks([])
        plot.set_yticks(namelabels[0])
        plot.set_yticklabels(namelabels[1], stretch='condensed')
        self._create_tip(plot) # Refresh the tip the user is currently looking at
        pylab.title('Activity: size=RSS changes; red=CPU; green=IO; orange=CPU and IO.')
        pylab.draw()
        #print "ping"
        return True

    def _retrieve_data(self):
        # Update from procfs
        for proc in procfs.get_all_proc_data(include_threads=self.include_threads):
            if not proc.pid in self.process_info:
                data = ProcessInfo(proc)
                self.process_info[proc.pid] = data
            else:
                data = self.process_info[proc.pid]
                data.update(proc)
        max_rss = 0
        for info in self.process_info.values():
            if max_rss < info.data.stat.rss:
                max_rss = info.data.stat.rss
        # Compute values for plotting
        x_vals = []; y_vals = []; c_vals = []; s_vals = []
        usernames = [[],[]]
        col = 0; row = 0
        self.pos_index = {}
        previous = None
        for info in sorted(self.process_info.values(), key=lambda info: info.sort_key()):
            if not info.is_alive():
                del self.process_info[info.data.pid]
            else:
                if (not previous or previous.username != info.username):
                    if col != 0:
                        col = 0; row += 1
                    print info.username, -row
                    usernames[0].append(-row)
                    usernames[1].append(info.username)
                self.pos_index[(col,-row)] = info
                x_vals.append(col)
                y_vals.append(-row) # Invert ordering
                c_vals.append(self._decide_color(info))
                s_vals.append(self._decide_size(info, max_rss))
                col += 1
                if col == self.max_cols:
                    col = 0; row += 1
                previous = info
        return (x_vals, y_vals, c_vals, s_vals, usernames)

    def _decide_color(self,info):
        if info.is_zombie():
            return self.exit_color
        if info.previous == None:
            delta_cpu = delta_io = 0
        else:
            delta_cpu = (info.data.stat.utime + info.data.stat.stime) - (info.previous.stat.utime + info.previous.stat.stime)
            delta_io = (info.data.io.read_bytes + info.data.io.write_bytes) - (info.previous.io.read_bytes + info.previous.io.write_bytes)
        color = self.normal_color
        if delta_io > 0:
            color = self.io_color
        if delta_cpu > 0 or info.data.stat.state == 'R':
            if delta_io > 0:
                color = self.cpuio_color
            else:
                color = self.cpu_color
        return color

    def _decide_size(self, info, max_rss):
        rss = info.data.stat.rss
        delta_rss = rss - (info.previous.stat.rss if info.previous else rss)
        # A relative proportion of the base dot size
        size = max(MIN_SIZE, self.point_size * rss / max_rss)
        if delta_rss > 0: # Temporary throb to indicate change
            size += max(20,size/4)
        elif delta_rss < 0:
            size -= max(20,size/4)
        return size

    def _create_tip(self, axes, x=None, y=None, data=None, toggle_sticky=False):
        if data:
            if not self.hover_tip_sticky or toggle_sticky:
                if self.hover_tip: self.hover_tip.set_visible(False)
                self.hover_tip = axes.text(x, y, data.report(), bbox=dict(facecolor=self.tip_color, alpha=0.85), zorder=999)
                self.hover_tip_data = (x,y,data)
                if toggle_sticky: self.hover_tip_sticky = not self.hover_tip_sticky
        elif self.hover_tip_data:
            x,y,data = self.hover_tip_data
            self.hover_tip = axes.text(x, y, data.report(), bbox=dict(facecolor=self.tip_color, alpha=0.85), zorder=999)

    def _clear_tip(self):
        if self.hover_tip and not self.hover_tip_sticky:
            self.hover_tip.set_visible(False) # will be free'ed up by next plot draw
            self.hover_tip = self.hover_tip_data = None

    def __call__(self, event):
        #print event.name
        if event.name == 'key_press_event':
            self._create_new_plot()
        else:
            if (event.name == 'motion_notify_event' or event.name == 'button_press_event') and event.inaxes:
                point = (int(event.xdata + 0.5), int(event.ydata - 0.5))
                if point in self.pos_index: # On button click let tip stay open without hover
                    self._create_tip(event.inaxes, event.xdata, event.ydata, self.pos_index[point], event.name == 'button_press_event')
                else:
                    self._clear_tip()
                pylab.draw()

if __name__ == '__main__':
    usage = """usage: %prog [options|-h]
Plot RSS, CPU and IO, with hover and click for details.
RSS size is plotted as a circle - the circle will temporarily
jump up and down in size to indicate a growing or shrinking
RSS - the steady state size is a relative size indicator.
"""
    parser = OptionParser(usage)
    parser.add_option("-p", "--no-threads", action="store_true", dest="no_threads", help="Exclude threads, only show processes.")
    parser.add_option("-s", "--sleep", type="int", dest="interval", default=INTERVAL, help="Sleep seconds for each repetition.")
    parser.add_option("-n", "--columns", type="int", dest="columns", default=DEFAULT_COLS, help="Number of columns in each row (maximum).")
    parser.add_option("-d", "--point-size", type="int", dest="point_size", default=BASE_POINT_SIZE, help="Dot point size (expressed as square area).")
    parser.add_option("-c", "--colors", type="string", dest="colors", default='',
                      help="Colors for normal,cpu,io,cpuio,exited,tip,bg comma separated. " +
                      "Only supply the ones you want to change e.g. -cwhite,,blue - " +
                      " Defaults are " + ','.join(DEFAULT_COLORS))
    (options, args) = parser.parse_args()
    Activity_Diagram(options.interval, options.columns, options.point_size, not options.no_threads, options.colors.split(',')).start()
</pre><br />
<pre class="brush:plain;highlight:1;light:true">% python plot_grid.py -h
Usage: plot_grid.py [options|-h]
Plot RSS, CPU and IO, with hover and click for details.
RSS size is plotted as a circle - the circle will temporarily
jump up and down in size to indicate a growing or shrinking
RSS - the steady state size is a relative size indicator.
Options:
-h, --help show this help message and exit
-p, --no-threads Exclude threads, only show processes.
-s INTERVAL, --sleep=INTERVAL
Sleep seconds for each repetition.
-n COLUMNS, --columns=COLUMNS
Number of columns in each row (maximum).
-d POINT_SIZE, --point-size=POINT_SIZE
Dot point size (expressed as square area).
-c COLORS, --colors=COLORS
Colors for normal,cpu,io,cpuio,exited,tip,bg comma
separated. Only supply the ones you want to change
e.g. -cwhite,,blue - Defaults are
honeydew,red,lawngreen,orange,yellow,lightyellow,white
</pre><br />
<h4>plot_cpu_rss_time_lines_3d.py</h4><br />
<pre class="brush:python; collapse:true; wrap-lines:false;">#!/usr/bin/env python
#
# Plot a limited time line in 3D for CPU and RSS
#
# Copyright (C) 2011: Michael Hamilton
# The code is GPL 3.0(GNU General Public License) ( http://www.gnu.org/copyleft/gpl.html )
#
import matplotlib
matplotlib.use('GTkAgg')
import pylab
import time
import gobject
import gtk
import random
import string
import math
import mpl_toolkits.mplot3d.axes3d as axes3d
from optparse import OptionParser
import linuxprocfs as procfs

HZ=1000.0
LIMIT=1000
INTERVAL=5
ZOMBIE_LIVES=5
MAX_INTERESTING=5.0

class ProcessInfo(object):

    def __init__(self, new_procdata):
        self.base = new_procdata
        self.previous = new_procdata
        self.data = new_procdata
        self.text = '%d %s %s' % (new_procdata.pid, new_procdata.username, new_procdata.stat.comm )
        # Derive a repeatable (r,g,b) colour from the pid
        self.color = tuple([(self.base.pid * i * 16807 % 2147483647)/(2147483647 * 1.3) for i in (7, 13, 23)])
        self.xvals = []
        self.yvals = []
        self.zvals = []
        self.zombie_lives = ZOMBIE_LIVES
        self.interesting = 0
        self.visible = False

    def update(self, new_procdata):
        self.previous = self.data
        self.data = new_procdata
        self.zombie_lives = ZOMBIE_LIVES
        x = self.data.time_stamp
        y = ((self.data.stat.utime - self.base.stat.utime) + (self.data.stat.stime - self.base.stat.stime)) / HZ
        z = float(new_procdata.stat.rss)
        if len(self.xvals) > LIMIT:
            del self.xvals[0]
            del self.yvals[0]
            del self.zvals[0]
        self.xvals.append(x)
        self.yvals.append(y)
        self.zvals.append(z)
        if self.data.stat.utime - self.previous.stat.utime > 10.0 or self.data.stat.stime - self.previous.stat.stime > 10.0 or abs(self.data.stat.rss - self.previous.stat.rss) > 15000:
            self.visible = True
            self.interesting = MAX_INTERESTING

class Activity_Diagram(object):

    def __init__(self, remove_dead=True):
        self.process_info = {}
        self.start_time = None
        self.subplot = None
        self.remove_dead = remove_dead
        self.sleep_secs = INTERVAL

    def make_graph(self):
        for proc in procfs.get_all_proc_data(include_threads=True):
            if not proc.pid in self.process_info:
                data = ProcessInfo(proc)
                self.process_info[proc.pid] = data
            else:
                data = self.process_info[proc.pid]
                data.update(proc)
        if self.subplot == None:
            figure = pylab.figure()
            self.subplot = figure.add_subplot(111, projection='3d')
        else:
            self.subplot.cla()
        for pid, data in self.process_info.items():
            if data.zombie_lives <= 0 and self.remove_dead:
                del self.process_info[pid] # dead for a while now - remove
            data.zombie_lives -= 1
            if data.visible:
                if len(data.xvals) > 0:
                    alpha =data.interesting / MAX_INTERESTING # boring processes fade away
                    self.subplot.plot(data.xvals, data.yvals, data.zvals, color=data.color, alpha=alpha, linewidth=2)
                    self.subplot.text(data.xvals[-1], data.yvals[-1], data.zvals[-1], data.text, alpha=alpha, fontsize=6, color = data.color if data.zombie_lives >= ZOMBIE_LIVES - 1 else 'r')
                    data.interesting -= 1 if data.interesting > 2 else 0 # Losing interest
        self.subplot.set_xlabel(r'time', fontsize=10)
        self.subplot.set_ylabel(r'cpu', fontsize=10)
        self.subplot.set_zlabel(r'rss', fontsize=10)
        pylab.title('Cumulative CPU and RSS over %f minutes' % ((time.time() - self.start_time) /60.0))
        pylab.draw()
        return True

    def start(self, sleep_secs=INTERVAL):
        self.start_time = time.time()
        self.make_graph()
        pylab.connect('key_press_event', self)
        # Honour the -s option rather than always using the default interval
        gobject.timeout_add(sleep_secs*1000, self.make_graph)
        pylab.show()

    def __call__(self, event):
        self.make_graph()

if __name__ == '__main__':
    usage = """usage: %prog [options|-h]
Plot cumulative-CPU and RSS over the script run time.
"""
    parser = OptionParser(usage)
    parser.add_option("-s", "--sleep", type="int", dest="sleep_secs", default=INTERVAL, help="Sleep seconds for each repetition.")
    (options, args) = parser.parse_args()
    Activity_Diagram(remove_dead=False).start(sleep_secs=options.sleep_secs)
</pre><br />
<pre class="brush:plain;highlight:1;light:true">% python plot_cpu_rss_time_lines_3d.py -h
Usage: plot_cpu_rss_time_lines_3d.py [options|-h]
Plot cumulative-CPU and RSS over the script run time.
Options:
-h, --help show this help message and exit
-s SLEEP_SECS, --sleep=SLEEP_SECS
Sleep seconds for each repetition.
</pre><h3>Notes</h3><br />
Even if you don't know any python, you can still have a play by running these scripts from the command line. Just save them into a folder, giving them the appropriate file-names, and also save linuxprocfs.py from my previous post. Make sure you've installed python-matplotlib for your distribution of Linux (it's a standard offering for openSUSE, so it's in the repo).<br />
<br />
Consult the <a href="http://codeincluded.blogspot.com/2011/06/linux-process-info-to-csv-via-python.html#Notes">previous blog</a> entry for some notes on the Linux process file-system.<br />
<br />
The <a href="http://matplotlib.sourceforge.net/">matplotlib Web-site</a> ( <a href="http://matplotlib.sourceforge.net/">http://matplotlib.sourceforge.net/</a> ) contains plenty of documentation and examples, plus Google will track down heaps more advice and examples.<br />
<br />
This blog page uses <a href="http://alexgorbatchev.com/">SyntaxHighlighter</a> by Alex Gorbatchev. You can easily copy and paste a line-number free version of the code by selecting the view-source icon in the mini-tool-bar that appears on the top right of the source listing (if javascript is enabled).<br />
<br />
<br />
<script type="text/javascript">
SyntaxHighlighter.config.bloggerMode = true;
SyntaxHighlighter.all();
</script>
<h2>Linux proc stat, status, io to CSV via python</h2>
Posted by <a href="http://www.blogger.com/profile/16633409983665910035">digitaltrails</a>, 2011-06-10 (updated 2011-06-18).<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-E49ZZxRlkfw/TfRdPYjzjWI/AAAAAAAAArU/h9Ta0Eg2ANk/s1600/pyLinuxVisLcalc5a.jpeg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="226" src="http://3.bp.blogspot.com/-E49ZZxRlkfw/TfRdPYjzjWI/AAAAAAAAArU/h9Ta0Eg2ANk/s320/pyLinuxVisLcalc5a.jpeg" width="320" /></a></div><br />
I've written a python script that extracts data from the Linux Process File-system into python objects. I've included an option to dump data to CSV. The CSV can be directly loaded into tools such as LibreOffice Calc for analysis and plotting. The screen-capture to the right shows a LibreOffice Calc plot of data for processes running on my desktop. For each process the plot shows <i>total CPU-time</i> (x-axis), <i>allocated virtual memory size</i> (y-axis), and <i>resident set size</i> (bubble area).<br />
<br />
My python script can be run from the command line and includes a variety of command line options including --help, for example:<br />
<br />
<pre class="brush: plain; light:true; highlight:1">% python linuxprocfs.py -h
Usage: linuxprocfs.py [options] [pid...]
Output CSV for procfs stat, status or io data for given thread/process pid's or
for all processes and threads if no pid's are supplied.
Options:
-h, --help show this help message and exit
-s, --stat Output csv for pid/stat files.
-S, --status Output csv for pid/status files.
-i, --io Output csv for pid/io files.
-t, --titles Output a title line.
-r, --repeat Repeat until interrupted.
-w WAIT, --sleep=WAIT
Sleep seconds for each repetition.
-p, --processes Show all processes, but not threads.
</pre><br />
On my desktop, LibreOffice Calc is starting to struggle when plotting large amounts of data. It might be better to process the data further and plot it using a dedicated plotting tool - which is what I will describe next time.<br />
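<br />
As a rough illustration of processing the CSV further before plotting, here is a minimal sketch that totals CPU ticks per command name. It assumes the CSV was produced with the -s and -t options so that the first row is a title line. The sample data is made up and trimmed to a few columns, the function name cpu_ticks_by_command is hypothetical, and the sketch is written for Python 3 rather than the 2.7 used by the script itself.<br />

```python
import csv
import io
from collections import defaultdict

# Made-up sample of "linuxprocfs.py -s -t" output, trimmed to a few
# columns for illustration only.
SAMPLE_CSV = """pid,comm,utime,stime,vsize,rss
101,bash,12,3,20480000,500
102,firefox,900,150,1024000000,60000
103,firefox,300,50,512000000,20000
"""

def cpu_ticks_by_command(csv_file):
    """Sum utime + stime (clock ticks) for each command name."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[row['comm']] += int(row['utime']) + int(row['stime'])
    return dict(totals)

totals = cpu_ticks_by_command(io.StringIO(SAMPLE_CSV))
# List the busiest commands first.
for comm, ticks in sorted(totals.items(), key=lambda item: -item[1]):
    print('%-10s %d' % (comm, ticks))
```

For real data, the io.StringIO object could be replaced by an open file or by sys.stdin when piping the script's output.<br />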
<br />
<h3>The code</h3><br />
<pre class="brush: python; collapse:true; wrap-lines:false; ">#!/usr/bin/env python
#
# Copyright (C) 2011: Michael Hamilton
# The code is LGPL (GNU Lesser General Public License) ( http://www.gnu.org/copyleft/lesser.html )
#
from __future__ import with_statement
import re
import os
import glob
import string
import pwd
import csv
import sys
import time
from optparse import OptionParser
PROC_FS_ROOT = '/proc'
INT_RE_SPEC = '[+-]*\d+'
INT_RE = re.compile(INT_RE_SPEC + '$')
CSV_LINE_TERMINATOR='\n'
# Default parser that deals with a multi-line file where each line
# is a "tag: value" pair
class _ProcBase(object):
    _split_re = re.compile(':\s+')
    def __init__(self, path=None, filename=None):
        self.error = None
        if path and filename:
            self.parseProcFs(path, filename)
    def parseProcFs(self, path, filename):
        pid = os.path.basename(path)
        self.pid = int(pid) if INT_RE.match(pid) else pid
        try:
            with open(path + '/' + filename) as proc_file:
                for line in proc_file.read().splitlines():
                    # Split on the first separator only, in case the value itself contains ': '
                    sort_key, value = _ProcBase._split_re.split(line, 1)
                    self.__dict__[string.lower(sort_key)] = int(value) if INT_RE.match(value) else value
        except IOError as ioerr:
            self.handle_error('IOError %s/%s - %s' % (path, filename, ioerr))
    def handle_error(self, message):
        self.error = message
        print >> sys.stderr, self.error
    def keys(self):
        return sorted(self.__dict__.keys())
    def csv(self, file, header=True):
        if not self.error:
            if header:
                csv.writer(file, lineterminator=CSV_LINE_TERMINATOR).writerow(self.keys())
            # Write to the file that was passed in rather than always to sys.stdout
            csv.DictWriter(file, self.keys(), lineterminator=CSV_LINE_TERMINATOR).writerow(self.__dict__)
# Parser for space separated Values on one line- e.g. "12 comm 123456 111 a 12"
class _SpaceSeparatedParser(object):
    def __init__(self):
        self._keys = []
        self._re_spec = ''
        self._regexp = None
    def _add_item(self, sort_key, rexp_str):
        self._regexp = None
        self._keys.append(sort_key)
        if rexp_str:
            self._re_spec += rexp_str % sort_key
        return self
    def int_item(self, sort_key):
        return self._add_item(sort_key, '(?P<%s>' + INT_RE_SPEC + ')\s')
    def comm_item(self, sort_key):
        return self._add_item(sort_key, '[(](?P<%s>[^)]+)[)]\s')
    def string_item(self, sort_key):
        return self._add_item(sort_key, '(?P<%s>\w+)\s')
    def nonparsed_item(self, sort_key): # Create property sort_key only, but don't parse it
        return self._add_item(sort_key, None)
    def keys(self):
        return self._keys
    def parse(self, line):
        if not self._regexp:
            self._regexp = re.compile(self._re_spec)
        return self._regexp.match(line)
class ProcStat(_ProcBase):
    _parser = _SpaceSeparatedParser().\
        int_item('pid').\
        comm_item('comm').\
        string_item('state').\
        int_item('ppid').\
        int_item('pgrp').\
        int_item('session').\
        int_item('tty_nr').\
        int_item('tpgid').\
        int_item('flags').\
        int_item('minflt').\
        int_item('cminflt').\
        int_item('majflt').\
        int_item('cmajflt').\
        int_item('utime').\
        int_item('stime').\
        int_item('cutime').\
        int_item('cstime').\
        int_item('priority').\
        int_item('nice').\
        int_item('num_threads').\
        int_item('itrealvalue').\
        int_item('starttime').\
        int_item('vsize').\
        int_item('rss').\
        int_item('rlim').\
        int_item('startcode').\
        int_item('endcode').\
        int_item('startstack').\
        int_item('kstkesp').\
        int_item('kstkeip').\
        int_item('signal').\
        int_item('blocked').\
        int_item('sigignore').\
        int_item('sigcatch').\
        int_item('wchan').\
        int_item('nswap').\
        int_item('cnswap').\
        int_item('exit_signal').\
        int_item('processor').\
        int_item('rt_priority').\
        int_item('policy').\
        int_item('delayacct_blkio_ticks').\
        int_item('guest_time').\
        int_item('cguest_time').\
        nonparsed_item('error')
    def __init__(self, path):
        _ProcBase.__init__(self)
        if path:
            self.parseProcFs(path)
    def parseProcFs(self, path):
        path = path + '/stat'
        try:
            with open(path) as stat_file:
                for line in stat_file: # Only one line in file
                    if line and line != '':
                        # Clear the error before parsing so that a failed match is not masked
                        self.error = None
                        self.parse(line)
                    else:
                        self.error = 'Empty line'
        except IOError as ioerr:
            self.handle_error('IOError %s - %s' % (path, ioerr))
    def parse(self, line):
        # Dynamically (at run time) add properties to this instance representing
        # each stat value. E.g. add the pid value as a field called self.pid
        split_line = ProcStat._parser.parse(line)
        if split_line:
            # Update the properties of the Stat instance with integer or
            # string values as appropriate.
            for sort_key, value in split_line.groupdict().items():
                self.__dict__[sort_key] = int(value) if INT_RE.match(value) else value
        else:
            self.handle_error('Failed to match:' + line)
    def keys(self):
        return ProcStat._parser.keys()
class ProcStatus(_ProcBase):
    def __init__(self, path):
        _ProcBase.__init__(self, path, 'status')
        if not self.error:
            self.uid = [ int(uid) for uid in string.split(self.uid,'\t')]
class ProcIO(_ProcBase):
    def __init__(self, path):
        _ProcBase.__init__(self, path, 'io')
class ProcInfo(object):
    def __init__(self, path):
        self.time_stamp = time.time()
        self.meta = {}
        self.stat = ProcStat(path)
        self.status = ProcStatus(path)
        self.io = ProcIO(path)
        self.username = pwd.getpwuid(self.status.uid[0]).pw_name if not self.hasErrors() else 'nobody'
        self.pid = int(path.split('/')[-1])
    def hasErrors(self):
        return self.stat.error or self.status.error or self.io.error
def get_all_proc_data(include_threads=False, root=PROC_FS_ROOT):
    if include_threads:
        results = [ProcInfo(task_path) for task_path in glob.glob(root + '/[0-9]*/task/[0-9]*')]
    else:
        results = [ProcInfo(task_path) for task_path in glob.glob(root + '/[0-9]*')]
    return [info for info in results if not info.hasErrors()]
# In the functions below pid and threadid are strings. The conditional is
# parenthesised so that only the optional '/task/<threadid>' suffix depends on threadid.
def get_proc_info(pid, threadid=None, root=PROC_FS_ROOT):
    return ProcInfo(root + '/' + pid + ('/task/' + threadid if threadid else ''))
def get_proc_stat(pid, threadid=None, root=PROC_FS_ROOT):
    return ProcStat(root + '/' + pid + ('/task/' + threadid if threadid else ''))
def get_proc_status(pid, threadid=None, root=PROC_FS_ROOT):
    return ProcStatus(root + '/' + pid + ('/task/' + threadid if threadid else ''))
def get_proc_io(pid, threadid=None, root=PROC_FS_ROOT):
    return ProcIO(root + '/' + pid + ('/task/' + threadid if threadid else ''))
if __name__ == '__main__':
    usage = """usage: %prog [options] [pid...]
Output CSV for procfs stat, status or io data for given thread/process pid's or
for all processes and threads if no pid's are supplied."""
    parser = OptionParser(usage)
    parser.add_option('-s', '--stat', action='store_true', dest='do_stat', help='Output csv for pid/stat files.')
    parser.add_option('-S', '--status', action='store_true', dest='do_status', help='Output csv for pid/status files.')
    parser.add_option('-i', '--io', action='store_true', dest='do_io', help='Output csv for pid/io files.')
    parser.add_option('-t', '--titles', action='store_true', dest='output_titles', help='Output a title line.')
    parser.add_option('-r', '--repeat', action='store_true', dest='repeat', help='Repeat until interrupted.')
    parser.add_option('-w', '--sleep', type='int', dest='wait', default=5, help='Sleep seconds for each repetition.')
    parser.add_option('-p', '--processes', action='store_true', dest='processes_only', help='Show all processes, but not threads.')
    (options, args) = parser.parse_args()
    header = options.output_titles
    if len(args) == 0:
        args = [ '[0-9]*' ] # match all processes or threads
    elif options.processes_only:
        print >> sys.stderr, 'ignoring -p, showing requested processes and threads instead.'
        options.processes_only = False
    while True:
        for pid in args:
            for path in glob.glob(PROC_FS_ROOT + ('/' if options.processes_only else '/[0-9]*/task/') + pid):
                if options.do_stat or (not options.do_status and not options.do_io):
                    ProcStat(path).csv(sys.stdout, header=header)
                if options.do_status:
                    ProcStatus(path).csv(sys.stdout, header=header)
                if options.do_io:
                    ProcIO(path).csv(sys.stdout, header=header)
                header = False
        if not options.repeat:
            break
        time.sleep(options.wait)
</pre><br />
<a name="Notes"><br />
<h3>Notes</h3></a><br />
<br />
Documentation for the Linux proc file-system can be found in the <a href="http://www.kernel.org/doc/man-pages/online/pages/man5/proc.5.html">Linux proc (section 5) manual page</a> (man 5 proc). The files I wanted to parse either contain several lines, with one value per line (status file and io file), or a single line, with multiple values per line (stat file). My linuxprocfs script contains some generalised code that should cope with basic parsing of both types of file and could be a basis for parsing other files in the procfs. The procfs is a prisoner of its history and suffers a bit from inconsistencies in its syntax.<br />
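<br />
To make the two file layouts concrete, here is a cut-down sketch of both parsing styles. It is not the script's own code: the function names are hypothetical, the input fragments are made up, only the first six stat fields are matched, and it is written for Python 3.<br />

```python
import re

# Made-up fragments in the two layouts: "Tag: value" lines (status, io)
# and a single line of space separated values (stat).
STATUS_TEXT = "Name:\tbash\nState:\tS (sleeping)\nPid:\t4242\n"
STAT_LINE = "4242 (bash) S 4000 4242 4242\n"

INT_RE = re.compile(r'[+-]?\d+$')

def to_value(text):
    """Convert text to an int when it looks like an integer."""
    return int(text) if INT_RE.match(text) else text

def parse_tagged_lines(text):
    """Parse a multi-line 'Tag: value' file into a dict with lower-case keys."""
    result = {}
    for line in text.splitlines():
        tag, value = re.split(r':\s+', line, maxsplit=1)
        result[tag.lower()] = to_value(value)
    return result

# Named groups for the first six stat fields only, to keep the sketch short.
STAT_RE = re.compile(
    r'(?P<pid>\d+)\s[(](?P<comm>[^)]+)[)]\s(?P<state>\w+)\s'
    r'(?P<ppid>[+-]?\d+)\s(?P<pgrp>[+-]?\d+)\s(?P<session>[+-]?\d+)\s')

def parse_stat_line(line):
    """Parse the leading fields of a stat line into a dict."""
    match = STAT_RE.match(line)
    if match is None:
        raise ValueError('Failed to match: ' + line)
    return {key: to_value(value) for key, value in match.groupdict().items()}

print(parse_tagged_lines(STATUS_TEXT))
print(parse_stat_line(STAT_LINE))
```

The command name gets its own pattern because it is wrapped in parentheses and may contain spaces, which is one of the inconsistencies that rules out a plain split on whitespace.<br />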
<br />
The linuxprocfs.py python script is coded for python 2.7, which includes all the dependent modules including the csv and options parsing modules. I'm running an openSUSE 11.4 desktop, but I imagine the script will run on any of the modern Linux distributions. The script may issue warnings to standard-error if processes disappear while it is traversing the procfs; these are normal and only diagnostic.<br />
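<br />
The disappearing-process case can be sketched as follows. This is an illustration rather than the script's code: read_status_files is a hypothetical name, the sketch is written for Python 3, and it is exercised against a temporary fake directory tree instead of the real /proc.<br />

```python
import glob
import os
import sys
import tempfile

def read_status_files(root='/proc'):
    """Return {pid: status text}, skipping processes that vanish mid-scan."""
    results = {}
    for path in glob.glob(os.path.join(root, '[0-9]*')):
        try:
            with open(os.path.join(path, 'status')) as status_file:
                results[int(os.path.basename(path))] = status_file.read()
        except IOError as err:
            # The process went away between glob() and open(): warn and carry on.
            sys.stderr.write('warning: %s - %s\n' % (path, err))
    return results

# Build a fake procfs tree: pid 100 has a status file, pid 200 has none,
# which simulates a process that disappeared during the scan.
fake_root = tempfile.mkdtemp()
os.mkdir(os.path.join(fake_root, '100'))
with open(os.path.join(fake_root, '100', 'status'), 'w') as status_file:
    status_file.write('Name:\tdemo\n')
os.mkdir(os.path.join(fake_root, '200'))

statuses = read_status_files(fake_root)
print(sorted(statuses))
```

Only pid 100 ends up in the result, and a warning for pid 200 goes to standard-error.<br />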
<br />
I have come across one other python interface to procfs, <a href="http://git.kernel.org/?p=linux/kernel/git/acme/python-linux-procfs.git">python-linux-procfs</a> by Arnaldo Carvalho de Melo. Its source base is larger, and it decodes more details. I will be looking into whether anything from my script is worth merging into this other version.<br />
<br />
This post is dusting off some work I'd parked a couple of years back. It's quite pleasant to return to python and its libraries - together they do more to close the gap between idea and implementation than any programming environment I've tried.<br />
<br />
This blog page uses <a href="http://alexgorbatchev.com/">SyntaxHighlighter</a> by Alex Gorbatchev. You can easily copy and paste a line-number free version of the code by selecting the view-source icon in the mini-tool-bar that appears on the top right of the source listing (if javascript is enabled). If you click inside a source listing you will be able to use the arrow keys to scroll sideways.