“Programming Collective Intelligence”: Python, data mining, machine learning and a little more…
Simply put: “Programming Collective Intelligence” is one of the most outstanding publications on IT and software development I have read in a while. Given some of our business use cases, I am currently a little deeper into analyzing data linked to the users in our environment and, subsequently, deriving decisions and suggestions from it (for the obvious reasons of making both our work a little easier and our users’ overall experience a little better), and browsing the table of contents of this book made it seem worth a closer look. Overall, after taking that closer look, I found that the book indeed offers profound information on the issue I am dealing with, and goes well beyond that scope…
Some say the only way of telling whether you have understood some aspect of theory is whether or not you are capable of easily explaining it to someone else. If that’s true, Toby Segaran, the author of this book, certainly has a pretty clear idea of what he is writing about: throughout the chapters, he manages to explain algorithms and concepts in an astoundingly short, concise and yet coherent way, laying out the essentials in a few lines of text, a small set of images and sample code, providing all you need to get going. So if you’re out to learn a thing or two about “programming collective intelligence”, about clustering, searching, ranking, machine learning and a whole load of other interesting things, this book is an excellent starting point. The same applies if you like to learn by following a straightforward “hands-on” approach that gives you working examples and fundamental information to build upon whenever you feel the need to go into more detail here and there.
But there’s more. Aside from being a good introduction to a specialized field of algorithms and approaches, as a “side effect” you also dive into doing a lot of things with the Python programming language, including fetching and parsing RSS feeds, automatically processing HTML files, and generating visual output using the Python Imaging Library. Along the way, you learn how to do file I/O, you get quite some exercise in using list comprehensions and in working with matrix representations, and, which in my opinion is one of the best effects to have, you are confronted with writing reusable algorithms and data structures: either by writing exchangeable functions that do “the same in a different way” on a given data set, or by transforming input data of whatever kind into a given data structure so that functions already written for that data structure can be reused.
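To illustrate what “the same in a different way” can look like in practice, here is a minimal sketch of my own and not code taken from the book. It assumes a toy data set (the `ratings` dictionary) and hypothetical function names: two similarity measures share one signature, so the function that ranks users can take either of them as a parameter.

```python
from math import sqrt

# Hypothetical toy data set: user -> {item: rating}
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob": {"A": 4.0, "B": 2.0, "C": 3.0},
    "carol": {"A": 1.0, "B": 5.0, "C": 2.0},
}


def shared_items(prefs, a, b):
    """Items that both users have rated."""
    return [item for item in prefs[a] if item in prefs[b]]


def euclidean_similarity(prefs, a, b):
    """Distance-based score in (0, 1]; 1 means identical ratings."""
    items = shared_items(prefs, a, b)
    if not items:
        return 0.0
    dist_sq = sum((prefs[a][i] - prefs[b][i]) ** 2 for i in items)
    return 1.0 / (1.0 + sqrt(dist_sq))


def pearson_similarity(prefs, a, b):
    """Pearson correlation of the two users' ratings, in [-1, 1]."""
    items = shared_items(prefs, a, b)
    n = len(items)
    if n == 0:
        return 0.0
    xs = [prefs[a][i] for i in items]
    ys = [prefs[b][i] for i in items]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = sqrt(var_x * var_y)
    return cov / denom if denom else 0.0


def most_similar(prefs, user, similarity):
    """Rank all other users by the similarity function passed in."""
    scores = [(similarity(prefs, user, other), other)
              for other in prefs if other != user]
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    print(most_similar(ratings, "alice", euclidean_similarity))
    print(most_similar(ratings, "alice", pearson_similarity))
```

Because `most_similar` only relies on the shared signature, swapping the measure is a one-argument change, and any other input that can be turned into the same nested dictionary can reuse all three functions unchanged.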
All these things, like the explanations of the book’s core subject, come as concise, very “hands-on” explanations, ready for you to get your feet wet quickly, to try out, to build upon, and to carry whatever you learnt into your own applications. It’s a book that makes one curious to play with the example code and to figure out how to solve different problems with the approaches learnt here. More than once while reading, it made me immediately think of at least a handful of things to be done with the examples in there. More detailed information is always available elsewhere if more knowledge is needed on the aspects and topics outlined there, but as a brief, usable introduction, it can’t be done better in my opinion.