The Amazon Kindle: wonderful device, shame about the books

Sat, 02 Jul 2011 20:08

The only time I tend to have for reading is when I'm travelling to and from work. Unfortunately, it's rather inconvenient to keep taking a book out of my rucksack and putting it back again, and the book sometimes gets damaged in the process. Hence my decision to purchase an Amazon Kindle.

The Kindle is certainly a nice device. It's reasonably light, and despite having a resolution of only 600x800, the properties of electronic ink and a pixel density of 167 ppi result in an impressive display. The user-interface can be a little clunky, but this is only an issue when you're doing something with the Kindle other than flipping the pages of a book.

For my first read, I decided I wanted a complete collection of all the Sherlock Holmes novels and short stories written by Sir Arthur Conan Doyle. Surely not an unreasonable request, since "The Adventures of Sherlock Holmes" (the first of four collections of stories) is prominently featured by Amazon as one of the free classics available for the Kindle.

My first choice was to download them from Project Gutenberg and avoid any DRM-related issues. Alas, none of these had any images (many stories were originally illustrated by Sidney Paget when first published in The Strand magazine). Hence, I decided to purchase them from Amazon. I expected that it would be trivial to find a complete collection including original illustrations, ideally typeset for the Kindle, and not subject to typographical errors. I couldn't have been more wrong.

Finding such a book in paper form isn't an issue. However, searching for an equivalent ebook led me to reviews complaining about missing images, missing contents pages, missing lines, poorly typeset conversations, poorly scanned images and adverts for other books embedded within the text [1]. Note that most of these editions were non-free, so it would be necessary to buy the ebook before spotting the issues, and again to get a revised version (if one was produced in the future).

The first problem is that searching Amazon for Sherlock Holmes ebooks reveals many, many results and it's extremely difficult to distinguish between them. Amazon's site aggregates the reviews of all editions of a book together, regardless of publisher. As a result, it's rather difficult to find reviews for a specific Kindle edition of a book. Multiple Amazon customers seemed confused about which ebook edition was being described by which review. Unfortunately, when the quality of ebooks varies massively across editions, finding reviews for a specific edition becomes altogether more important.

The next problem is the lack of trustworthy publishers. I had hoped that I might be able to make a guess about the quality of the ebook based on its publisher. Publishers of physical books tend to have well-defined reputations, but typing many of the ebook publishers into a search engine revealed no other sign of their existence. Other books were simply listed as public domain or had no publisher at all. One publisher's website had vanished entirely. For almost all the publishers I could find, there was no indication that they'd existed for more than a year or two.

This is, perhaps, hardly surprising. In the UK, all the Sherlock Holmes novels and short stories have passed into the public domain. It doesn't take much work to imagine a business model for becoming an ebook publisher selling only classics.

  • Download out-of-copyright works from Project Gutenberg or other public domain resources.
  • Edit away undesired text and modify typography.
  • Create a persuasive description and cover page, then sell on Amazon.

The best part is that creating an ebook has no physical printing cost. Of course, I have no way to know whether this is happening. What I do know is that the reviews suggest some extremely shoddy publishers. Given the difficulty of checking for issues before purchase and no way to return or get refunded for an ebook, it's probably rather easy to make a pretty penny.

However, there is a spark of hope. An illustrated edition by Seashell Press seems to have had an unparalleled amount of effort put into its creation and has received excellent reviews. It's probably also worth noting that this is the only publisher I've seen that also appears to produce physical books.

Alas, it looks like this collection was originally complete, but was then revised to omit "The Case-book of Sherlock Holmes" as it was still under copyright in the US. This change was also applied to the version available in the UK. I've since emailed the publishers to see if it is possible to get the original version distributed in the UK, where copyright of that particular collection is not an issue.

Amazon's use of DRM, remote content removal and the choice not to support EPUB indicate how much they wish to retain control of Kindle ebook distribution. As such, the quality of ebooks available from the Kindle store directly affects the utility of the Kindle, and it's sad to see issues like these that I never anticipated.

These are simply first impressions, and I have no idea if the issues I've described affect other books. If they do, they probably only exist for older, out-of-copyright works. However, one of Amazon's main selling points for the Kindle seems to be that it provides an acceptable experience for reading classic literature. Just look at the Kindle's display every time it's switched off.


  1. For example, this free edition of The Adventures of Sherlock Holmes, currently at #3 in the Free Kindle store, contains character encoding errors. One such error is in the story "Adventure II. The Red-Headed League" which contains the text "there is now another vacancy open which entitles a member of the League to a salary of �4 a week for purely nominal services". The three strange characters should instead be a pound sign.

    In this non-free edition of "The Complete Sherlock Holmes Collection", I found that the links in the table of contents for the stories "The Adventure of a Case of Identity" and "The Adventure of the Red-Headed League" pointed to each other's stories instead of their own. It seems free of more major issues though.
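The encoding error in the first footnote example is typical of text that has been decoded with the wrong character set. As a minimal sketch (an assumption about the cause, not something confirmed for this particular ebook), the following Python shows one common route by which a single pound sign can end up as three characters: a Latin-1 byte is misread as UTF-8 and replaced with U+FFFD, and the resulting UTF-8 bytes are then misread as Latin-1. The function name `mangle` is hypothetical.

```python
# Hypothetical reconstruction of how a pound sign can become three stray
# characters. This is one common failure mode, not a confirmed diagnosis.

def mangle(text: str) -> str:
    # Step 1: the text is stored as Latin-1, where the pound sign is byte 0xA3.
    latin1_bytes = text.encode("latin-1")
    # Step 2: a tool wrongly decodes those bytes as UTF-8. A lone 0xA3 is not
    # valid UTF-8, so it is replaced with U+FFFD (the replacement character).
    misdecoded = latin1_bytes.decode("utf-8", errors="replace")
    # Step 3: the result is saved as UTF-8 (U+FFFD becomes bytes EF BF BD) and
    # later displayed as Latin-1, turning each byte into its own character.
    return misdecoded.encode("utf-8").decode("latin-1")

if __name__ == "__main__":
    print(mangle("a salary of \u00a34 a week"))
```

If this is what happened, the fix on the publisher's side would simply be to decode the source text with the correct character set before converting it.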


Random Slowness with Western Digital Caviar Green Hard Drives

Thu, 14 Oct 2010 02:28

Several months ago, my MythTV server started acting strangely. At seemingly random intervals, the system would experience high IO, completely crippling the responsiveness of the system to the extent that it would take minutes to even establish an SSH connection to the machine. There appeared to be no way to trigger the problem and running my own high-IO tasks showed no issues. The high-IO periods lasted for at least several minutes but sometimes occurred days apart, and as they wrecked the machine's responsiveness, it was also practically impossible to monitor the system when the symptoms did appear.

Over the months, I tried changing the root file-system, searching for known IO issues with the Linux kernel, XFS or MythTV, checking fragmentation, swap-utilisation, IO-priority settings and running a multitude of different kernels including low-latency versions, all to no avail. The times I managed to SSH into the machine during a high-IO attack, running top and latencytop were equally unrevealing. It was only recently that I finally discovered the culprit: the Western Digital Caviar Green 1.5TB hard drive.

It appears that the drive randomly enters spells where IO slows radically for minutes at a time, then recovers and behaves normally until the next one. There are no other signs that anything else is wrong with the drive, such as strange noises, bad sectors or SMART warnings. The drive model in question is a 'WDC WD15EADS-00P8B0' and is only just over a year old, which means the symptoms must have started manifesting not long after purchase.

As the problem isn't reproducible on demand, I can't give solid IO performance figures. What I can state is that when the problem has manifested at boot time, I've given up waiting after tens of minutes for a system that usually boots in less than one. I can't imagine that extensive amounts of data are read during boot, which makes me wonder if the problem might be due to seeking rather than actual read speeds. Either way, the result is an unusable system.
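Since the slow periods are too disruptive to observe interactively, one option is a small background probe that logs how long a tiny synchronous write takes. The following is a minimal sketch under assumptions: the file path, threshold and function names are hypothetical, and it measures only write-and-fsync latency, which may not be the operation this drive actually struggles with.

```python
import os
import tempfile
import time

def timed_sync_write(path: str, payload: bytes = b"x" * 4096) -> float:
    """Write a small block to `path`, force it to disk, return seconds taken."""
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # wait until the data has actually reached the device
    finally:
        os.close(fd)
    return time.monotonic() - start

def probe(path: str, samples: int, interval: float, threshold: float) -> list:
    """Take `samples` measurements; return (timestamp, seconds) for slow ones."""
    slow = []
    for _ in range(samples):
        elapsed = timed_sync_write(path)
        if elapsed >= threshold:
            slow.append((time.strftime("%Y-%m-%d %H:%M:%S"), elapsed))
        time.sleep(interval)
    return slow

if __name__ == "__main__":
    # Hypothetical settings: point `target` at a file on the suspect drive.
    target = os.path.join(tempfile.gettempdir(), "io-probe.tmp")
    for stamp, seconds in probe(target, samples=6, interval=10.0, threshold=1.0):
        print(f"{stamp} slow write: {seconds:.2f}s")
```

Run from cron or a long-lived loop, the timestamps of slow writes could then be lined up against recordings or other system activity after the event.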

I've found postings on these issues here, here and here which indicate the issues affect only specific models. As far as I can tell, Western Digital don't seem to have acknowledged any issues with this drive, nor provided any sort of firmware update that might fix it. As the symptoms don't match conventional signs of drive-failure (it's unclear if the problem is even physical), I can only imagine how many others are experiencing these issues without realising the cause.

I'm not going to condemn Western Digital drives outright. The 2.5" 'WDC WD3200BEVT-35ZCT1' SATA drive in my laptop has been absolutely fine and is also the first laptop hard-drive I've owned that is so quiet that I actually need to look at the activity LED to judge IO load. However, I find it disappointing that Western Digital seems to have done so little to acknowledge what is clearly an issue affecting a number of people.

Now let's see if I can get Western Digital to give me an RMA code.


Debian OpenSSL Vulnerability Still Pains Two Years On

Tue, 17 Aug 2010 17:58

In April 2006, someone filed a bug report with Debian complaining that OpenSSL (an open source SSL/TLS implementation) read data from an uninitialised buffer. This was causing Valgrind (a brilliant debugging tool) to report memory usage warnings anywhere the OpenSSL random number generator was used. This was actually the behaviour intended by the OpenSSL developers, as the uninitialised buffer was being fed into an entropy pool used by the random number generator.

A fix was proposed that reset the uninitialised buffer to zero before first use. This would have had minimal security implications as OpenSSL used multiple reliable entropy sources in addition to the buffer. However, the initial patch to do this didn't stop all the Valgrind warnings. Somehow the proposed fix mutated into one that caused almost no entropy to be added to the pool except for the process ID, rendering the random number generator almost entirely useless. This in turn led OpenSSL to create extremely vulnerable SSL certificates, SSH keys and a bunch of other things. The broken OpenSSL version made it into Debian in September 2006. It later propagated into Ubuntu.
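To see why seeding from the process ID alone is so damaging, consider that Linux process IDs stay below 32,768 by default, so a generator seeded this way can only ever produce that many distinct outputs. The sketch below is an illustration, not OpenSSL's code: it uses Python's `random.Random` as a stand-in generator, and `generate_key` and `recover_pid` are hypothetical names.

```python
import random

MAX_PID = 32768  # default exclusive upper bound on Linux process IDs (2**15)

def generate_key(pid: int, nbytes: int = 16) -> bytes:
    """Stand-in for key generation whose only entropy is the process ID."""
    rng = random.Random(pid)
    return bytes(rng.getrandbits(8) for _ in range(nbytes))

def recover_pid(key: bytes):
    """Brute-force every possible PID until the same key is regenerated."""
    for pid in range(1, MAX_PID):
        if generate_key(pid, len(key)) == key:
            return pid
    return None

if __name__ == "__main__":
    victim_key = generate_key(pid=4242)
    print("key:", victim_key.hex())
    print("recovered pid:", recover_pid(victim_key))
```

However long the key is, an attacker only has to try each candidate process ID, which is why precomputed blacklists of every possible weak key were feasible.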

Incredibly, no one noticed the vulnerability until 2008. By then, a massive number of packages and users had been affected. Cue scrambling to fix the bug in Debian and Ubuntu, creation of blacklists and scanners for vulnerable OpenSSL and OpenSSH keys, arguments about who was to blame and how open source practices could have failed so badly. Also, cartoons.

Yes, I did feel the need to recount the bug's entire history. Reading the bug report is like watching a slow-motion car crash. Even the "fix" for the initial bug was applied incorrectly, which meant it appeared in Debian version 0.9.8c-1 of OpenSSL (released 17 September) instead of 0.9.8b-1 (released 4 May). I suppose that was a good thing.

Even now, over two years after the bug was discovered and almost four years after it was originally introduced, the damage it caused is still being discovered. The Electronic Frontier Foundation, which is launching a project to survey all publicly used SSL certificates, has had to delay the launch because it is disclosing vulnerabilities to websites found to be using weak private keys generated by the broken Debian/Ubuntu OpenSSL versions.

As the EFF SSL Observatory page will be updated at some point, here is what it currently says:

This project is not fully launched yet, because we are currently engaging in vulnerability disclosure for around 28,000 websites that we observed to be using extremely weak private keys generated by a buggy version of OpenSSL.

Further information can be found in the SSL Observatory DEFCON 18 slides. Of the 28K vulnerable certificates seen, the 530 that validate are the most interesting; the rest were either invalid or among the 12K issued by private certificate authorities. Only 73 of the 530 valid certificates had been revoked. In particular, none of the 140 valid certificates issued by Equifax had been revoked, and only 4 of the 125 issued by Cybertrust.

In conclusion, this all seems rather depressing. A bug in a patch to an important open source security library went unnoticed for two years, despite reducing the effective security of the keys it generated to almost nothing (15 bits). Furthermore, a bunch of private "trusted" companies still haven't taken measures to ensure SSL certificates they generated using this library have been marked invalid, another two years after the bug was found.

On a positive note, I'm sure Debian has learnt from its mistakes by now, but it would still be nice to find some policy document to show what has changed (I failed to find one). Rather unsettlingly it seems like this was the exact type of bug the Debian Security Audit Project was set up to spot.

Unfortunately, nothing so positive can be said about the state of Certificate Authorities. Until one is sued for not taking proper security precautions, their behaviour is unlikely to change. As for security, there's nothing to stop a CA from collaborating with a government or other entity that wants to eavesdrop on communications (as the EFF warns). Also, no amount of money spent on an SSL certificate from even the most trustworthy CA will protect against a rogue certificate created by (or successfully forged from) a different CA also trusted by the web browser. In other words, SSL is nowhere near as trustworthy as you think.


Debian Packaging for TrueCrypt

Fri, 20 Nov 2009 23:45

I like to be able to build Debian packages from source. Unfortunately, the released TrueCrypt sources don't contain the Debian/Ubuntu packaging used to build their .deb files. As a result, I've created my own Debian packaging, available here.

As the TrueCrypt license seems to have issues that stop TrueCrypt being included in most distributions, the downloadable files only include the 'debian' folder. You'll still need to download the TrueCrypt sources from here. Advantages over the upstream packaging include an init script to kill TrueCrypt device mappings on shutdown and the generation of a reasonably well formatted man page from the output of 'truecrypt --help'.


DNS resolution delays in Debian (and Ubuntu)

Sat, 23 May 2009 16:38

Yesterday, I was pinging a server when I noticed that the output of ping seemed to be rather slow. In fact, I'd noticed it before but never really thought about it until I was pinging another server at the same time and saw the drastic difference in output speeds.

Pinging google.co.uk, there was a ping every second:

fpr@callisto:~$ ping google.co.uk | perl -ne 'use Time::Format; print "$time{\"hh:mm:ss.mmm\"} - $_"'
14:44:34.898 - PING google.co.uk (74.125.77.104) 56(84) bytes of data.
14:44:34.915 - 64 bytes from ew-in-f104.google.com (74.125.77.104): icmp_seq=1 ttl=238 time=32.9 ms
14:44:35.883 - 64 bytes from ew-in-f104.google.com (74.125.77.104): icmp_seq=2 ttl=238 time=35.9 ms
14:44:36.882 - 64 bytes from ew-in-f104.google.com (74.125.77.104): icmp_seq=3 ttl=238 time=33.4 ms

On another server, it was closer to one every five seconds:

fpr@callisto:~$ ping server1.fsckvps.com | perl -ne 'use Time::Format; print "$time{\"hh:mm:ss.mmm\"} - $_"'
14:49:42.389 - PING server1.fsckvps.com (66.71.248.146) 56(84) bytes of data.
14:49:42.408 - 64 bytes from 66.71.248.146: icmp_seq=1 ttl=45 time=122 ms
14:49:47.500 - 64 bytes from 66.71.248.146: icmp_seq=2 ttl=45 time=123 ms
14:49:52.625 - 64 bytes from 66.71.248.146: icmp_seq=3 ttl=45 time=122 ms

This didn't make much sense since the round-trip times were small by comparison and ping sends one request per second by default. A Google search indicated that the problem might lie with my resolv.conf file. Unfortunately, mine seemed to be fine and my local DNS server was completely responsive. However, if I pinged the server by IP address instead of by hostname, the delay was gone.

To see what was blocking, I ran strace on ping.

15:04:26.789 - munmap(0x7f69ed319000, 129482)          = 0
15:04:26.790 - socket(PF_FILE, SOCK_STREAM, 0)         = 4
15:04:26.790 - fcntl(4, F_GETFD)                       = 0
15:04:26.790 - fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
15:04:26.790 - connect(4, {sa_family=AF_FILE, path="/var/run/avahi-daemon/socket"...}, 110) = 0
15:04:26.790 - fcntl(4, F_GETFL)                       = 0x2 (flags O_RDWR)
15:04:26.790 - fstat(4, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
15:04:26.790 - mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f69ed359000
15:04:26.791 - lseek(4, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
15:04:26.791 - write(4, "RESOLVE-ADDRESS 66.71.248.146\n"..., 30) = 30
15:04:31.789 - read(4, "-15 Timeout reached\n"..., 4096) = 20
15:04:31.789 - close(4)                                = 0

It was blocking each ping on a read from /var/run/avahi-daemon/socket, a socket used by Avahi, an implementation of the network auto-configuration and service discovery mechanism Zeroconf that augments DNS. Killing the Avahi daemon solved the problem, but I still wanted to work out why ping was talking to Avahi and why it only occurred on certain hosts, so I ran ltrace:

15:14:33.062 - gettimeofday(0x7fff90ec0550, NULL)               = 0
15:14:33.062 - gettimeofday(0x7fff90ec0520, NULL)               = 0
15:14:33.062 - memcpy(0x00608988, "\311\004\030J", 16)          = 0x00608988
15:14:33.062 - sendmsg(3, 0x6077c0, 2048, 2, 61091)             = 64
15:14:33.184 - recvmsg(3, 0x7fff90ec15f0, 0, 0, 61091)          = 84
15:14:38.193 - gethostbyaddr("BG\370\222T\315(\002", 4, 2)      = NULL
15:14:38.193 - inet_ntoa(0x92f84742)                            = "66.71.248.146"
15:14:38.194 - strcpy(0x006078e0, "66.71.248.146")              = 0x006078e0

I then wrote a little test in C just to check that I could replicate the delay with gethostbyaddr(). I could, and it was then that I finally realised that the delay occurred when pinging hosts that had no PTR record (reverse DNS). Slightly confusingly, ping performs a reverse DNS lookup on the IP address when provided with a hostname, but not when given an IP address.
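The C test isn't reproduced here. As a rough equivalent, a minimal Python sketch along the same lines would time a call to `socket.gethostbyaddr()`, which on Linux goes through the system resolver. The helper name `time_reverse_lookup` and its optional `resolver` parameter are assumptions made for illustration.

```python
import socket
import time

def time_reverse_lookup(ip: str, resolver=socket.gethostbyaddr):
    """Return (hostname or None, seconds taken) for a reverse lookup of `ip`."""
    start = time.monotonic()
    try:
        hostname = resolver(ip)[0]
    except (socket.herror, socket.gaierror):
        hostname = None  # no PTR record, or the lookup otherwise failed
    return hostname, time.monotonic() - start

if __name__ == "__main__":
    # Addresses taken from the ping output above; results depend on the
    # resolver configuration of the machine running this.
    for address in ("74.125.77.104", "66.71.248.146"):
        name, seconds = time_reverse_lookup(address)
        print(f"{address}: {name} ({seconds:.3f}s)")
```

On an affected machine, an address without a PTR record would be expected to take around five seconds to come back with no hostname, while one with a PTR record returns almost immediately.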

gethostbyaddr() was calling Avahi because Avahi had plugged itself into the glibc resolver using NSS. When an attempt to resolve an IP address to a hostname failed, glibc would then call Avahi to try to find it. For whatever reason, Avahi cannot answer this request instantly and times out after 5 long seconds. Avahi also resolves host names to IP addresses, but the delay in looking up unresolvable host names only occurs if the domain is under the .local pseudo top-level domain.

The Debian package dependencies make it a bit difficult to remove Avahi so the easiest way to fix this is to remove references to mdns4 from /etc/nsswitch.conf. If you want to kill the daemon entirely then you can always disable its init script of course.
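As a sketch of what that edit amounts to, the following Python removes the mdns entries from a `hosts:` line. The sample line is an assumption about what a system with the mDNS NSS module installed typically contains, so check the actual file before changing anything; `strip_mdns` is a hypothetical helper, not part of any package.

```python
def strip_mdns(hosts_line: str) -> str:
    """Remove mdns* services (and any [action] attached to them) from a hosts line."""
    key, _, services = hosts_line.partition(":")
    kept = []
    dropped_previous = False
    for token in services.split():
        if token.startswith("mdns"):
            dropped_previous = True   # drop the mDNS service itself
        elif token.startswith("[") and dropped_previous:
            pass                      # drop an action such as [NOTFOUND=return]
        else:
            kept.append(token)
            dropped_previous = False
    return f"{key.strip()}: {' '.join(kept)}"

if __name__ == "__main__":
    # Assumed example of a hosts line with the mDNS module enabled.
    before = "hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4"
    print(strip_mdns(before))
```

With the mdns entries gone, failed reverse lookups fall straight through to DNS and are no longer handed to Avahi.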

Amusingly (or tragically) this bug is listed in Ubuntu and Debian bug reports, both of which are over two years old. At the time of writing, it still exists in Ubuntu Jaunty and Debian testing. What makes me really angry is that somewhere, someone decided that it would be a great idea to enable this daemon by default on desktop installs and, as a result, the performance of applications is being degraded. Obviously ping doesn't matter that much, but as mentioned in the bug reports, this hits people using ssh and IMAP as well, causing anything from mild delays to almost unusable systems. The worst part of this is that many people (and there could be a lot) suffering these issues are probably attributing them to a slow network, or packet loss, or ssh key verification, or just about anything else other than a dubiously designed daemon running on their own machine. The fact that it only occurs on certain hosts only reinforces this, and unless they suddenly realise that this delay is their local machine's fault and put a lot of effort into debugging, they'll probably never know.
