Archive for Dev

Android ADB Drivers for Cheap No-Name Tablets

// July 8th, 2012 // 1 Comment » // Dev, Mobile

I’ve recently purchased a Wise Tech Android 4.0 (Ice Cream Sandwich) 7″ tablet from Amazon, for the princely sum of £80 – I opted for the 1.5GHz, 1GB RAM, 16GB storage version, but you can spend less for a lower-spec device that is still quite capable of running ICS. I had three main aims:

  1. Acquire a cheap ICS device for personal development, as the emulator is just too damn slow and I can’t really justify taking work phones home all the time;
  2. Get to know Android 4 as an end user (I use Windows Phone 7 as my main phone platform, an iPad for casual browsing and other tabletty jobs, and an iPod Touch for music but after my earlier experiences I haven’t encountered Android by choice for some time);
  3. Get a spare tablet, for when the iPad is in demand with guests or girlfriend.

The WiseTech tablet handles the second two admirably – for the money, it’s pretty amazingly good. I won’t stop using the iPad any time soon for tabletty things when I have a choice of device, but this tablet can handle most things I need it to do when needs must. The only bad thing I can say about the hardware is that the screen is fuzzy and a magnet for fingerprints – other than that, the multitouch is responsive, the processor fast, ICS isn’t nearly as bad as the previous versions of Android, and it comes with Google Play (the ambiguously renamed Android Market) so you can download any apps you want.

The biggest frustration is the lack of ADB (Android Debug Bridge) drivers, without which it is fairly useless for development. My Windows 7 PC easily recognized the tablet as a USB storage device, but refused to recognize it for ADB: it wouldn’t consider the generic ADB driver appropriate, and also refused to make use of the other “named manufacturer” ADB drivers I already had installed.

Google’s advice is to try downloading the OEM driver from the manufacturer, at which point you hit a slight problem – who is the manufacturer? The box and slim manual are conspicuously unbranded, with no clues to follow, and the device itself only reports some pretty cryptic hardware IDs. Wise Tech seem markedly absent from Google’s OEM list, and also don’t appear on Google search results (part of my motivation for writing this post is to make something appear for those who follow me!).

Figuring that the rival devices I also considered – such famous names as the CloudNine Neuropad, the LB-01, the TabTronics M009s and the Ployer MoMo9 – probably all came from the same no-name Chinese or Taiwanese factory and possibly even the same mould, I started googling those names too and eventually ended up here.

In a nutshell, if you have any manufacturer-less cheap Android phone or tablet and you can’t get Windows to recognize any drivers, try installing the PDAnet application and you’ll probably find that it includes an installation of the generic ADB drivers that work for your device. At which point these £60+ tablets really open up a world of very cheap development.

Bootnote: the tablet worked first time when plugged into a Mac. But for the price premium of a Mac, I could have bought a Galaxy Nexus with its working driver set, and used that instead…

MEX Conference 2011

// May 22nd, 2011 // No Comments » // Dev, Mobile, Usability

Slides from my talk at this May’s MEX (Mobile User Experience) conference in London, where I gave the first presentation on the “Efficient UX Techniques for an Age of Network Austerity” pathway:

The slides walk through steps Masabi has taken to minimise dependency on network uptime in our travel apps, and why that matters.

The whole conference was incredibly well put together – props to Marek for that – and encouraged some stimulating debate through its unique interactive workshops. Nice food too! Highly recommended to anyone interested in mobile…

Fixing Eclipse Update Issues

// March 2nd, 2010 // No Comments » // Dev, Mobile

After a bit of a break, I’m about to start a stint of Blackberry development and really wanted to try out the new Blackberry JDE integration with Eclipse – something that promises to reduce the immense tedium of running Blackberry simulators somewhat. Anyone who has ever tried to do that will understand how valuable this could be, both financially (time is money after all) and to your sanity.

The plugin requires at least Eclipse 3.4, though, and I was stuck way back on 3.3. Eclipse was reluctant to update itself to any new version from any of the obvious “update” menu items, so I went for the simple brute force method:

  1. Zip the old Eclipse app folder, then delete it;
  2. Download the latest Eclipse, and add the latest versions of whatever plugins are needed;
  3. Reattach to the old workspace folder.

This initially appeared to work, but didn’t.

Ant Integration

The most visible problem was that Ant builds would no longer run. They’d start, and the red ‘stop’ button on the console would light up (indicating I could stop the running Ant process, not that it was stopped) but no logging at all reached the console. No dialogues appeared explaining the problem.

The clue lay in the workspace’s .metadata/.log file – there were two exceptions, at least one of which was being thrown every time I tried to run Ant:

!ENTRY org.eclipse.core.resources 4 75 2010-03-01 21:17:55.921
!MESSAGE Errors occurred during the build.
!SUBENTRY 1 org.eclipse.mtj.core 2 75 2010-03-01 21:17:55.921
!MESSAGE Errors running builder 'Preverification Builder' on project 'Framework'.
!STACK 1
org.eclipse.core.runtime.CoreException: Build state machine has not been initialized.

or

!ENTRY org.eclipse.ant.ui 4 120 2010-03-01 21:21:16.468
!MESSAGE Error logged from Ant UI:
!STACK 0
java.net.SocketTimeoutException: Accept timed out

Not, admittedly, much of a clue, but enough to eventually track down the problem. Ant’s configuration – in particular, the locations of its jars – is stored in your workspace, despite Ant being a plugin integrated into Eclipse. If the location of Ant’s plugin folder changes, Ant stops working with that workspace.

To fix the problem, go to Preferences > Ant > Runtime. Remove all jars under Ant Home Entries, and then find the new versions in the Eclipse plugin folder (as an External Jar Location). Apply the changes, and your builds should run again.

JavaME Emulation

The JavaME plugin is notorious for introducing breaking changes whenever it updates. This time was no exception – my JavaME projects appeared fine in the IDE, but produced the following exception (to the console, at least) whenever a WTK emulator was run:

Running with storage root C:\Documents and Settings\Tom\j2mewtk\2.5.2\appdb\rms
Running with locale: English_United Kingdom.1252
Running in the identified_third_party security domain
java.lang.ClassNotFoundException: framework/midp/Application
	at com.sun.midp.midlet.MIDletState.createMIDlet(+29)
	at com.sun.midp.midlet.Scheduler.schedule(+52)
	at com.sun.midp.main.Main.runLocalClass(+28)
	at com.sun.midp.main.Main.main(+80)
Execution completed.

The fix turned out to be simple – delete the project, and check it out again. The new version will start with fresh metadata that works with the new plugin. Not very nice, but hardly fatal (if you’re using version control).

Incompatible Plugins

At the end of this, I discovered that the Blackberry JDE plugin does not support the very latest Galileo, so it was all a bit of a pointless exercise. Such is life in mobile development…

WebHarvest: Easy Web Scraping from Java

// February 15th, 2010 // 9 Comments » // Dev, Web

I’ve been experimenting with data visualisation for a while now, most of which is for Masabi‘s business plan though I hope to share some offshoots soon.

I often have a need to quickly scrape some data out of a web page (or list of web pages), which can then be fed into Excel and on to specialist data visualisation tools like Tableau (available in a free public edition here – my initial impressions are positive but it’s early days yet).

To this end I have turned to WebHarvest, an excellent scriptable open source API for web scraping in Java. I really really like it, but there are some quirks and setup issues that have cost me hours so I thought I’d roll together a tutorial with the fixes.

WebHarvest Config for Maven

When it works, Maven is a lovely tool for hiding dependency management in Java projects, but WebHarvest is not configured quite right out of the box to work transparently with it. (Describing Maven is beyond the scope of this post, but if you don’t know it, it’s easy to set up with the M2 plugin for Eclipse.)

This is the Maven POM I ended up with to use WebHarvest in a new JavaSE project:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<groupId>WebScraping</groupId>
	<artifactId>WebScraping</artifactId>
	<packaging>jar</packaging>
	<version>0.00.01</version>
	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
	</properties>

	<build>
		<plugins>
			<plugin>
				<artifactId>maven-compiler-plugin</artifactId>
				<configuration>
					<source>1.6</source>
					<target>1.6</target>
				</configuration>
			</plugin>
		</plugins>
	</build>

	<repositories>
		<repository>
			<id>wso2</id>
			<url>http://dist.wso2.org/maven2/</url>
		</repository>
		<repository>
			<id>maven-repository-1</id>
			<url>http://repo1.maven.org/maven2/</url>
		</repository>
	</repositories>
	<dependencies>
		<dependency>
			<groupId>commons-logging</groupId>
			<artifactId>commons-logging</artifactId>
			<version>1.1</version>
			<type>jar</type>
			<scope>compile</scope>
		</dependency>
		<dependency>
			<groupId>log4j</groupId>
			<artifactId>log4j</artifactId>
			<version>1.2.12</version>
			<type>jar</type>
			<scope>compile</scope>
		</dependency>
		<dependency>
			<groupId>org.webharvest.wso2</groupId>
			<artifactId>webharvest-core</artifactId>
			<version>1.0.0.wso2v1</version>
			<type>jar</type>
			<scope>compile</scope>
		</dependency>
		<!-- web harvest pom doesn't track dependencies well -->
		<dependency>
			<groupId>net.sf.saxon</groupId>
			<artifactId>saxon-xom</artifactId>
			<version>8.7</version>
		</dependency>
		<dependency>
			<groupId>org.htmlcleaner</groupId>
			<artifactId>htmlcleaner</artifactId>
			<version>1.55</version>
		</dependency>
		<dependency>
			<groupId>bsh</groupId>
			<artifactId>bsh</artifactId>
			<version>1.3.0</version>
		</dependency>
		<dependency>
			<groupId>commons-httpclient</groupId>
			<artifactId>commons-httpclient</artifactId>
			<version>3.1</version>
		</dependency>
	</dependencies>
</project>

You’ll note that the WebHarvest dependencies had to be added explicitly, because the jar does not come with a working pom listing them.

Writing A Scraping Script

WebHarvest uses XML configuration files to describe how to scrape a site – and with a few lines of Java code you can run any XML configuration and have access to any properties that the script identified from the page. This is definitely the safest way to scrape data, as it decouples the code from the web page markup – so if the site you are scraping goes through a redesign, you can quickly adjust the config files without recompiling the code they pass data to.
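
For a flavour of just how little Java is involved, here is a minimal sketch using the same calls as the reusable harness class shown further down – the config path, the uri variable and the name variable are placeholders for whatever your own script happens to define:

import org.webharvest.definition.ScraperConfiguration;
import org.webharvest.runtime.Scraper;
import org.webharvest.runtime.variables.Variable;

public class MinimalScrape
{
	public static void main(String[] args) throws Exception
	{
		// load the XML scraping script (path is a placeholder)
		ScraperConfiguration config = new ScraperConfiguration("config/example.xml");
		// the second argument is a working folder WebHarvest can use for temporary content
		Scraper scraper = new Scraper(config, "temp");
		// push any variables the script expects before it runs...
		scraper.getContext().setVar("uri", "device/loadDevice.do?id=1");
		scraper.execute();
		// ...and read back any variable the script defined once it has finished
		Variable name = scraper.getContext().getVar("name");
		System.out.println(name == null ? "nothing scraped" : name.toString());
	}
}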

The site has some good example scripts to show you how to get started, so I won’t repeat them here. The easiest way to create your own is to run the WebHarvest GUI from the command line, start with a sample script, and then hack it around to get what you want – it’s an easy iterative process with good feedback in the UI.

As a simple example, this is a script to go to the Sony-Ericsson developer site’s handset gallery at http://developer.sonyericsson.com/device/searchDevice.do?restart=true, and rip each handset’s individual spec page URI:

<?xml version="1.0" encoding="UTF-8"?>
<config>
	<!-- indicates we want a loop through the list defined in <list>, doing <body> for each item, where the variables uid and i hold the value and index of the relevant item -->
	<loop item="uid" index="i">
		<!-- the list section defines what we will loop over - here, it pulls out the value attribute of all option tags -->
		<list>
			<xpath expression="//option/@value">
				<html-to-xml>
					<http url="http://developer.sonyericsson.com/device/searchDevice.do?restart=true"/>
				</html-to-xml>
			</xpath>
		</list>
		<!-- the body section lists instructions which are run for every iteration of the loop -->
		<body>
			<!-- we define a new variable for every iteration, using the iteration count as a suffix  -->
			<var-def name="uri.${i}">
				<!-- template tag is important, else the $ var syntax will be ignored and won't do any value substitutions -->
				<template>device/loadDevice.do?id=${uid}</template>
			</var-def>
		</body>
	</loop>
</config>

The handset URIs will end up in a list of variables, from uri.1 to uri.N.

The XML configuration’s syntax can take a little getting used to – it appeared quite backwards to me at first, but by messing around in the GUI you can experiment and learn pretty fast. With a basic understanding of XPath to identify parts of the web page, and perhaps a little regular expression knowledge to get at information surrounded by plain text, you can perform some very powerful scraping.

We can then define another script which will take this URI, and pull out a piece of information from the page – in this example, it will show the region(s) that the handset was released in:

<?xml version="1.0" encoding="UTF-8"?>
<config>
	<!-- get the entire page -->
	<var-def name="wholepage">
		<html-to-xml>
			<!-- NEVER try and pass in the entire URL as a single variable here! -->
			<http url="http://developer.sonyericsson.com/${uri}"/>
		</html-to-xml>
	</var-def>
	<!-- rip out the block with the specifications -->
	<var-def name="specsheet">
		<xpath expression="//div[@class='phone-specs']">
			<var name="wholepage"/>
			</xpath>
		</var-def>
		<!-- find the handset's name -->
	<var-def name="name">
		<xpath expression="//h5[contains(text(),'Phone Model')]/following-sibling::p[1]/text()">
			<var name="specsheet"/>
			</xpath>
	</var-def>
	<!-- identify the screen resolution -->
	<regexp>
		<regexp-pattern>([\d]*)x([\d]*)</regexp-pattern>
			<regexp-source>
				<xpath expression="//h5[contains(text(),'Screen Sizes')]/following-sibling::p[1]/text()">
					<var name="specsheet"/>
				</xpath>
			</regexp-source>
		<regexp-result>
			<var-def name="screen.width"><template>${_1}</template></var-def>
			<var-def name="screen.height"><template>${_2}</template></var-def>
		</regexp-result>
	</regexp>
</config>

At this point I should note the biggest gotcha with WebHarvest, which just cost me three hours of hair tearing. In the script, this line defines the page to scrape: <http url="http://developer.sonyericsson.com/${uri}"/>, where ${uri} is a variable specified at runtime to define a URI. This works.

If you were to substitute in this perfectly sensible alternative: <http url="${url}"/>, you would end up with a completely obscure runtime exception a little like this:

Exception in thread "main" org.webharvest.exception.ScriptException: Cannot set variable in scripter: Field access: bsh.ReflectError: No such field: 1
	at org.webharvest.runtime.scripting.BeanShellScriptEngine.setVariable(Unknown Source)
	at org.webharvest.runtime.scripting.ScriptEngine.pushAllVariablesFromContextToScriptEngine(Unknown Source)
	at org.webharvest.runtime.scripting.BeanShellScriptEngine.eval(Unknown Source)
	at org.webharvest.runtime.templaters.BaseTemplater.execute(Unknown Source)
	at org.webharvest.runtime.processors.TemplateProcessor.execute(Unknown Source)
	at org.webharvest.runtime.processors.BaseProcessor.run(Unknown Source)
	at org.webharvest.runtime.processors.BodyProcessor.execute(Unknown Source)
	at org.webharvest.runtime.processors.VarDefProcessor.execute(Unknown Source)
	at org.webharvest.runtime.processors.BaseProcessor.run(Unknown Source)
	at org.webharvest.runtime.processors.BodyProcessor.execute(Unknown Source)
	at org.webharvest.runtime.processors.BaseProcessor.run(Unknown Source)
	at org.webharvest.runtime.processors.LoopProcessor.execute(Unknown Source)
	at org.webharvest.runtime.processors.BaseProcessor.run(Unknown Source)
	at org.webharvest.runtime.Scraper.execute(Unknown Source)
	at org.webharvest.runtime.Scraper.execute(Unknown Source)
	at scrape.QuickScraper.scrapeUrlList(QuickScraper.java:82)
	at scrape.QuickScraper.scrapeUrlList(QuickScraper.java:49)
	at scrape.ActualScraper.main(DhfScraper.java:37)
Caused by: Field access: bsh.ReflectError: No such field: 1 : at Line: -1 : in file:  : 

	at bsh.UtilEvalError.toEvalError(Unknown Source)
	at bsh.UtilEvalError.toEvalError(Unknown Source)
	at bsh.Interpreter.set(Unknown Source)
	... 18 more

You have been warned!

Running The Scripts From Java

WebHarvest requires very little code to run. I created this little reusable harness class to quickly run the two types of script – one to pull information from a page, and one to farm URLs from which to scrape data. You can use the first without the second, of course.

package scrape;

import java.io.*;
import java.util.*;

import org.apache.commons.logging.*;
import org.webharvest.definition.ScraperConfiguration;
import org.webharvest.runtime.*;
import org.webharvest.runtime.variables.Variable;

/**
 * Quick hackable web scraping class.
 * @author Tom Godber
 */
public abstract class QuickScraper
{
	/** Logging object. */
	protected final Log LOG = LogFactory.getLog(getClass());
	/** Prefix for any variable scraped which defines a URI to visit. It will be followed by a counter, matching the uri.1, uri.2... variables defined by the URL list script. */
	public static final String SCRAPED_URL_VARIABLE_PREFIX = "uri.";
	/** A variable name which holds the initial URI to scrape, matching the ${uri} placeholder in the page script. */
	public static final String START_URL_VARIABLE = "uri";

	/** A temporary working folder. */
	private File working = new File("temp");

	/** Ensures temp folder exists. */
	public QuickScraper()
	{
		working.mkdirs();
	}

	/**
	 * Scrapes a list of URLs which are automatically derived from a page.
	 * The initial URL must be set in the actual URL list config XML.
	 * @param urlConfigXml Path of an XML describing how to scrape the URL list.
	 * @param pageConfigXml Path of an XML describing how to scrape the individual pages found.
	 * @return The number of URLs processed, or -1 if the config could not be loaded.
	 */
	protected int scrapeUrlList(String urlConfigXml, String pageConfigXml)
	{
		return scrapeUrlList(new HashMap(), urlConfigXml, pageConfigXml);
	}

	/**
	 * Scrapes a list of URLs which are automatically derived from a page.
	 * @param setup Optional configuration for the script
	 * @param urlConfigXml Path of an XML describing how to scrape the URL list.
	 * @param pageConfigXml Path of an XML describing how to scrape the individual pages found.
	 * @return The number of URLs processed, or -1 if the config could not be loaded.
	 */
	protected int scrapeUrlList(Map setup, String urlConfigXml, String pageConfigXml)
	{
		return scrapeUrlList(setup, new File(urlConfigXml), new File(pageConfigXml));
	}

	/**
	 * Scrapes a list of URLs which are automatically derived from a page.
	 * The initial URL must be set in the actual URL list config XML.
	 * @param urlConfigXml XML describing how to scrape the URL list.
	 * @param pageConfigXml XML describing how to scrape the individual pages found.
	 * @return The number of URLs processed, or -1 if the config could not be loaded.
	 */
	protected int scrapeUrlList(File urlConfigXml, File pageConfigXml)
	{
		return scrapeUrlList(new HashMap(), urlConfigXml, pageConfigXml);
	}

	/**
	 * Scrapes a list of URLs which are automatically derived from a page.
	 * @param setup Optional configuration for the script
	 * @param urlConfigXml XML describing how to scrape the URL list.
	 * @param pageConfigXml XML describing how to scrape the individual pages found.
	 * @return The number of URLs processed, or -1 if the config could not be loaded.
	 * @throws NullPointerException If the setup map is null.
	 */
	protected int scrapeUrlList(Map setup, File urlConfigXml, File pageConfigXml)
	{
		try
		{
			if (LOG.isDebugEnabled())	LOG.debug("Starting scrape with temp folder "+working.getAbsolutePath()+"...");
			// generate a one-off scraper based on preloaded configuration
			ScraperConfiguration config = new ScraperConfiguration(urlConfigXml);
			Scraper scraper = new Scraper(config, working.getAbsolutePath());
			// initialise any config
			setupScraperContext(setup, scraper);
			// run the script
			scraper.execute();

			// rip the URL list out of the scraped content
			ScraperContext context = scraper.getContext();
			int i=1;
			Variable scrapedUrl;
			if (LOG.isDebugEnabled())	LOG.debug("Scraping performed, pulling URLs '"+SCRAPED_URL_VARIABLE_PREFIX+"n' from "+context.size()+" variables, starting with "+i+"...");
			while ((scrapedUrl = (Variable) context.get(SCRAPED_URL_VARIABLE_PREFIX+i))  != null)
			{
				if (LOG.isTraceEnabled())	LOG.trace("Found "+SCRAPED_URL_VARIABLE_PREFIX+i+": "+scrapedUrl.toString());
				// parse this URL
				setup.put(START_URL_VARIABLE, scrapedUrl.toString());
				scrapeUrl(setup, pageConfigXml);
				// move on
				i++;
			}
			if (LOG.isDebugEnabled())	LOG.debug("No more URLs found.");
			return i-1;
		}
		catch (FileNotFoundException e)
		{
			if (LOG.isErrorEnabled())	LOG.error("Could not find config file '"+urlConfigXml.getAbsolutePath()+"' - no scraping was done for this WebHarvest XML.", e);
			return -1;
		}
		finally
		{
			working.delete();
		}
	}

	/**
	 * Scrapes an individual page, and passes the results on for processing.
	 * The script must contain a hardcoded URL.
	 * @param configXml XML describing how to scrape an individual page.
	 */
	protected void scrapeUrl(File configXml)
	{
		scrapeUrl((String)null, configXml);
	}

	/**
	 * Scrapes an individual page, and passes the results on for processing.
	 * @param url The URL to scrape. If null, the URL must be set in the config itself.
	 * @param configXml XML describing how to scrape an individual page.
	 */
	protected void scrapeUrl(String url, File configXml)
	{
		Map setup = new HashMap();
		if (url!=null)	setup.put(START_URL_VARIABLE, url);
		scrapeUrl(setup, configXml);
	}

	/**
	 * Scrapes an individual page, and passes the results on for processing.
	 * @param setup Optional configuration for the script
	 * @param configXml XML describing how to scrape an individual page.
	 */
	protected void scrapeUrl(Map setup, File configXml)
	{
		try
		{
			if (LOG.isDebugEnabled())	LOG.debug("Starting scrape with temp folder "+working.getAbsolutePath()+"...");
			// generate a one-off scraper based on preloaded configuration
			ScraperConfiguration config = new ScraperConfiguration(configXml);
			Scraper scraper = new Scraper(config, working.getAbsolutePath());
			setupScraperContext(setup, scraper);
			scraper.execute();

			// handle contents in some way
			pageScraped((String)setup.get(START_URL_VARIABLE), scraper.getContext());

			if (LOG.isDebugEnabled())	LOG.debug("Page scraping complete.");
		}
		catch (FileNotFoundException e)
		{
			if (LOG.isErrorEnabled())	LOG.error("Could not find config file '"+configXml.getAbsolutePath()+"' - no scraping was done for this WebHarvest XML.", e);

		}
		finally
		{
			working.delete();
		}
	}

	/**
	 * @param setup Any variables to be set before the script runs.
	 * @param scraper The object which does the scraping.
	 */
	private void setupScraperContext(Map setup, Scraper scraper)
	{
		if (setup!=null)
			for (Object key : setup.keySet())
				scraper.getContext().setVar((String) key, setup.get(key));
	}

	/**
	 * Process a page that was scraped.
	 * @param url The URL that was scraped.
	 * @param context The contents of the scraped page.
	 */
	public abstract void pageScraped(String url, ScraperContext context);
}

Scraping a new set of data then becomes as simple as extending the class, passing in appropriate config, and pulling out whatever variables you want every time a page is scraped:

package scrape;

import org.webharvest.runtime.ScraperContext;
import org.webharvest.runtime.variables.Variable;

public class ActualScraper extends QuickScraper
{
	public static void main(String[] args)
	{
		try
		{
			ActualScraper scraper = new ActualScraper();
			// do the scraping
			scraper.scrapeUrlList("config/se.urls.xml", "config/se.page.xml");
		}
		catch (Exception e)
		{
			e.printStackTrace();
		}
	}

	/**
	 * @see scrape.QuickScraper#pageScraped(java.lang.String, org.webharvest.runtime.ScraperContext)
	 */
	public void pageScraped(String url, ScraperContext context)
	{
		Variable nameVar = context.getVar("name");
		if (nameVar==null)
		{
			if (LOG.isWarnEnabled())	LOG.warn("Scrape for "+url+" produced no data! Ignoring");
			return;
		}

		// store this handset's details
		if (LOG.isInfoEnabled())	LOG.info(nameVar.toString()+" has "+context.getVar("screen.width").toString()+"x"+context.getVar("screen.height").toString()+" screen");
	}
}

So there you have it – a powerful, configurable and highly effective web scraping system with almost no code written!

Migrating Blogger to WordPress – Easy 301 Permalink Redirects

// October 25th, 2009 // 1 Comment » // Dev, Web

I’ve been moving the Masabi web site and blog onto WordPress, from a combination of static web content and a blog driven by Blogger. WordPress has a great import function to move the posts across, which does most of the initial work for you.

However, WordPress won’t by itself set up redirects from the old Blogger permalinks to the new WordPress ones. The two platforms shrink post titles into URL slugs differently, so it’s not as simple as matching the WordPress permalink structure to Blogger’s under Settings.

I did see one plugin which was supposed to migrate Blogger permalinks automatically, but it didn’t work and also didn’t cover the full scope I needed – I also have legacy static html links to remap into a totally different site structure. To achieve this I turned to the excellent Redirection plugin from John Godley.

Getting Inside The Database

The plugin allows you to manually set up redirects with a very friendly interface, but there’s no fun in migrating 70 blog posts by hand. This is where SQL can come to our rescue!

VERY IMPORTANT: take a full backup of your database before you start messing around with SQL. In theory this is a pretty low-risk operation, but you never know!

The Blogger import utility saves custom fields for every imported post; the Blogger permalink is held in the blogger_permalink field, and custom fields are stored in the wp_postmeta table with an obvious ID-based backlink to the posts table. This makes one half of the mapping very easy to set up.

The other half is slightly more subtle, because WordPress permalinks are not actually stored on the wp_posts table. Every post does have a GUID, but this is based on what its permalink was when you did the import – and if you imported when you created the blog and set your WordPress permalinks later, this will not reflect the post’s current permalink URI. Ideally we’d like the 301 to point to the real end URI, so we need to get a little creative and rebuild the permalink in the way WordPress does it, from the post metadata.

My permalink structure looks like this:
YYYY/MM/DD/title/

This can be rebuilt using the following string manipulation in SQL:
CONCAT('/',YEAR(post_date),'/',LPAD(MONTH(post_date),2,'0'),'/',LPAD(DAY(post_date),2,'0'),'/',post_name,'/')

Notes on the SQL functions:

  • CONCAT just combines all of its arguments together into a single string;
  • LPAD is used to pad the left of the string with 0s, as the month and day are always 2 digits long;
  • YEAR, MONTH and DAY extract the relevant fields from the post’s creation date/time.

Given this data, we can easily create an automatic import SQL statement for moving the data across:

INSERT INTO wp_redirection_items (url,action_data,regex,group_id,status,action_type,action_code,match_type,last_access,position)
SELECT M.meta_value AS url,CONCAT('/',YEAR(P.post_date),'/',LPAD(MONTH(P.post_date),2,'0'),'/',LPAD(DAY(P.post_date),2,'0'),'/',P.post_name,'/') AS action_data,0 AS regex,1 AS group_id,'enabled' AS status,'url' AS action_type,301 AS action_code,'url' AS match_type, 0 AS last_access, 69 as position
FROM wp_postmeta M, wp_posts P
WHERE M.meta_key='blogger_permalink' AND M.post_id=P.ID AND P.post_status='publish';

Run this through PHPMyAdmin, refresh the Redirection admin page, and you should now find that all of your permalinks have been moved across. Note that we set the position field to an arbitrary constant, here 69, so we can easily delete the inserted rows if we messed up and then try again, without upsetting any other redirects already set up.

Running Subversion Through Ant

// May 28th, 2009 // No Comments » // Dev

Ant has a somewhat limited built-in Subversion task. An alternative if you need more power (and I can’t quite remember why I did, but I did) is Subclipse’s Ant task – but setting it up isn’t so obvious, especially if you don’t use Subclipse as your main Subversion plugin.

The Ant task provides a Java wrapper around two different ways to access Subversion: either a command line ‘svn’ command, or the JavaHL DLL.  Initially I opted for the former (using this Windows client) which appeared to work passably well, but generated reams of logging that slowed commits to a crawl; the Ant task didn’t allow you to pass additional command line parameters on to the command, so there wasn’t much that could be done about it.

An obscure bug, probably something to do with a slightly corrupted Subversion project in our repository, finally led me to reassess today, and after an afternoon of fun I scrapped the Windows client and went after the DLL.

To get it (and for some reason it took me a while to think of this), download the ‘update site’ zip for the latest Subclipse Eclipse plugin (currently 1.6.2), open it, and pull out the four jar files with names starting org.tigris.subversion.clientadapter. Note: don’t be distracted by the Update URL, which is the conventional way to actually install the plugin.

Drop these four jars in some suitable lib folder, and make sure they are added to Ant’s runtime classpath – in Eclipse, you do it from the Window > Preferences dialogue under Ant > Runtime:

Eclipse Ant dialogue

You can add them to either the Ant Home or Global sections – it works either way.

Our entire automated tagging system now runs vastly quicker without the logging, and so far has been bug-free, even where the Windows client was balking at some odd metadata issue (which was the only thing it didn’t adequately log, so there was no easy way to fix it). Excellent.

Masabists: Cell IDs and Location-Based Services

// December 1st, 2007 // Comments Off // Dev, Mobile

Note: this was originally posted on the Masabists blog.

It was very interesting to read about the latest update to Google Maps, one of the nicest J2ME apps around at the moment, which can now find your location without GPS. My instant reaction was – “Finally! But will the operators let them continue?”

Current operator location services work by triangulating signal strength from multiple base stations, which can often give good accuracy in urban areas densely packed with cells. They carry with them a cost – low in absolute terms but sadly quite high for a lot of possible use cases – and all sorts of privacy controls, which, whilst clearly necessary, have been a bit of a barrier to widespread adoption of Location-Based Services.

Back in I think it was 2002, Masabi had a working system to track handset location by cell IDs. Ben, being an engineer at heart, had strapped a modem unit to a Palm PDA and written an application to read out the current cell ID and plot it onto GIF maps downloaded live from StreetMap.co.uk. I distinctly remember being very impressed walking down Victoria Street towards Parliament Square in Westminster and seeing it track us across the map on this very GIF with surprising accuracy.

Consensus seems to be that Google are using a very similar system, with GPS users providing location data to map out the operators’ cell IDs (something I believe they explicitly mentioned). This suggests that Google haven’t purchased the location data from the operators. Why would that matter?

So how did we build up our cell location database? And if it worked, why didn’t we commercialise it? The two answers are connected – we were ramping up for a launch within certain industries which could have benefitted from a single network/limited device range service. Unfortunately – or perhaps fortunately, with hindsight – just before a major demo, the operator we were using decided to remove the cell broadcast info that had been supplying the base station OS grid reference locations (note: the cell IDs themselves did not appear to change, as I had erroneously stated earlier).

We considered some sort of effort to map cell IDs into a database, perhaps open source, but without widespread GPS ownership this was a huge task, there was no guarantee that the operators wouldn’t choose to change the IDs at any time in the future, and we were not interested in trying to make commercial promises where we had no control over key components. So we put it to rest.

Some JavaME devices can access the current cell ID, as can signed Symbian apps and Windows Mobile apps; Google’s compatibility list suggests they are targeting only these devices, which implies they are attempting something similar. I wish them luck!
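
As a code-level aside, on handsets which do expose it the cell ID is generally read through a manufacturer-specific system property. The sketch below is a rough MIDlet-side illustration only: the property names are assumptions from memory, they vary between manufacturers and firmware versions, and some handsets only reveal them to signed MIDlets.

/**
 * Rough sketch only: reads the current GSM cell ID via manufacturer-specific
 * system properties. The property names below are assumptions from memory -
 * they vary by vendor and firmware, and some handsets only expose them to
 * signed MIDlets.
 */
public class CellIdHelper
{
	/** Candidate property names to try, in order (not exhaustive). */
	private static final String[] CELL_ID_PROPERTIES = {
		"com.nokia.mid.cellid",        // Nokia Series 40 / S60
		"com.sonyericsson.net.cellid"  // Sony Ericsson JP-7.3 and later
	};

	/** @return the first cell ID reported by the handset, or null if none is exposed. */
	public static String getCellId()
	{
		for (int i = 0; i < CELL_ID_PROPERTIES.length; i++)
		{
			String id = System.getProperty(CELL_ID_PROPERTIES[i]);
			if (id != null && id.length() > 0)
				return id;
		}
		return null;
	}
}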

Please comment on the original post.