Archive for Web

WordPress iPad App – Failed Login Problem

// December 7th, 2010 // Web

After a considerable time failing to add this blog to the WordPress iPad app, I finally spotted a comment on a forum pointing out that it requires XML-RPC to work. So if you are getting the same intensely frustrating rejections every time you try to add your site, here is the fix:

  1. Log in to the admin console via the web
  2. Click on Writing (under Settings)
  3. At the bottom of the page, check the XML-RPC checkbox and hit Save.

Now when you start up the app it should be able to authenticate you.

This is the kind of bug that is pretty unnecessary, and will doubtless put off a lot of new users. How hard is it to point out such a basic requirement whenever the app fails to find an active XML-RPC URI on the site?
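As an illustration of how cheap such a check could be, here is a minimal Java sketch that posts a parameterless demo.sayHello call to the site's XML-RPC endpoint. It assumes the endpoint lives at /xmlrpc.php (the WordPress default) and that a site with XML-RPC switched off answers with an XML-RPC fault or an HTTP error rather than a normal response; XmlRpcProbe and its methods are hypothetical names, not part of any WordPress client.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: probe a WordPress site's XML-RPC endpoint before trying to log in. */
public class XmlRpcProbe {

    /** Builds a minimal XML-RPC request for a method that takes no parameters. */
    public static String buildRequest(String methodName) {
        return "<?xml version=\"1.0\"?>"
                + "<methodCall><methodName>" + methodName + "</methodName>"
                + "<params></params></methodCall>";
    }

    /** True if the reply is an HTTP 200 XML-RPC response that is not a fault. */
    public static boolean looksLikeSuccess(int status, String body) {
        return status == 200
                && body.contains("<methodResponse>")
                && !body.contains("<fault>");
    }

    /** Posts demo.sayHello to {siteRoot}/xmlrpc.php (an assumed location) and checks the reply. */
    public static boolean isXmlRpcEnabled(String siteRoot) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(siteRoot + "/xmlrpc.php").toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(buildRequest("demo.sayHello").getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        // error replies are read from the error stream, which may be null
        InputStream in = status < 400 ? conn.getInputStream() : conn.getErrorStream();
        String body = "";
        if (in != null) {
            try (InputStream stream = in) {
                body = new String(stream.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
        conn.disconnect();
        return looksLikeSuccess(status, body);
    }
}
```

An app could call XmlRpcProbe.isXmlRpcEnabled("http://example.com") after a failed login and, if it returns false, tell the user to tick the XML-RPC box instead of just rejecting the credentials.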

After 2 minutes of using the app I've been dumped out by crashes twice and found a lot of the setup frustrating, but I will persevere, as I'd love to be able to draft blog posts on the bus home and save them offline…

Verified by Visa

// February 25th, 2010 // Web

Plenty of people have commented on the stupidity of Verified by Visa and similar schemes, which embed iframed verification forms in the checkout process. These forms look remarkably like phishing forms (so they train users to trust such embedded forms), without providing any additional security benefit. A quick glance at the economics explains why sites do this – it lowers their costs by shunting fraud risk onto Visa – but from a user's perspective it's still bloody stupid.

Even worse is when you cannot complete a payment without it, but it doesn’t work – an experience I have just had with BA, trying to book a flight over to Queen’s Day in Amsterdam. After entering all my details, I got this:

The bank would like the following information… an empty iframe. It’s actually loading a JSP on BA’s site which delivers an empty HTML page wrapping a script that tries to trigger a form that isn’t defined in the markup. Knowing that is no great consolation…

So congratulations, BA: Easyjet were undoubtedly very happy to receive some cash in exchange for a functional web experience.

WebHarvest: Easy Web Scraping from Java

// February 15th, 2010 // Dev, Web

I’ve been experimenting with data visualisation for a while now, most of which is for Masabi's business plan, though I hope to share some offshoots soon.

I often have a need to quickly scrape some data out of a web page (or list of web pages), which can then be fed into Excel and on to specialist data visualisation tools like Tableau (available in a free public edition here – my initial impressions are positive but it’s early days yet).

To this end I have turned to WebHarvest, an excellent scriptable open source API for web scraping in Java. I really really like it, but there are some quirks and setup issues that have cost me hours so I thought I’d roll together a tutorial with the fixes.

WebHarvest Config for Maven

When it works, Maven is a lovely tool for hiding dependency management in Java projects, but WebHarvest is not configured quite right out of the box to work transparently with it. (Describing Maven is beyond the scope of this post, but if you don’t know it, it’s easy to set up with the M2 plugin for Eclipse.)

This is the Maven POM I ended up with to use WebHarvest in a new JavaSE project:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<groupId>WebScraping</groupId>
	<artifactId>WebScraping</artifactId>
	<packaging>jar</packaging>
	<version>0.00.01</version>
	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
	</properties>

	<build>
		<plugins>
			<plugin>
				<artifactId>maven-compiler-plugin</artifactId>
				<configuration>
					<source>1.6</source>
					<target>1.6</target>
				</configuration>
			</plugin>
		</plugins>
	</build>

	<repositories>
		<repository>
			<id>wso2</id>
			<url>http://dist.wso2.org/maven2/</url>
		</repository>
		<repository>
			<id>maven-repository-1</id>
			<!-- Maven Central only serves HTTPS -->
			<url>https://repo1.maven.org/maven2/</url>
		</repository>
	</repositories>
	<dependencies>
		<dependency>
			<groupId>commons-logging</groupId>
			<artifactId>commons-logging</artifactId>
			<version>1.1</version>
			<type>jar</type>
			<scope>compile</scope>
		</dependency>
		<dependency>
			<groupId>log4j</groupId>
			<artifactId>log4j</artifactId>
			<version>1.2.12</version>
			<type>jar</type>
			<scope>compile</scope>
		</dependency>
		<dependency>
			<groupId>org.webharvest.wso2</groupId>
			<artifactId>webharvest-core</artifactId>
			<version>1.0.0.wso2v1</version>
			<type>jar</type>
			<scope>compile</scope>
		</dependency>
		<!-- web harvest pom doesn't track dependencies well -->
		<dependency>
			<groupId>net.sf.saxon</groupId>
			<artifactId>saxon-xom</artifactId>
			<version>8.7</version>
		</dependency>
		<dependency>
			<groupId>org.htmlcleaner</groupId>
			<artifactId>htmlcleaner</artifactId>
			<version>1.55</version>
		</dependency>
		<dependency>
			<groupId>bsh</groupId>
			<artifactId>bsh</artifactId>
			<version>1.3.0</version>
		</dependency>
		<dependency>
			<groupId>commons-httpclient</groupId>
			<artifactId>commons-httpclient</artifactId>
			<version>3.1</version>
		</dependency>
	</dependencies>
</project>

You’ll note that the WebHarvest dependencies had to be added explicitly, because the jar does not come with a working pom listing them.

Writing A Scraping Script

WebHarvest uses XML configuration files to describe how to scrape a site – and with a few lines of Java code you can run any XML configuration and have access to any properties that the script identified from the page. This is definitely the safest way to scrape data, as it decouples the code from the web page markup – so if the site you are scraping goes through a redesign, you can quickly adjust the config files without recompiling the code they pass data to.

The site has some good example scripts to get you started, so I won’t repeat them here. The easiest way to create your own is to run the WebHarvest GUI from the command line, start with a sample script, and then hack it around to get what you want – it’s an easy iterative process with good feedback in the UI.

As a simple example, this is a script to go to the Sony-Ericsson developer site’s handset gallery at http://developer.sonyericsson.com/device/searchDevice.do?restart=true, and rip each handset’s individual spec page URI:

<?xml version="1.0" encoding="UTF-8"?>
<config>
	<!-- indicates we want a loop, through the list defined in <list>, doing <body> for each item, where the variables uid and i hold the value and index of the current item -->
	<loop item="uid" index="i">
		<!-- the list section defines what we will loop over - here, it pulls out the value attribute of all option tags -->
		<list>
			<xpath expression="//option/@value">
				<html-to-xml>
					<http url="http://developer.sonyericsson.com/device/searchDevice.do?restart=true"/>
				</html-to-xml>
			</xpath>
		</list>
		<!-- the body section lists instructions which are run for every iteration of the loop -->
		<body>
			<!-- we define a new variable for every iteration, using the iteration count as a suffix  -->
			<var-def name="uri.${i}">
				<!-- template tag is important, else the $ var syntax will be ignored and won't do any value substitutions -->
				<template>device/loadDevice.do?id=${uid}</template>
			</var-def>
		</body>
	</loop>
</config>

The handset URIs will end up in a list of variables, from uri.1 to uri.N.

The XML configuration’s syntax can take a little getting used to – it appeared quite backwards to me at first, but by messing around in the GUI you can experiment and learn pretty fast. With a basic understanding of XPath to identify parts of the web page, and perhaps a little regular expression knowledge to get at information surrounded by plain text, you can perform some very powerful scraping.

We can then define another script which takes one of these URIs and pulls some information out of the page – in this example, the handset’s name and its screen resolution:

<?xml version="1.0" encoding="UTF-8"?>
<config>
	<!-- get the entire page -->
	<var-def name="wholepage">
		<html-to-xml>
			<!-- NEVER try and pass in the entire URL as a single variable here! -->
			<http url="http://developer.sonyericsson.com/${uri}"/>
		</html-to-xml>
	</var-def>
	<!-- rip out the block with the specifications -->
	<var-def name="specsheet">
		<xpath expression="//div[@class='phone-specs']">
			<var name="wholepage"/>
		</xpath>
	</var-def>
	<!-- find the handset's name -->
	<var-def name="name">
		<xpath expression="//h5[contains(text(),'Phone Model')]/following-sibling::p[1]/text()">
			<var name="specsheet"/>
		</xpath>
	</var-def>
	<!-- identify the screen resolution: one or more digits, an x, then one or more digits -->
	<regexp>
		<regexp-pattern>(\d+)x(\d+)</regexp-pattern>
		<regexp-source>
			<xpath expression="//h5[contains(text(),'Screen Sizes')]/following-sibling::p[1]/text()">
				<var name="specsheet"/>
			</xpath>
		</regexp-source>
		<regexp-result>
			<var-def name="screen.width"><template>${_1}</template></var-def>
			<var-def name="screen.height"><template>${_2}</template></var-def>
		</regexp-result>
	</regexp>
</config>
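WebHarvest does the HTML cleaning and XPath evaluation for you, but it can help to see what the two extraction steps in this script boil down to. This rough plain-JDK equivalent (javax.xml.xpath plus java.util.regex) assumes the page has already been cleaned into well-formed XHTML, which is the job html-to-xml does in the script; SpecSheetExtractor is a hypothetical class, not part of WebHarvest.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical plain-JDK approximation of the XPath + regexp steps in the page script. */
public class SpecSheetExtractor {
    // one or more digits, an 'x', then one or more digits, e.g. 240x320
    private static final Pattern RESOLUTION = Pattern.compile("(\\d+)x(\\d+)");

    /** Text of the first <p> following the <h5> whose text contains the label. */
    public static String valueAfterHeading(String xhtml, String label) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "//h5[contains(text(),'" + label + "')]/following-sibling::p[1]/text()";
        try {
            return xpath.evaluate(expr, new InputSource(new StringReader(xhtml))).trim();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + expr, e);
        }
    }

    /** Returns {width, height} from text such as "240x320 pixels", or null if absent. */
    public static int[] parseResolution(String text) {
        Matcher m = RESOLUTION.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))};
    }
}
```

For example, given the made-up fragment <div class="phone-specs"><h5>Screen Sizes</h5><p>240x320 pixels</p></div>, valueAfterHeading(fragment, "Screen Sizes") yields "240x320 pixels", which parseResolution turns into {240, 320}. Requiring at least one digit on each side of the x means a stray x in ordinary text (as in "Max") is not mistaken for a resolution.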

At this point I should note the biggest gotcha with WebHarvest, which just cost me 3 hours of hair-tearing. In the script, this line defines the page to scrape: <http url="http://developer.sonyericsson.com/${uri}"/>, where ${uri} is a variable specified at runtime to define a URI. This works.

If you were to substitute in this perfectly sensible alternative: <http url="${url}"/>, you would end up with a completely obscure runtime exception a little like this:

Exception in thread "main" org.webharvest.exception.ScriptException: Cannot set variable in scripter: Field access: bsh.ReflectError: No such field: 1
	at org.webharvest.runtime.scripting.BeanShellScriptEngine.setVariable(Unknown Source)
	at org.webharvest.runtime.scripting.ScriptEngine.pushAllVariablesFromContextToScriptEngine(Unknown Source)
	at org.webharvest.runtime.scripting.BeanShellScriptEngine.eval(Unknown Source)
	at org.webharvest.runtime.templaters.BaseTemplater.execute(Unknown Source)
	at org.webharvest.runtime.processors.TemplateProcessor.execute(Unknown Source)
	at org.webharvest.runtime.processors.BaseProcessor.run(Unknown Source)
	at org.webharvest.runtime.processors.BodyProcessor.execute(Unknown Source)
	at org.webharvest.runtime.processors.VarDefProcessor.execute(Unknown Source)
	at org.webharvest.runtime.processors.BaseProcessor.run(Unknown Source)
	at org.webharvest.runtime.processors.BodyProcessor.execute(Unknown Source)
	at org.webharvest.runtime.processors.BaseProcessor.run(Unknown Source)
	at org.webharvest.runtime.processors.LoopProcessor.execute(Unknown Source)
	at org.webharvest.runtime.processors.BaseProcessor.run(Unknown Source)
	at org.webharvest.runtime.Scraper.execute(Unknown Source)
	at org.webharvest.runtime.Scraper.execute(Unknown Source)
	at scrape.QuickScraper.scrapeUrlList(QuickScraper.java:82)
	at scrape.QuickScraper.scrapeUrlList(QuickScraper.java:49)
	at scrape.ActualScraper.main(DhfScraper.java:37)
Caused by: Field access: bsh.ReflectError: No such field: 1 : at Line: -1 : in file:  : 

	at bsh.UtilEvalError.toEvalError(Unknown Source)
	at bsh.UtilEvalError.toEvalError(Unknown Source)
	at bsh.Interpreter.set(Unknown Source)
	... 18 more

You have been warned!

Running The Scripts From Java

WebHarvest requires very little code to run. I created this little reusable harness class to quickly run the two types of script – one to pull information from a page, and one to farm URLs from which to scrape data. You can use the first without the second, of course.

package scrape;

import java.io.*;
import java.util.*;

import org.apache.commons.logging.*;
import org.webharvest.definition.ScraperConfiguration;
import org.webharvest.runtime.*;
import org.webharvest.runtime.variables.Variable;

/**
 * Quick hackable web scraping class.
 * @author Tom Godber
 */
public abstract class QuickScraper
{
	/** Logging object. */
	protected final Log LOG = LogFactory.getLog(getClass());
	/** Prefix for any scraped variable which defines a URI, matching the uri.${i} variables of the URL list script. It will be followed by a counter. */
	public static final String SCRAPED_URL_VARIABLE_PREFIX = "uri.";
	/** Name of the variable which passes the URI to scrape into a page script, matching ${uri} in the page script. */
	public static final String START_URL_VARIABLE = "uri";

	/** A temporary working folder. */
	private File working = new File("temp");

	/** Ensures temp folder exists. */
	public QuickScraper()
	{
		working.mkdirs();
	}

	/**
	 * Scrapes a list of URLs which are automatically derived from a page.
	 * The initial URL must be set in the actual URL list config XML.
	 * @param urlConfigXml Path of an XML describing how to scrape the URL list.
	 * @param pageConfigXml Path of an XML describing how to scrape the individual pages found.
	 * @return The number of URLs processed, or -1 if the config could not be loaded.
	 */
	protected int scrapeUrlList(String urlConfigXml, String pageConfigXml)
	{
		return scrapeUrlList(new HashMap<String, Object>(), urlConfigXml, pageConfigXml);
	}

	/**
	 * Scrapes a list of URLs which are automatically derived from a page.
	 * @param setup Optional configuration for the script
	 * @param urlConfigXml Path of an XML describing how to scrape the URL list.
	 * @param pageConfigXml Path of an XML describing how to scrape the individual pages found.
	 * @return The number of URLs processed, or -1 if the config could not be loaded.
	 */
	protected int scrapeUrlList(Map<String, Object> setup, String urlConfigXml, String pageConfigXml)
	{
		return scrapeUrlList(setup, new File(urlConfigXml), new File(pageConfigXml));
	}

	/**
	 * Scrapes a list of URLs which are automatically derived from a page.
	 * The initial URL must be set in the actual URL list config XML.
	 * @param urlConfigXml XML describing how to scrape the URL list.
	 * @param pageConfigXml XML describing how to scrape the individual pages found.
	 * @return The number of URLs processed, or -1 if the config could not be loaded.
	 */
	protected int scrapeUrlList(File urlConfigXml, File pageConfigXml)
	{
		return scrapeUrlList(new HashMap<String, Object>(), urlConfigXml, pageConfigXml);
	}

	/**
	 * Scrapes a list of URLs which are automatically derived from a page.
	 * @param setup Optional configuration for the script
	 * @param urlConfigXml XML describing how to scrape the URL list.
	 * @param pageConfigXml XML describing how to scrape the individual pages found.
	 * @return The number of URLs processed, or -1 if the config could not be loaded.
	 * @throws NullPointerException If the setup map is null.
	 */
	protected int scrapeUrlList(Map<String, Object> setup, File urlConfigXml, File pageConfigXml)
	{
		try
		{
			if (LOG.isDebugEnabled())	LOG.debug("Starting scrape with temp folder "+working.getAbsolutePath()+"...");
			// generate a one-off scraper based on preloaded configuration
			ScraperConfiguration config = new ScraperConfiguration(urlConfigXml);
			Scraper scraper = new Scraper(config, working.getAbsolutePath());
			// initialise any config
			setupScraperContext(setup, scraper);
			// run the script
			scraper.execute();

			// rip the URL list out of the scraped content
			ScraperContext context = scraper.getContext();
			int i=1;
			Variable scrapedUrl;
			if (LOG.isDebugEnabled())	LOG.debug("Scraping performed, pulling URLs '"+SCRAPED_URL_VARIABLE_PREFIX+"n' from "+context.size()+" variables, starting with "+i+"...");
			while ((scrapedUrl = (Variable) context.get(SCRAPED_URL_VARIABLE_PREFIX+i))  != null)
			{
				if (LOG.isTraceEnabled())	LOG.trace("Found "+SCRAPED_URL_VARIABLE_PREFIX+i+": "+scrapedUrl.toString());
				// parse this URL
				setup.put(START_URL_VARIABLE, scrapedUrl.toString());
				scrapeUrl(setup, pageConfigXml);
				// move on
				i++;
			}
			if (LOG.isDebugEnabled())	LOG.debug("No more URLs found.");
			// i started at 1 and was incremented once per URL, so i-1 URLs were processed
			return i - 1;
		}
		catch (FileNotFoundException e)
		{
			if (LOG.isErrorEnabled())	LOG.error("Could not find config file '"+urlConfigXml.getAbsolutePath()+"' - no scraping was done for this WebHarvest XML.", e);
			return -1;
		}
		finally
		{
			working.delete();
		}
	}

	/**
	 * Scrapes an individual page, and passes the results on for processing.
	 * The script must contain a hardcoded URL.
	 * @param configXml XML describing how to scrape an individual page.
	 */
	protected void scrapeUrl(File configXml)
	{
		scrapeUrl((String)null, configXml);
	}

	/**
	 * Scrapes an individual page, and passes the results on for processing.
	 * @param url The URL to scrape. If null, the URL must be set in the config itself.
	 * @param configXml XML describing how to scrape an individual page.
	 */
	protected void scrapeUrl(String url, File configXml)
	{
		Map<String, Object> setup = new HashMap<String, Object>();
		if (url!=null)	setup.put(START_URL_VARIABLE, url);
		scrapeUrl(setup, configXml);
	}

	/**
	 * Scrapes an individual page, and passes the results on for processing.
	 * @param setup Optional configuration for the script
	 * @param configXml XML describing how to scrape an individual page.
	 */
	protected void scrapeUrl(Map<String, Object> setup, File configXml)
	{
		try
		{
			if (LOG.isDebugEnabled())	LOG.debug("Starting scrape with temp folder "+working.getAbsolutePath()+"...");
			// generate a one-off scraper based on preloaded configuration
			ScraperConfiguration config = new ScraperConfiguration(configXml);
			Scraper scraper = new Scraper(config, working.getAbsolutePath());
			setupScraperContext(setup, scraper);
			scraper.execute();

			// handle contents in some way
			pageScraped((String)setup.get(START_URL_VARIABLE), scraper.getContext());

			if (LOG.isDebugEnabled())	LOG.debug("Page scraping complete.");
		}
		catch (FileNotFoundException e)
		{
			if (LOG.isErrorEnabled())	LOG.error("Could not find config file '"+configXml.getAbsolutePath()+"' - no scraping was done for this WebHarvest XML.", e);

		}
		finally
		{
			working.delete();
		}
	}

	/**
	 * Copies any preset variables into the scraper's context before it runs.
	 * @param setup Any variables to be set before the script runs.
	 * @param scraper The object which does the scraping.
	 */
	private void setupScraperContext(Map<String, Object> setup, Scraper scraper)
	{
		if (setup!=null)
			for (Map.Entry<String, Object> entry : setup.entrySet())
				scraper.getContext().setVar(entry.getKey(), entry.getValue());
	}

	/**
	 * Process a page that was scraped.
	 * @param url The URL that was scraped.
	 * @param context The contents of the scraped page.
	 */
	public abstract void pageScraped(String url, ScraperContext context);
}

Scraping a new set of data then becomes as simple as extending the class, passing in appropriate config, and pulling out whatever variables you want every time a page is scraped:

package scrape;

import org.webharvest.runtime.ScraperContext;
import org.webharvest.runtime.variables.Variable;

public class ActualScraper extends QuickScraper
{
	public static void main(String[] args)
	{
		try
		{
			ActualScraper scraper = new ActualScraper();
			// do the scraping
			scraper.scrapeUrlList("config/se.urls.xml", "config/se.page.xml");
		}
		catch (Exception e)
		{
			e.printStackTrace();
		}
	}

	/**
	 * @see scrape.QuickScraper#pageScraped(java.lang.String, org.webharvest.runtime.ScraperContext)
	 */
	public void pageScraped(String url, ScraperContext context)
	{
		Variable nameVar = context.getVar("name");
		if (nameVar==null)
		{
			if (LOG.isWarnEnabled())	LOG.warn("Scrape for "+url+" produced no data! Ignoring");
			return;
		}

		// log this handset's details (string concatenation prints "null" rather than throwing if a value wasn't scraped)
		if (LOG.isInfoEnabled())	LOG.info(nameVar+" has "+context.getVar("screen.width")+"x"+context.getVar("screen.height")+" screen");
	}
}

So there you have it: a powerful, configurable and highly effective web scraping system with almost no code written!

Portfolio: Corzano e Paterno.com

// January 26th, 2010 // Web

A trilingual WordPress site built for the Corzano e Paterno farm in Tuscany, which produces prize-winning wines, olive oil and cheeses. It’s an old farm in the hills, once owned by the Machiavellis, which now has a lovely set of guest houses you can rent through the site.

Corzano e Paterno site - wine page

Picasa photo album integration:

Corzano e Paterno site - example photo album

The site features a blog, press clippings, downloads of publications which have featured the farm’s wines, and other news:

Corzano e Paterno site - news page

This is integrated into a database of each year’s wines and cheeses:

Corzano e Paterno site - cheese page

Portfolio: Masabi Rebrand

// December 10th, 2009 // Mobile, Web

My company has gone through a complete transition over the second half of 2009, moving from a general mobile application consultancy to a product-based transport ticketing vendor. This new focus merited a total branding overhaul as our old look, with its black background, was more appropriate for our legacy marketing and gaming background.

The new font and colour scheme were designed to evoke the feel of the old British Rail branding, whilst the logo resembles the front of an Intercity train:

Masabi's new logo - The Ticket Machine In Your Pocket

The new tagline, “The Ticket Machine In Your Pocket”, came out of a brainstorming session during the excellent g2i (Gateway to Investment) course we took part in. I would highly recommend it to anyone interested in grooming their company for funding, or just in understanding when a startup needs funds and what to expect from investors. It’s sponsored by the London Development Agency but run by industry professionals, offering top-quality advice and opportunities where all participants’ interests are aligned – far better than the fee-based ‘advice’ and ‘connections’ that are so easy to come by.

The front page embeds a video of the product in action which really explains the underlying concept nicely – the photos I took during the video shoot now form a great resource of imagery for company documents and presentations:

The site structure is intentionally simple, with product tours aimed at Passengers and Train Operating Companies:

The news section manages press releases and external coverage, alongside a social media feed integrating the company’s Flickr, Twitter, YouTube and SlideShare channels:

There is also a live feed, driven by our Google-based events calendar, showing the next event Masabi will be presenting at, with an integrated view on the site:

The company blog was migrated over from the Blogger account of the old site; a redirect plugin was set up to ensure legacy URLs continued to work:

The site also has all the obvious bells and whistles like Google Maps integration to find the office, and directions from the nearest tube stations etc:

Migrating Blogger to WordPress – Easy 301 Permalink Redirects

// October 25th, 2009 // Dev, Web

I’ve been moving the Masabi web site and blog onto WordPress, from a combination of static web content and a blog driven by Blogger. WordPress has a great import function to move the posts across, which does most of the initial work for you.

However, WordPress won’t by itself set up redirects from the old Blogger permalinks to the new WordPress ones. The two platforms shrink post titles to URLs differently, so it’s not as simple as matching the WordPress permalink structure to Blogger’s under Settings.

I did see one plugin which was supposed to migrate Blogger permalinks automatically, but it didn’t work and also didn’t cover the full scope I needed – I also have legacy static html links to remap into a totally different site structure. To achieve this I turned to the excellent Redirection plugin from John Godley.

Getting Inside The Database

The plugin allows you to set up redirects manually with a very friendly interface, but there’s no fun in migrating 70 blog posts by hand. This is where SQL can come to our rescue!

VERY IMPORTANT: take a full backup of your database before you start messing around with SQL. In theory this is a pretty low-risk operation, but you never know!

The Blogger import utility saves custom fields for every imported post, and the Blogger permalink is held in the blogger_permalink field. Custom fields are stored in the wp_postmeta table, whose post_id column links back to the ID of the post in wp_posts. This makes one half of the mapping very easy to set up.

The other half is slightly more subtle, because WordPress permalinks are not actually stored on the wp_posts table. Every post does have a GUID, but this is based on what its permalink was when you did the import – and if you imported when you created the blog and set your WordPress permalinks later, this will not reflect the post’s current permalink URI. Ideally we’d like the 301 to point to the real end URI, so we need to get a little creative and rebuild the permalink in the way WordPress does it, from the post metadata.

My permalink structure looks like this:
YYYY/MM/DD/title/

This can be rebuilt using the following string manipulation in SQL:
CONCAT('/',YEAR(post_date),'/',LPAD(MONTH(post_date),2,'0'),'/',LPAD(DAY(post_date),2,'0'),'/',post_name,'/')

Notes on the SQL functions:

  • CONCAT just combines all of its arguments together into a single string;
  • LPAD pads the left of the string with 0s, so the month and day are always 2 digits long;
  • YEAR, MONTH and DAY extract the relevant fields from the post’s creation date/time.
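If you’d rather sanity-check the expression before touching the database, the same string can be built in Java. This is a sketch assuming the YYYY/MM/DD/title/ structure above; PermalinkBuilder is a hypothetical helper, and postName stands for the post_name slug column.

```java
import java.time.LocalDateTime;
import java.util.Locale;

/** Hypothetical helper: rebuilds /YYYY/MM/DD/post_name/ like the SQL CONCAT expression. */
public class PermalinkBuilder {

    /** %02d plays the role of LPAD(...,2,'0') for the month and day. */
    public static String permalink(LocalDateTime postDate, String postName) {
        // Locale.ROOT keeps the digits ASCII whatever the JVM's default locale is
        return String.format(Locale.ROOT, "/%d/%02d/%02d/%s/",
                postDate.getYear(),
                postDate.getMonthValue(),
                postDate.getDayOfMonth(),
                postName);
    }
}
```

A post dated 7 March 2009 with the slug hello-world comes out as /2009/03/07/hello-world/, which is the value the SQL puts into action_data.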

Given this data, we can easily create an automatic import SQL statement for moving the data across:

INSERT INTO wp_redirection_items (url,action_data,regex,group_id,status,action_type,action_code,match_type,last_access,position)
SELECT M.meta_value AS url,CONCAT('/',YEAR(P.post_date),'/',LPAD(MONTH(P.post_date),2,'0'),'/',LPAD(DAY(P.post_date),2,'0'),'/',P.post_name,'/') AS action_data,0 AS regex,1 AS group_id,'enabled' AS status,'url' AS action_type,301 AS action_code,'url' AS match_type, 0 AS last_access, 69 as position
FROM wp_postmeta M, wp_posts P
WHERE M.meta_key='blogger_permalink' AND M.post_id=P.ID AND P.post_status='publish';

Run this through phpMyAdmin, refresh the Redirection admin page, and you should now find that all of your permalinks have been moved across. Note that we set the position field to an arbitrary constant, here 69, so we can easily delete the inserted rows if we mess up and then try again, without upsetting any other redirects already set up.

E-Mail Not Working In PHP / WordPress? It May Be CPanel MX Records…

// October 20th, 2009 // Web

I host a few domains through Nativespace and Host Gator (which both use the CPanel interface), and all of them have had one flaw: PHP apps like WordPress and Drupal, and also WordPress plugins like the Dagon Design Mailer Script, have never been able to send email. It’s quite hard to track down this kind of error yourself on a managed host with a web UI, so in the past I’d not worried about it, but finally it came time to solve it (or rather get Ben to solve it; I’ll write about it so others don’t have to go through the hard work).

The Problem: CPanel’s Broken MX Assumptions

Note: you can ignore this section if you don’t care why the problem occurs, and just want a fix!

It transpires that the problem occurs when the domain being hosted is registered through some other provider, which therefore runs the domain’s DNS and MX records.  MX records tell the world how your domain runs its e-mail.

Rather than doing a proper MX lookup when sending e-mail, CPanel just assumes it is in charge of the domain’s MX records and checks its own local configuration. This would be fine if it did run the domain’s e-mail, but in these instances it doesn’t, so the e-mail ends up falling into a black hole. Why the authors chose to do this is uncertain, but it is easy to fix.
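For contrast, a proper MX lookup asks public DNS which hosts accept mail for the domain, in order of preference. Here is a minimal Java sketch using the JNDI DNS provider that ships with the JDK (com.sun.jndi.dns.DnsContextFactory); it only illustrates the lookup CPanel skips, it is not anything CPanel itself runs, and MxLookup is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Hashtable;
import java.util.List;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

/** Hypothetical sketch of a real MX lookup: ask DNS, not a local config file. */
public class MxLookup {

    /** Orders raw MX values such as "10 mx.example.com." by preference, lowest first. */
    public static List<String> sortByPreference(List<String> mxValues) {
        List<String> sorted = new ArrayList<String>(mxValues);
        sorted.sort(Comparator.comparingInt((String v) -> Integer.parseInt(v.trim().split("\\s+")[0])));
        List<String> hosts = new ArrayList<String>();
        for (String value : sorted) {
            String host = value.trim().split("\\s+")[1];
            // strip the trailing dot of a fully qualified DNS name
            hosts.add(host.endsWith(".") ? host.substring(0, host.length() - 1) : host);
        }
        return hosts;
    }

    /** Queries DNS for the domain's MX records and returns the mail hosts in priority order. */
    public static List<String> lookupMx(String domain) throws NamingException {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = new InitialDirContext(env);
        try {
            Attribute mx = ctx.getAttributes(domain, new String[] {"MX"}).get("MX");
            List<String> values = new ArrayList<String>();
            if (mx != null) {
                NamingEnumeration<?> records = mx.getAll();
                while (records.hasMore()) {
                    values.add(records.next().toString());
                }
            }
            return sortByPreference(values);
        } finally {
            ctx.close();
        }
    }
}
```

Running MxLookup.lookupMx on one of the affected domains and comparing the result with what CPanel’s MX Entry screen shows is a quick way to confirm you have this problem.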

The Fix: Redundant Configuration

First, find the MX Entry icon in the Mail section of the CPanel front page:

CPanel's MX Entry icon

Click on it to see CPanel’s configuration for your domain.  The default looks something like this:

CPanel's default MX configuration

Many of my domains use the free Google Apps product to offer GMail accounts that work with the domain (i.e. a GMail inbox tied to an @masochismtango.com address). Here is the CPanel configuration you need if your domain does the same – if not, you’ll need to check your MX records with your domain registrar to find the right configuration to enter here.

Warning: this solution only works if you know the real MX records for the domain! If you don’t you could mess things up.  Remember it may be easier to just file a support ticket – I’m not taking responsibility if you mess this up!

First, add the correct MX configuration for the domain:

MX configuration for a domain with Google Apps GMail

For GMail, you can get away with just adding a couple of entries, at priority 1 and 5. Then, delete the old priority 0 entry which CPanel started with to leave just the real configuration:

MX configuration for a domain with Google Apps GMail

Now you’re done – hopefully your emails should instantly start working!

Portfolio: KeenCity.tv web site

// September 13th, 2009 // Web

I’ve just finished a quick hack of Typebased, a free WordPress theme from Woo Themes, to nicely frame YouTube videos for KeenCity.tv:

KeenCity.tv web site

We decided that starting from someone else’s theme was the only way to get a site up and running on a shoestring budget.

The header photo is one I took from the top of Centrepoint at dawn, carefully edited to remove any iconic London buildings and convey the sense of an unknown city coming to life.

Mobile Webapps – iUI Framework Extensions

// August 3rd, 2009 // Mobile, Web

As I explained in my earlier Masabists post, I’m finally able to show off some of the work I’ve been doing recently on local mobile webapps.  We’ve based our webapps on the rather excellent iUI framework, which has a great philosophy:

  • Clean, standard HTML markup;
  • Very lightweight – one CSS stylesheet and a single Javascript file;
  • Built-in AJAX support, so form values can be submitted and HTML fragment responses pasted into the document structure;
  • iPhone native-style look and feel, including slick screen to screen transitions and screen rotation support.

The screens are all embedded inside a single HTML file, which has two advantages:

  1. You get very rapid movement around the app, with no need for a reliable network connection and none of the slowdown associated with downloading every page;
  2. You can cache the app using an HTML 5 manifest, so even when the user has no network signal they can access the site.

This is great as far as it goes, but a number of screen types were not supported. I have added an optional extension script, following the same philosophy, to add them; this post is just a reference point describing what the extensions do, for discussions about merging them into the framework.

Structure

The extensions are in a separate script and CSS file, which creates the window.iui_ext object, much like the core iUI framework itself.

To work, iUI has been extended in one key way – it now fires off Javascript events to screens, specifically:

  • onFocus() is called whenever a screen is made visible;
  • onBlur() is called whenever a screen is left.

Both events are called at the point where the link is clicked and the sliding transition begins.

Adding the events gives the developer a huge amount of extra scope for controlling the webapp, which I then used to create the additional screen types without breaking away from clean markup.

Form Handling

The iUI extensions automatically add focus and blur event handlers for every form, which store all field values in a name/value store whenever a form is left and restore the values whenever a form is visited again.  These are added inside the iUI code, and once finished they will call any explicitly defined onFocus="..." / onBlur="..." attributes within the markup.

The store follows the same basic semantics as the HTML 5 sessionStorage object, which I wanted to use, but sadly it isn’t implemented on the iPhone yet (the latest desktop Safari seems to have it, though). The methods are exposed through the iui_ext object, for example window.iui_ext.getItem('key').

The purpose of this store is to move values between screens, which enables a number of tricks. For instance, if you declare inputs on separate forms (displayed as separate screens) with the same name attribute, they will always mirror each other’s values. If for any reason you don’t want this behaviour for some fields, you can declare an onBlur event that removes them from the store.
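As a sketch of the save/restore cycle and the mirroring trick, here is a small store with sessionStorage-style getItem/setItem/removeItem, plus hypothetical saveForm()/restoreForm() helpers standing in for the automatic blur/focus handlers. Forms are modelled as arrays of { name, value } objects so it runs outside a browser; this is an illustration, not the extension’s actual code.

```javascript
// Hypothetical store with sessionStorage-style methods (values kept as strings).
const store = {
  data: Object.create(null),
  getItem(key) {
    return key in this.data ? this.data[key] : null; // null when missing
  },
  setItem(key, value) {
    this.data[key] = String(value);
  },
  removeItem(key) {
    delete this.data[key];
  },
};

// Stand-in for the automatic onBlur handler: save every named field.
function saveForm(fields) {
  for (const field of fields) {
    if (field.name) store.setItem(field.name, field.value);
  }
}

// Stand-in for the automatic onFocus handler: restore matching names.
function restoreForm(fields) {
  for (const field of fields) {
    const saved = store.getItem(field.name);
    if (saved !== null) field.value = saved;
  }
}

// Two forms on separate screens share the field name "railcard".
const buyForm = [{ name: "railcard", value: "rc_youth" }];
const summaryForm = [{ name: "railcard", value: "" }];
saveForm(buyForm);        // leaving the first screen
restoreForm(summaryForm); // arriving at the second
console.log(summaryForm[0].value); // rc_youth
```

To opt a field out, an explicit onBlur handler could call store.removeItem(name); because the automatic save runs first, the removal wins.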

Option Lists

I needed a form screen listing a number of fields that could be changed, such as railcard type in the screenshot below, where tapping a field takes the user to a list of all the possible options, from which they select one. A few examples of this UI pattern from my iPod Touch Settings app are:

  • Settings > Music > Audiobook Speed (list of 3 options)
  • Settings > Video > Start Playing (list of 2 options)
  • Settings > Safari > Accept Cookies

Visually, it looks like this:

iPhone option selection UI widget

I implemented this in two stages. First, I wrote CSS to render a radio input (with its label) as a block showing a tick when selected, using a bit of custom Safari CSS (-khtml-appearance:none;) to suppress the native radio button styling, which interestingly changes from relatively nice on an iPod Touch v2 to quite nasty on an iPhone 3GS.

Next, I tried to think of a natural way to express the options in HTML, and ended up with the idea of a script that rewrites any select with the CSS class ‘panel’ into a separate linked panel. The markup you write looks like this:

<div class="row">
 <label for="buy_railcard">Railcard</label>
 <select id="buy_railcard" name="railcard" class="panel">
  <option value="rc_none" selected>None</option>
  <option value="rc_youth">Youth</option>
  <option value="rc_family">Family</option>
  <option value="rc_senior">Senior</option>
  <option value="rc_disabled">Disabled</option>
  <option value="rc_network">Network</option>
  <option value="rc_hm">HM Services</option>
 </select>
</div>

By default, it would render like this (taken from desktop Safari, just to make my life easier):

Selects as they would render in Safari without the script

The script rewrites the above HTML on the original form like this:

<div class="row">
 <a href="#select__buy_railcard">Railcard
  <input type="hidden" name="railcard" value="rc_none"/>
  <var id="railcard-rc_none" class="_lookup rc_none">None</var>
 </a>
</div>

Points to note:

  • The original form will submit in exactly the same way to the server
    • the new hidden field within the link has the same name as the select and the value of the selected option.
  • The textual label of the selected option is copied into the var tag, which actually visually shows the selected option to the user
    • the var is also given the option value as a CSS class, which lets us add funky icons to all the labels if we want (using :before generated content, in this case).
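The rewrite itself is mostly string assembly. This sketch builds the link row above from a plain description of the select (in the page, the script reads the live select element instead); rewriteSelectRow() and escapeHtml() are hypothetical helpers.

```javascript
// Hypothetical helper: escape text for use in HTML attributes and content.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function rewriteSelectRow(select) {
  // Use the selected option, falling back to the first (as browsers do).
  const chosen = select.options.find((o) => o.selected) || select.options[0];
  const name = escapeHtml(select.name);
  const value = escapeHtml(chosen.value);
  return [
    '<div class="row">',
    ` <a href="#select__${escapeHtml(select.id)}">${escapeHtml(select.label)}`,
    // Hidden field: same name as the select, so the form submits unchanged.
    `  <input type="hidden" name="${name}" value="${value}"/>`,
    // Visible label; the option value doubles as a CSS class for icons.
    `  <var id="${name}-${value}" class="_lookup ${value}">${escapeHtml(chosen.text)}</var>`,
    " </a>",
    "</div>",
  ].join("\n");
}

const railcard = {
  id: "buy_railcard",
  name: "railcard",
  label: "Railcard",
  options: [
    { value: "rc_none", text: "None", selected: true },
    { value: "rc_youth", text: "Youth" },
  ],
};
console.log(rewriteSelectRow(railcard));
```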

The script also creates a new form.panel screen containing the options as a radio group, like this:

<form id="select__buy_railcard" class="panel" title="Railcard">
 <fieldset class="radiogroup">
  <div class="row">
   <label class="rc_none" for="buy_railcard_option_rc_none">None
    <input id="buy_railcard_option_rc_none" type="radio" value="rc_none" name="railcard"/>
   </label>
  </div>
  <div class="row">
   <label class="rc_youth" for="buy_railcard_option_rc_youth">Youth
    <input id="buy_railcard_option_rc_youth" type="radio" value="rc_youth" name="railcard"/>
   </label>
  </div>
  <div class="row">
   <label class="rc_family" for="buy_railcard_option_rc_family">Family
    <input id="buy_railcard_option_rc_family" type="radio" value="rc_family" name="railcard"/>
   </label>
  </div>
  <div class="row">
   <label class="rc_senior" for="buy_railcard_option_rc_senior">Senior
    <input id="buy_railcard_option_rc_senior" type="radio" value="rc_senior" name="railcard"/>
   </label>
  </div>
  <div class="row">
   <label class="rc_disabled" for="buy_railcard_option_rc_disabled">Disabled
    <input id="buy_railcard_option_rc_disabled" type="radio" value="rc_disabled" name="railcard"/>
   </label>
  </div>
  <div class="row">
   <label class="rc_network" for="buy_railcard_option_rc_network">Network
    <input id="buy_railcard_option_rc_network" type="radio" value="rc_network" name="railcard"/>
   </label>
  </div>
  <div class="row">
   <label class="rc_hm" for="buy_railcard_option_rc_hm">HM Services
    <input id="buy_railcard_option_rc_hm" type="radio" value="rc_hm" name="railcard"/>
   </label>
  </div>
  </fieldset>
</form>
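Generating the options screen follows the same pattern. Another hypothetical sketch, taking the same kind of plain select description and returning the form.panel markup; like the markup shown, it sets no checked attribute, on the assumption that the selection is restored from the store when the screen gains focus.

```javascript
// Hypothetical helper: escape text for use in HTML attributes and content.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the form.panel screen listing every option as a radio button.
function buildOptionsPanel(select) {
  const id = escapeHtml(select.id);
  const name = escapeHtml(select.name);
  const lines = [
    `<form id="select__${id}" class="panel" title="${escapeHtml(select.label)}">`,
    ' <fieldset class="radiogroup">',
  ];
  for (const option of select.options) {
    const value = escapeHtml(option.value);
    // Unique id per option so each label's "for" targets one radio button.
    const inputId = `${id}_option_${value}`;
    lines.push(
      '  <div class="row">',
      `   <label class="${value}" for="${inputId}">${escapeHtml(option.text)}`,
      // Same name as the select, so the shared store can keep it in step
      // with the hidden field on the original form.
      `    <input id="${inputId}" type="radio" value="${value}" name="${name}"/>`,
      "   </label>",
      "  </div>"
    );
  }
  lines.push(" </fieldset>", "</form>");
  return lines.join("\n");
}

const panel = buildOptionsPanel({
  id: "buy_railcard",
  name: "railcard",
  label: "Railcard",
  options: [
    { value: "rc_none", text: "None" },
    { value: "rc_youth", text: "Youth" },
  ],
});
console.log(panel);
```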

Date Selection

One of the greatest things in HTML 5, to me, is also one of the simplest: the input tag has been extended to allow all sorts of new types, such as dates, telephone numbers and even colours. The idea is that browsers will handle these specially, with funky calendar drop-downs and the like that currently have to be implemented in Javascript; on mobiles, they can use a more appropriate native solution optimised for the keypad, with address book access etc.

Obviously, no mobile browser has bothered to implement these – today you’ll have to look at Opera desktop if you want to see a browser starting to do it correctly.

In the absence of a native widget, I implemented a date selector myself. The iPhone’s own date selection uses the same spinning-wheel style as HTML drop-downs in Safari, which is difficult to achieve in HTML alone and not actually that natural for selecting travel dates. Instead, I opted for a calendar reminiscent of the one in the iPhone Calendar app:

Date selection with iUI extension

As before, this is implemented with a standard piece of HTML that is rewritten by the script.  To add a calendar you add an input like this to your form:

<div class="row">
 <label for="buy_travelDate">Outbound Date</label>
 <input name="travelDate" id="buy_travelDate" type="date" class="date" value="1/1/2009">
</div>

You can give the input a type of either date or text. If you set the type to text by hand, the script will always replace it with the custom calendar. If you use date, Safari currently rewrites any type it doesn’t understand to text when the page loads, so the script takes over; if and when date support is added, this will presumably cease, and the script will stop handling dates for you and leave selection to the browser. Your choice which you want!
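That rule can be expressed as one check on the input’s type property, which reads "text" both when the author wrote type="text" and when the browser has fallen back from an unrecognised type="date" (the type attribute, by contrast, keeps what the markup says). A sketch, assuming the date class marks the fields to enhance; needsCustomDatePicker() is a hypothetical name and fields are stubbed as plain objects.

```javascript
// Hypothetical sketch: decide which fields get the custom calendar.
// The type *property* reports "text" both for type="text" and for an
// unrecognised type="date"; the type *attribute* keeps the markup's value.
function needsCustomDatePicker(field) {
  const classes = field.className.split(/\s+/);
  return field.type === "text" && classes.includes("date");
}

// In the page this might run over document.querySelectorAll("input.date");
// here fields are stubbed as plain objects.
needsCustomDatePicker({ type: "text", className: "date future" }); // true
needsCustomDatePicker({ type: "date", className: "date" });        // false: native picker
```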

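When the form is first loaded, your markup is rewritten along the same lines as an option list (a link wrapping a hidden field that holds the value, plus a var that displays it); the markup below, from a journey type select, shows that structure: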

<div class="row">
 <a href="#select__buy_journeytype">Journey Type
  <input type="hidden" name="journeytype" value="tt_single"/>
  <var id="journeytype-tt_single" class="_lookup tt_single">Single</var>
 </a>
</div>

There will also be a (singleton) extra screen created for the date picker like this:

<form id="_datepicker" class="panel">
 <p>
  <span id="_dpback" class="back"> </span>
  <span id="_dpmonth" class="month"/>
  <input id="_dphidden" type="hidden"/>
  <span id="_dpfwd" class="fwd"> </span>
 </p>
 <table id="_dptable" cellspacing="0" cellpadding="0" border="0">
  <colgroup>
   <col class="sun"/>
   <col class="mon"/>
   <col class="tue"/>
   <col class="wed"/>
   <col class="thu"/>
   <col class="fri"/>
   <col class="sat"/>
  </colgroup>
  <thead>
   <tr class="days">
    <th>Sun</th>
    <th>Mon</th>
    <th>Tue</th>
    <th>Wed</th>
    <th>Thu</th>
    <th>Fri</th>
    <th>Sat</th>
   </tr>
  </thead>
  <tbody>
   <tr class="wk1">
    <td> </td>
    <!-- etc -->
   </tr>
   <tr class="wk2">...</tr>
   <tr class="wk3">...</tr>
   <tr class="wk4">...</tr>
   <tr class="wk5">...</tr>
   <tr class="wk6">...</tr>
  </tbody>
 </table>
</form>

The screen is a singleton, which means it will only be created once, and reused for every date picker in your app (which saves on memory).  Month and day names are taken from arrays which could be externalised and internationalised easily if required (CSS class names on cols are hardcoded though).

Every time the date picker is opened, the script refreshes the calendar, working out the date from the hidden field on the original form and copying over the relevant details. Table cells are filled in with the relevant day numbers, and onclick handlers are set so that tapping a day selects it.
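The refresh needs some simple calendar arithmetic: which weekday the month starts on and how many days it has. A sketch of laying out one month in the six Sunday-first rows of the table above, using only the standard Date object; buildMonthGrid() is a hypothetical helper, and month is 0-based as in Date.

```javascript
// Hypothetical helper: six weeks of seven cells, Sunday first, with null
// for cells that fall outside the month (rendered as blank cells).
function buildMonthGrid(year, month) {
  const firstWeekday = new Date(year, month, 1).getDay(); // 0 = Sunday
  // Day 0 of the next month is the last day of this one.
  const daysInMonth = new Date(year, month + 1, 0).getDate();
  const weeks = [];
  let day = 1 - firstWeekday; // starts at or below 1: leading blanks
  for (let w = 0; w < 6; w++) {
    const week = [];
    for (let d = 0; d < 7; d++, day++) {
      week.push(day >= 1 && day <= daysInMonth ? day : null);
    }
    weeks.push(week);
  }
  return weeks;
}

const aug2009 = buildMonthGrid(2009, 7); // August 2009
const show = (week) => week.map((d) => (d === null ? "." : d)).join(" ");
console.log(show(aug2009[0])); // . . . . . . 1
console.log(show(aug2009[1])); // 2 3 4 5 6 7 8
```

Six rows are needed because a long month starting late in the week spans six weeks, as August 2009 (which begins on a Saturday) does.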

There are a few options, which can be passed in using CSS classes applied to the date field:

  1. If the today class is specified, then any selection of today’s date will be stored on the form (and shown visually) as ‘Today’ instead of the date;
  2. If the future class is specified, only today and future dates are visible and selectable;
  3. If the past class is specified, only today and past dates are visible and selectable;
  4. A date pattern can be specified using the following syntax:
    • Parts of the date:
      • d for the date eg. 28
      • D for the day’s abbreviated name eg. Mon or Fri
      • m for the month’s number eg. 1 (for January)
      • M for the month’s abbreviated name eg. Jan or Dec
      • y for the year as two digits eg. 09
      • Y for the year as four digits eg. 2009
    • Other symbols:
      • _ underscore is converted to a space
      • c is converted to a comma
      • Hyphens and slashes are also allowed.
    • Examples:
      • d/m/y goes to 3/8/09
      • d-M-y goes to 3-Aug-09
      • Dc_d_M_Y goes to Mon, 3 Aug 2009
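A sketch of the options above: formatDate() handles the pattern letters exactly as listed, isSelectable() models the future and past classes, and displayValue() the today class. The function names, the English name arrays and throwing on unknown letters are assumptions for illustration.

```javascript
// Hypothetical helpers illustrating the date options; English names assumed.
const DAY_NAMES = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
const MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Each pattern character maps to a function producing its text.
function formatDate(date, pattern) {
  const tokens = {
    d: () => String(date.getDate()),                            // 3
    D: () => DAY_NAMES[date.getDay()],                          // Mon
    m: () => String(date.getMonth() + 1),                       // 8
    M: () => MONTH_NAMES[date.getMonth()],                      // Aug
    y: () => String(date.getFullYear() % 100).padStart(2, "0"), // 09
    Y: () => String(date.getFullYear()),                        // 2009
    _: () => " ",
    c: () => ",",
    "-": () => "-",
    "/": () => "/",
  };
  let out = "";
  for (const ch of pattern) {
    if (!Object.prototype.hasOwnProperty.call(tokens, ch)) {
      throw new Error("Unsupported pattern character: " + ch);
    }
    out += tokens[ch]();
  }
  return out;
}

// Midnight timestamp of the date's calendar day, so times of day are ignored.
function startOfDay(date) {
  return new Date(date.getFullYear(), date.getMonth(), date.getDate()).getTime();
}

// "future": today and later only; "past": today and earlier only.
function isSelectable(date, today, classes) {
  const diff = startOfDay(date) - startOfDay(today);
  if (classes.includes("future") && diff < 0) return false;
  if (classes.includes("past") && diff > 0) return false;
  return true;
}

// "today": show and store today's date as the word Today.
function displayValue(date, today, classes, pattern) {
  if (classes.includes("today") && startOfDay(date) === startOfDay(today)) {
    return "Today";
  }
  return formatDate(date, pattern);
}

formatDate(new Date(2009, 7, 3), "Dc_d_M_Y"); // "Mon, 3 Aug 2009"
```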

If a date format is set, then only that format can be used to specify the value of the field.
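Because only the configured format is accepted, reading a value back can reuse the same pattern letters to build a strict regular expression. A hypothetical parseDate() sketch; it assumes two-digit years mean 20xx, and it matches, but does not cross-check, the day name.

```javascript
// Hypothetical strict parser: only text in exactly the given pattern parses.
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Regular-expression fragment for each pattern character.
const PARTS = {
  d: "(\\d{1,2})",
  D: "(?:Sun|Mon|Tue|Wed|Thu|Fri|Sat)", // matched but not captured or checked
  m: "(\\d{1,2})",
  M: "(" + MONTHS.join("|") + ")",
  y: "(\\d{2})",
  Y: "(\\d{4})",
  _: " ",
  c: ",",
  "-": "-",
  "/": "\\/",
};

function parseDate(text, pattern) {
  const fields = []; // pattern letter for each capture group, in order
  let source = "^";
  for (const ch of pattern) {
    if (!Object.prototype.hasOwnProperty.call(PARTS, ch)) return null;
    if ("dmMyY".includes(ch)) fields.push(ch);
    source += PARTS[ch];
  }
  const match = new RegExp(source + "$").exec(text);
  if (!match) return null;

  let day, month, year;
  fields.forEach((field, i) => {
    const v = match[i + 1];
    if (field === "d") day = Number(v);
    else if (field === "m") month = Number(v) - 1;
    else if (field === "M") month = MONTHS.indexOf(v);
    else if (field === "y") year = 2000 + Number(v); // assumption: 20xx
    else year = Number(v);
  });
  if (day === undefined || month === undefined || year === undefined) return null;

  const date = new Date(year, month, day);
  // Date rolls 31/2 over into March, so check the parts survived intact.
  if (date.getFullYear() !== year || date.getMonth() !== month || date.getDate() !== day) {
    return null;
  }
  return date;
}

parseDate("Mon, 3 Aug 2009", "Dc_d_M_Y"); // Date for 3 August 2009
parseDate("3/8/09", "d-M-y");              // null: not in the configured format
```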

That’s basically it for my extensions. Having documented them, I’ll now see if anyone wants them merged into the main iUI framework!

Multilingual WordPress Sites

// June 7th, 2009 // No Comments » // Web

Over the last month I’ve been building a trilingual site for an excellent little farm in Tuscany, Corzano e Paterno, where among other things they make prize-winning cheeses, olive oil and wines (all of which I recommend trying whilst staying in one of their villas; really, it’s great!)

The requirement that caused me the biggest headache was the need to handle three languages. There are quite a few plugins that can handle two, but only a handful that manage three or more, and most of those aren’t actually that easy to use or reliable. In the end, after much experimentation, I settled on the excellent qTranslate, which beat all the others hands down. Highly recommended.

qTranslate integrates nicely into the WordPress page and post admin UI, offering multiple header fields and extra tabs for each language you want to add:

qTranslate WordPress plugin's Admin UI

For the standard template functions, everything works seamlessly. However, if you need to access a field of a post/page object directly, things will look a bit weird:

<?php echo($single->post_title); ?>

will look on the page like:

EnglishItalianoDeutsch

generating the following actual markup:

<!--:en-->English<!--:--><!--:it-->Italiano<!--:--><!--:de-->Deutsch<!--:-->

So what to do? Fortunately, we can simply borrow a function that qTranslate itself uses to resolve the current translation of any field, replacing the code above with:

<?php echo(qtrans_useCurrentLanguageIfNotFoundUseDefaultLanguage($single->post_title)); ?>

It’s that simple – wrap that function around any multilingual property you need to read, and qTranslate will do all the heavy lifting for you!
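qTranslate does this in PHP, and its own implementation may well differ. Purely to illustrate the stored format, here is a JavaScript sketch that splits a tagged string and falls back to a default language, as the function name describes. splitLanguages() and pickTranslation() are hypothetical, and two-letter language codes are assumed, as in the example.

```javascript
// Hypothetical sketch: parse "<!--:en-->English<!--:-->..." into
// { en: "English", ... }. Assumes two-letter language codes.
function splitLanguages(raw) {
  const parts = {};
  const re = /<!--:([a-z]{2})-->([\s\S]*?)<!--:-->/g;
  let m;
  while ((m = re.exec(raw)) !== null) {
    parts[m[1]] = m[2];
  }
  return parts;
}

// Current language if present, else the default; untagged text is returned
// unchanged. Returning "" when neither language exists is an assumption.
function pickTranslation(raw, currentLang, defaultLang) {
  const parts = splitLanguages(raw);
  const has = (lang) => Object.prototype.hasOwnProperty.call(parts, lang);
  if (Object.keys(parts).length === 0) return raw;
  if (has(currentLang)) return parts[currentLang];
  if (has(defaultLang)) return parts[defaultLang];
  return "";
}

const title = "<!--:en-->English<!--:--><!--:it-->Italiano<!--:--><!--:de-->Deutsch<!--:-->";
console.log(pickTranslation(title, "it", "en")); // Italiano
console.log(pickTranslation(title, "fr", "en")); // English
```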