Moving from WordPress to Middleman, Part I: ActiveRecord modeling of WordPress

by jgn on Friday, January 30, 2015 in Technology, Ruby, and Middleman

A while back I decided to migrate my blog from WordPress to Middleman.

Why? A couple of reasons. I don't need a VPS anymore, and I wanted the stability and performance of a static site as provided by Middleman (or Jekyll, etc.). I've always enjoyed having a VPS of some kind lying around, and mine was on Slicehost. But Slicehost was acquired by Rackspace, and while the service was still OK, I was getting sick of paying $20/month or so to keep it going, especially since I can fire up a cheap DigitalOcean VM for a few minutes whenever I need a remote system for debugging. About the only reason I could think of to keep the old VPS around was tunneling Netflix outside of the USA; the next time I need to do that, I'll pay for a VPN.

There are a variety of tools out there for getting data out of WordPress, but I was pretty unimpressed with them. For one thing, a lot of them work from the XML export, and I prefer extracting the data directly from the database. I also had some peculiar problems. For example, I wanted to convert the WordPress "highlight" code markup to Markdown fenced blocks. It also turned out that getting the URLs exactly right (so I wouldn't have to define a bunch of redirects) would be tricky. Finally, Middleman only allows one "category" per article if you do categories the way the docs suggest (with a custom collection), and fixing that turned out to be real work. So it seemed easier to write my own conversion code. It's nasty code, but I want to share a few bits.
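To give a flavor of the "highlight" conversion, here is a minimal sketch in plain Ruby. It is an illustration under assumptions, not the conversion code itself: it assumes the markup is a shortcode of the form `[highlight lang="ruby"]...[/highlight]` (the real syntax depends on the plugin in use), and the names `HIGHLIGHT_RE`, `FENCE`, and `highlight_to_fenced` are made up for this example.

```ruby
# ASSUMPTION: the shortcode looks like [highlight lang="ruby"] ... [/highlight].
# Adjust the pattern to whatever your highlighting plugin actually emits.
HIGHLIGHT_RE = %r{\[highlight(?:\s+lang="(?<lang>[^"]*)")?\](?<code>.*?)\[/highlight\]}m

# Three backticks, built at runtime so this example can itself sit in a fence.
FENCE = '`' * 3

# Hypothetical helper: rewrites every highlight shortcode in post_content
# as a Markdown fenced block, leaving the surrounding text untouched.
def highlight_to_fenced(content)
  content.gsub(HIGHLIGHT_RE) do
    m = Regexp.last_match
    # Normalize WordPress's CRLF line endings and trim the blank edges.
    code = m[:code].gsub("\r\n", "\n").gsub(/\A\s*\n|\s+\z/, '')
    "#{FENCE}#{m[:lang]}\n#{code}\n#{FENCE}"
  end
end
```

Something like `highlight_to_fenced(post.post_content)` would then be one step in the conversion; a shortcode without a `lang` attribute simply yields a fence with no language tag.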

So here's a cut of some ActiveRecord modeling of the basic WordPress 3.x schema -- it's probably pretty close to what would work for 4.x.

require 'mysql2'
require 'active_record'
require 'composite_primary_keys'
require 'pry'

spec = {
  adapter:  :mysql2,
  username: :root,
  database: '7fff_com'
}

ActiveRecord::Base.establish_connection(spec)

class Post < ActiveRecord::Base
  self.table_name = 'wp_posts'
  self.primary_key = 'ID'
  has_many :post_term_taxonomies, foreign_key: :object_id
  has_many :term_taxonomies, through: :post_term_taxonomies

  def categories
    term_taxonomies.where(taxonomy: 'category').map { |t| t.term.name }
  end
  def tags
    term_taxonomies.where(taxonomy: 'post_tag').map { |t| t.term.name }
  end
end

class Term < ActiveRecord::Base
  self.table_name = 'wp_terms'
  self.primary_key = 'term_id'
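  # Caveat: this matches wp_term_relationships.term_taxonomy_id against term_id.
  # The two ids can coincide, but WordPress does not guarantee that they do.
  has_many :post_term_taxonomies, foreign_key: :term_taxonomy_id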
  has_many :posts, through: :post_term_taxonomies
  has_one :term_taxonomy, primary_key: :term_id
end

class PostTermTaxonomy < ActiveRecord::Base
  self.table_name = 'wp_term_relationships'
  self.primary_keys = :object_id, :term_taxonomy_id
  belongs_to :post, foreign_key: :object_id
  belongs_to :term_taxonomy, foreign_key: :term_taxonomy_id
end

class TermTaxonomy < ActiveRecord::Base
  self.table_name = 'wp_term_taxonomy'
  self.primary_key = 'term_taxonomy_id'
  belongs_to :term
end

binding.pry

To be sure, I could have been fancier about getting at the tags and categories. Still, I think this is a good example of using ActiveRecord against a "legacy" or "foreign" schema. As you can see, I had to set table names and primary key names, and in the case of PostTermTaxonomy I had to lean on the composite_primary_keys gem, since the table has no conventional single-column primary key.

Here's a run, showing the tricky part of getting the tags and categories:

[1] pry(main)> post = Post.find(764)
=> #<Post ID: 764, post_author: 2, post_date: "2013-02-21 12:00:52", post_date_gmt: "2013-02-21 17:00:52", post_content: "<h3>Making a change in one's own \"my-boxen\"</h3>\r\n...", post_title: "Boxen workflow - notes to self", post_category: 0, post_excerpt: "", post_status: "publish", comment_status: "open", ping_status: "open", post_password: "", post_name: "boxen-workflow-notes-to-self", to_ping: "", pinged: "", post_modified: "2013-02-21 12:00:52", post_modified_gmt: "2013-02-21 17:00:52", post_content_filtered: "", post_parent: 0, guid: "http://7fff.com/?p=764", menu_order: 0, post_type: "post", post_mime_type: "", comment_count: 0>
[2] pry(main)> post.categories
=> ["Technology", "Code"]
[3] pry(main)> post.tags
=> ["boxen"]
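To make the lookups behind `post.categories` and `post.tags` concrete, here is a sketch in plain Ruby, with no database involved, of the three-table walk from wp_term_relationships through wp_term_taxonomy to wp_terms. The column names follow the WordPress schema, but the rows and ids are invented for illustration (only the post ID and term names echo the run above), and `term_names` is a hypothetical helper.

```ruby
# Invented sample rows; only the column names come from the WordPress schema.
TERMS = [
  { term_id: 1, name: 'Technology' },
  { term_id: 2, name: 'Code' },
  { term_id: 3, name: 'boxen' }
].freeze

# term_taxonomy_id is deliberately different from term_id here,
# to show that they are separate keys.
TERM_TAXONOMY = [
  { term_taxonomy_id: 11, term_id: 1, taxonomy: 'category' },
  { term_taxonomy_id: 12, term_id: 2, taxonomy: 'category' },
  { term_taxonomy_id: 13, term_id: 3, taxonomy: 'post_tag' }
].freeze

# object_id holds the post's ID.
TERM_RELATIONSHIPS = [
  { object_id: 764, term_taxonomy_id: 11 },
  { object_id: 764, term_taxonomy_id: 12 },
  { object_id: 764, term_taxonomy_id: 13 }
].freeze

# Hypothetical helper: names of the terms attached to a post
# within one taxonomy ('category' or 'post_tag').
def term_names(post_id, taxonomy)
  # Step 1: relationships for this post give us term_taxonomy ids.
  tt_ids = TERM_RELATIONSHIPS.select { |r| r[:object_id] == post_id }
                             .map { |r| r[:term_taxonomy_id] }
  # Step 2: keep only the taxonomy rows of the requested kind.
  rows = TERM_TAXONOMY.select do |tt|
    tt_ids.include?(tt[:term_taxonomy_id]) && tt[:taxonomy] == taxonomy
  end
  # Step 3: resolve each taxonomy row to its term's name.
  rows.map { |tt| TERMS.find { |t| t[:term_id] == tt[:term_id] }[:name] }
end

term_names(764, 'category') # => ["Technology", "Code"]
term_names(764, 'post_tag') # => ["boxen"]
```

The ActiveRecord associations express the same walk declaratively: `has_many ... through:` covers steps 1 and 2, and `t.term.name` is step 3.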