Documentation
MvManila 2.7 (Move Manila)
MvManila is a set of scripts that convert a Manila blog to WordPress and some other blog engines using the XML-RPC interface of the Manila blog. The scripts download the Manila blog’s posts and pictures to the machine running the scripts. After further processing, you copy the resulting file to your WordPress installation, upload all the pictures and import the text into the new blog.
MvManila has a simple “not for commercial use” license. Use it for moving your blog and fix it if you need to. If you want something else, contact me after reading the LICENSE.txt file.
Requirements: The scripts run on Mac OS X or Linux. The download scripts are written in Ruby and the other processing is done with Tcl. No cutting-edge Ruby or Tcl features are used, so any version you have will probably work.
These scripts are for converting to WordPress 2.0+ or anything that can handle Movable Type input (some restrictions apply; see the end of this file about TextPattern and Movable Type).
I run a “for fee service” to convert sites and this is the public release of the tool set I used on the last conversion. Dealing with what Manila gives you and configuring these scripts is not a trivial exercise. You may have to delete and reinstall your WordPress blog (several times) as you tweak the MvManila configuration file, so make a backup if your WordPress blog has anything you want to save. A test blog is a handy thing to have if you can.
General Flow:
- Verify you have the needed software (Ruby, Tcl, iconv) and admin info for the Manila site and the WP site.
- Download and untar MvManila-2.6.tar.gz
- Edit the config file and test.
- Download the pictures with getpicts.rb or getseqpic.rb
- Download the shortcuts with getshort.rb
- Download the site structure with getlists.rb
- Download the blog text entries with getposts.rb
- Download the gems and such with findstatic.rb
- Process the downloaded blog text file with iconv, fixXml.tcl, fixurl.tcl and fixstatic.rb
- Upload the pictures to your WordPress location
- Upload the cleaned up text file from fixurl
- Upload the ccdl.php importer script for WP 2.0.4+
- Import the text into WordPress.
If you don’t like what you see in WordPress, then you’ll have to go fix the script with the problem, delete the WordPress blog database, create a new db (same name), reinstall WordPress, start over with the script that needed fixing, and reimport.
Since OS X and Linux usually have Apache, smart people create a test WordPress blog running locally on their desktop box, even without a host name. You’ll have to run part of the conversion twice and futz with the config.txt file if you do this (a few minutes of extra work).
All the file names and URLs used here are examples. Some file names are hardcoded: “pidx”, “posts.map”, “structure” and “shortcuts”. Although you can create your own names for the other files written to stdout, be consistent and consider using the naming convention I’ve used below.
1. Verify Your Environment:
Verify you have Ruby 1.8.2 or later.
[ccoupe@gdub ccoupe]$ ruby -v
ruby 1.8.2 (2004-12-25) [universal-darwin8.0]
Verify you have a Tcl version that is 8.x or later.
[ccoupe@gdub MvManila-2.5]$ tclsh
% info tclversion
8.4
% ^D
Test if you have ‘iconv’
[ccoupe@gdub MvManila-2.5]$ iconv -V
iconv (GNU libc) 2.3.3
Log in to the Manila site with the admin username and password using your browser. Write down the username and password and the Manila blog’s URL. Write down the ssh, FTP and control panel logins for your new installation. Write down the URL for the new WordPress blog and the WP admin name and password.
Read the rest of this documentation several times.
Read the README.txt and TODO.txt for the latest info (if they exist)
2. Config.txt Nitty Gritty
Copy config.example to config.txt and edit it. You need to fill in the Manila blog’s URL and the managing editor’s account name and password. That’s for downloading purposes; the scripts do not write to your Manila blog. The example might look like this:
ManilaSite: http://werehosed.editthispage.com
ManilaAdmin: ccoupe@micron.net
ManilaPass: secret
Standard login stuff for Manila. You know what it needs.
You have to specify a directory on the machine running these scripts to store the pictures in. The example uses LocalPictDir: pictures but you could use a full path or a relative path on the machine running the scripts. The example assumes that “pictures” is a subdirectory of the directory with these scripts. That’s probably what you want, so leave it alone. Your choice of LocalPictDir won’t change where the pictures are stored on the new blog or how their references get rewritten.
The next two lines work together in directing the conversion of the URLs in your downloaded Manila posts. NewSite: is the URL of your new WordPress blog. Something like
NewSite: http://myblog.example.com/
Do not forget that trailing slash.
You need a directory in your WordPress site to hold all the pictures you are going to upload later (via FTP) for your WordPress blog. You probably have one, but you could make another if you want a different directory for the converted site’s pictures. The example uses
PictSubDir: http://myblog.example.com/archives/images
Without the trailing slash.
It seems obvious for the simple case above, but you can also accommodate a number of potential configurations, because both NewSite: and PictSubDir: can be changed later, as many times as you need.
For example, I have a copy of WordPress installed just for my testing at http://192.168.1.2/test/. That’s a subdirectory on a private IP address.
NewSite: http://192.168.1.2/test/
PictSubDir: http://192.168.1.2/test/pictures
When I’m happy the conversion looks OK, then I change NewSite and PictSubDir to the real hosting service setup.
Once the stuff (text, pictures, shortcuts and so on) is downloaded off of the Manila site, you can modify config.txt and generate a new upload file with new URLs in the text in just a few seconds, and play with all the variations possible from config.txt.
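The two trailing-slash rules above are easy to get wrong, so here is a minimal sketch of a sanity check. It is not part of MvManila: the helper names parse_config and slash_problems are made up for illustration, and it assumes config.txt holds plain “Key: value” lines as in the examples. It also assumes a reasonably modern Ruby (2.3 or later), not the 1.8.2 minimum the scripts themselves need.

```ruby
# Hypothetical helpers, not shipped with MvManila.
# Assumes config.txt is a list of "Key: value" lines, as in the examples above.
def parse_config(text)
  settings = {}
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    # Split on the first colon only, so "http://..." values stay intact.
    key, value = line.split(":", 2)
    next if value.nil?
    settings[key.strip] = value.strip
  end
  settings
end

# Returns a list of complaints about the two slash rules described above.
def slash_problems(settings)
  problems = []
  new_site = settings["NewSite"].to_s
  pict_dir = settings["PictSubDir"].to_s
  problems << "NewSite should end with a slash" unless new_site.end_with?("/")
  problems << "PictSubDir should not end with a slash" if pict_dir.end_with?("/")
  problems
end

sample = <<~CONFIG
  ManilaSite: http://werehosed.editthispage.com
  NewSite: http://192.168.1.2/test/
  PictSubDir: http://192.168.1.2/test/pictures
CONFIG

settings = parse_config(sample)
puts settings["NewSite"]
puts slash_problems(settings).inspect
```

To check a real file, pass File.read("config.txt") to parse_config instead of the sample text.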
Message Numbers
Manila gives everything a message number. It might be a “story” post, it might be a picture, it might be a comment, but it has a $nn whether it’s a pictureViewer, a msgReader, a newsItem or something I haven’t seen yet. WordPress has message numbers too, but not for pictures or comments. The numbers have to be mapped so that everything links to the right place. That’s my magic.
The first Manila post is $1, usually the About Me page. Then a few days later you write a post (say it’s $6): “I’ve updated my About ($1) with a picture”. Been there, we all did, but pay attention now. A brand new WordPress install also has a #1, a “hello world” thing. You can delete it, but the next post is going to be number 2. You can delete that too and the next post will be #3.
You could have years of newer posts in the WordPress blog and now want to import this old Manila stuff. You can do that if you know the last number used by WordPress.
WordPress won’t let you import on top of existing messages. Manila $1 is not going to replace WP #1. What you can do (and I do) is set the WP message number of the imports well past your last post. Say you have 213 WP posts already and you want to import the Manila stuff. If you tell WP to start at 300 for its import, there’s no collision. Simple. Except the normal WordPress import scripts won’t do that. There is a patch floating around out there, and it’s included in MvManila 2.5 as “ccdl.php”.
This works because most of Wordpress organizes things by date/title and not by message number.
The scripts do not require that WordPress have any particular permalink structure. They’ve been tested using the default and the Y/M/D formats, if your WordPress installation supports them.
Running the scripts
1. Test the config.txt for downloading. ruby test.rb 40 will display what Manila knows about post 40. There should not be any errors on the terminal. If Manila times out, the error message will be obvious. If the login info is wrong, that will be obvious too, and if message 40 happens to be deleted, it will say so (try again with a different number).
2. Download the pictures. There are two ways: the easy way that might work, call that option (a), and the slower way that will always work, option (b).
(a) Download the pictures with
ruby getpicts.rb >pidx
This will take a while. Repeat if Manila times out. The terminal (stderr) will show you the names of the pictures being downloaded. Stdout needs to be redirected exactly as shown.
(b) If Manila won’t give up the pictureList (and some servers won’t), or it takes so long that you don’t want to wait, you can download the pictures a group at a time.
ruby getseqpic.rb 1 200 >p1.txt
ruby getseqpic.rb 201 400 >p2.txt
This downloads the pictures in messages 1 through 200 and stores the picture info in p1.txt, then all the pictures in the range 201 through 400 into p2.txt. Eventually you get to the end, and then you combine the pieces:
cat p?.txt p??.txt >pidx
Look in the directory you specified as LocalPictDir in the config file and open a couple of them up to make sure you have them. Check file sizes, etc.
3. Download the shortcuts with
ruby getshort.rb >shortcuts
Look at it with a text editor if so inclined.
4. Download the site structure and story list with
ruby getlists.rb
That creates a file called “structure”.
5. Double check that the picture download file is named “pidx” and the shortcut download is named “shortcuts”. These names are hardcoded in other scripts, so you have to use those names.
6. Time to download the posts and comments from Manila. The chances of downloading all of the posts at once without a Manila timeout are slim, so do it in batches.
ruby getposts.rb 1 200 >t1.txt
ruby getposts.rb 201 400 >t2.txt
ruby getposts.rb 401 433 >t3.txt
cat t1.txt t2.txt t3.txt >download.txt
You can find the last number (433 in the example) by using Manila’s Discussion menu. If you pick a final number that is higher than the last, the script is going to keep trying, reporting the extra ones as deleted or missing. Save yourself some time and the net some bandwidth by learning what the last number should be.
[Note: it would be possible to download the posts and pictures together, and it would save a lot of time. I did that for the 3.0 scripts, but I can't test it for 2.7 because I don't have an account on a Manila machine to test with.]
The time-consuming part is over. It can take a day or two, depending on how big the Manila blog is, how fast the server is, your download speed and whether you have to restart because of a typo in the command line.
Everything is now on your computer. Don’t cancel the Manila account yet or switch your DNS yet. Give yourself some room to make errors and do over.
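Typos in the batch ranges for getposts.rb are a common reason to restart, so here is a small sketch that prints the command lines for you to paste. The helper name batch_commands and the default batch size of 200 are assumptions for illustration (200 matches the example above); nothing like this ships with MvManila.

```ruby
# Hypothetical helper: split message numbers 1..last into fixed-size batches
# and build one getposts.rb command line per batch.
def batch_commands(last, size = 200)
  commands = []
  first = 1
  index = 1
  while first <= last
    stop = [first + size - 1, last].min
    commands << "ruby getposts.rb #{first} #{stop} >t#{index}.txt"
    first = stop + 1
    index += 1
  end
  commands
end

# 433 is the last message number used in the example above.
puts batch_commands(433)
```

The same ranges work for getseqpic.rb if you swap the script name and the output file prefix.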
6. Sadly, the character set encoding needs to be converted. As best I can tell, Manila uses CP1252 (Win Latin), ISO-8859-1 or UTF-8. CP1252 is going to bork some things because WordPress wants import files in UTF-8. If you have the ‘iconv’ program, it seems to convert things properly if you use the correct parameters. See ‘man iconv’. One of the following has worked for me. Which one you need is your best guess.
iconv -f CP1252 -t UTF-8 <download.txt >foo.txt
iconv -f ISO-8859-1 -t UTF-8 <download.txt >foo.txt
cp download.txt foo.txt
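If you’d rather not guess blindly between those three commands, a rough heuristic can narrow it down. The sketch below is an assumption-laden illustration, not part of MvManila: guess_encoding is a made-up name, it needs Ruby 1.9 or later for the encoding calls, and it cannot prove anything. It only checks whether the bytes are already valid UTF-8 and, if not, whether they contain bytes that are mostly punctuation in CP1252 but control codes in ISO-8859-1.

```ruby
# Rough heuristic only; it cannot prove which encoding Manila used.
def guess_encoding(bytes)
  data = bytes.dup.force_encoding(Encoding::UTF_8)
  # Plain ASCII also counts as valid UTF-8, which is the "cp" case above.
  return "UTF-8" if data.valid_encoding?
  # Bytes 0x80-0x9F are mostly printable punctuation (curly quotes, dashes)
  # in CP1252 but control characters in ISO-8859-1, so they hint at CP1252.
  return "CP1252" if bytes.each_byte.any? { |b| b >= 0x80 && b <= 0x9F }
  "ISO-8859-1"
end

# Usage: ruby guess.rb download.txt   (guess.rb is whatever you name this file)
path = ARGV[0]
puts guess_encoding(File.binread(path)) if path && File.exist?(path)
```

Use the answer as the -f argument to iconv, then still eyeball foo.txt for mangled quotes before importing.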
7. The next command has a very important argument: the number you want the imported posts to start at. If you have a brand new default WordPress blog and you’ve never made a post on it, “5” is a good choice. If your WordPress blog has posts you don’t want to lose, then pick a number past the last one you have already. Maybe you have 379 posts already in WordPress and you intend to keep on posting there no matter how long the Manila conversion takes. Pick a big increment then for the argument to the next command, maybe 420 or 500. That gives you some room to expound.
The command is ./fixXml.tcl -i 5 <foo.txt >f1.txt or ./fixXml.tcl -i 420 <foo.txt >f1.txt. Ignoring some grungy details, this creates a cross reference file (posts.map), so if Manila post 35 points to previous Manila post 3 or to newer post 72, the references are converted to point to new blog #8 (3+5) or new blog #77 (72+5).
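The offset idea in step 7 is just arithmetic, and a tiny sketch makes it concrete. This is an illustration only: the real posts.map file format is not described here, and offset_map is a made-up name that ignores the “grungy details” the scripts handle.

```ruby
# Illustration of the -i offset idea; not the actual posts.map format.
# Every Manila message number is shifted by the same increment.
def offset_map(manila_ids, offset)
  manila_ids.each_with_object({}) { |id, map| map[id] = id + offset }
end

map = offset_map([3, 35, 72], 5)
map.each { |old_id, new_id| puts "Manila $#{old_id} -> new blog ##{new_id}" }
```

With an increment of 5, a link in Manila post 35 that pointed at $3 ends up pointing at #8, and post 35 itself becomes #40.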
8. Nearing the home stretch. All we need to do now is run ./fixurl.tcl <f1.txt >f2.txt. It looks for those Manila shortcuts and picture names and expands each macro into a reference to your Manila site (the “ManilaSite:” setting). It then tries to replace that reference with something pointing to NewSite: and PictSubDir:. If you were pointing to a blog post of your own and not a picture, it looks up the offset in “posts.map”.
It does work, really! On stderr you’ll get some progress messages and possibly some diagnostic messages. The most dreaded message is “Not a Pict Or Post”. There’s a message number to help you track it down. This typically is a top level post pointing to one of your comments. Write down those numbers.
There’s one more thing to do: grab the gems and double check that all the links to yourself have been cleaned up. Versions 2.6 and earlier didn’t have this capability, and it’s primitive in 2.7: you have to edit a regular expression in findstatic.rb, fixstatic.rb and checkold.rb.
The sad fact is, sometimes you used a shortcut to reference the gem or file. Sometimes you hardcoded a link to the static server, gem or picture. (Been there, done that, seen it all too often.) Sometimes the static server changed host names, so those shortcuts and handcrafted links aren’t so easy to find.
So edit those regular expressions.
findstatic.rb should run after fixurl.tcl, using its output, f2.txt. So run
ruby findstatic.rb <f2.txt
and you’ll get a list of curl commands you can cut and paste into a terminal to grab the gems or other missing files.
fixstatic.rb uses the same regular expression and input file (f2.txt), but it changes the references and produces a new file (and checks that those missing files are now in the LocalPictDir):
ruby fixstatic.rb <f2.txt >f3.txt
checkold.rb can be run with any input file. It spits out error messages about anything pointing back to your old Manila blog (after you edit the regular expressions). At this point in the conversion, these are errors you’ll have to fix in Manila or in WordPress, or just ignore. There are some eye openers in there. Maybe you linked to the Manila stats page or the Manila mailto form (seen them both), and maybe something didn’t get converted properly by the scripts. I’ve seen a lot of that too.
ruby checkold.rb <f3.txt
(or download.txt or any of the others). It only reports.
Time To Do The Import. If you’ve got a test setup (and you should), “new” means your test blog. f3.txt is your MT format file. If needed, you can run
ruby split.rb <f3.txt
which will create import1.txt, import2.txt and so on in 1.5MB chunks.
- Upload your pictures to your new webserver in the directory that matches the config.txt. Use your FTP client.
- If you haven’t done so, do the quick install of WordPress on your new host. Log in as ‘admin’ and change the admin password.
- Upload the ccdl.php script into wp-admin/import/ on the webserver.
- With your browser, log in as ‘admin’, select Manage -> Import and pick the Manila (ccdl) importer. Upload the file (f3.txt, or import1.txt, import2.txt and so on). Wait a few bits (this is the annoying part). If you’re lucky, you’ll get a list of all the “authors”. Unless you really have multiple post authors/admins and you know a whole lot about WordPress, you want to set them all to import as “admin”. Trust me. If some of those “authors” have long and unlikely names with borked HTML in them, clean them up now to be proper WordPress user names. Click the import button and you’ll get a screen of message titles that were imported (or errors). The first number is a useless serial number. Then come the title, the post’s real message number in WordPress, and the number of comments.
If it hangs before you get the Authors page:
1. You didn’t wait long enough. Your upload speed is nothing like your download speed and you’re uploading a meg or two or more. It takes a minute or two or three, or 5 or 10.
2. It’s possible your server is a bit tardy on the uploads or just cranky.
3. Your server has upload limits in file size or time
4. You don’t have the permissions set properly on the WordPress upload directory: 775 on wp-content and chown/chgrp to match what Apache is running as.
I’ve seen them all. Once you get that authors page, it’s likely to go just fine.
After you import and test out the new blog once or twice and are happy with the conversion, you still have to fix some things by hand. Sorry.
- Go set your permalinks the way you want. Do it now, before the search engines get ahold of you.
- Set the admin prefs so the “displayed as” matches what you want.
- Go have fun with themes but remember not all layout problems are a bad MvManila conversion. Seen a lot of that, not one was bad conversion. Bad source material, seen that. Theme and expectations. Whole lot of those.
- Remember all those error messages about things you have to fix later? Oddly enough, they have to be fixed by you.
It sounds terrible doesn’t it? It’s my little business to do most of the nasties for you. It’s not that bad.
Problems You’ll See (the shortlist)
1. Early Manila posts may have extra blank lines. I don’t have a good solution for that problem.
2. There are a few things I don’t convert. You can fix them before you import, after you import or just ignore them. I prefer using a text editor on the imports.txt file and looking for all of the following:
Search for “ERROR”. These are possibly shortcuts that aren’t img tags to your own pictures or that don’t point to your own Manila blog entries. Note the date and post title. You have to clean these up by editing the post in WordPress.
Search for your Manila site name, like “werehosed.editthispage.com” for example. That’s a URL that didn’t get converted. Why is unknown. Probably a post referencing a comment, but it could be something else. Maybe a Manila gem. Write down what you’ll need to find it in WordPress (message number, date, title).
3. Manila NewsItems may not be converted like you think they should be (the URL is lost). I’d like to fix that somehow (I may have with 2.7, but see the TODO file).
4. Manila servers also have a tendency to time out which will give you something like a “peer disconnected” message in the stack trace. That one you could try again. Of course, if you entered the wrong username or password in the config file you’ll get some sort of “unauthorized” message in the stack trace.
5. Comments are different between WordPress and Manila. They get their own $nn number at Manila. They may or may not be linked to the original post. If they aren’t, I treat them as a new message. Most are, but you’ll get some excess postings which are really comments. I recursively crawl the comment tree and try to get them in proper order for a WordPress import.
If you started a brand new Manila post with “Re: ” in the title, it’s not going to survive the scripts and get converted. Go change the Manila post title to something else and start over from the getposts.rb step. I have to ignore “Re: ” titles and hope they get picked up in the comment tree crawl.
6. Pings and Trackbacks might show up as comments or simply vanish. I’d like to fix that in the future. You also have to decide if they really mean anything to you. Link rot is a fact of life. There’s no way to know if the trackback site still exists or if their post still points to your converted post.
7. Have I mentioned the value of a test blog? A lot easier to set up than constant ftping files around for testing and blowing off a remote database.
8. The ‘ccdl.php’ import script is a slightly modified version of Joshua Zaders mt.php importer that appears to do what I wanted. My modifications have been to change the name and text to clearly show that it’s for a Manila conversion.
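The two text-editor searches in item 2 above (for “ERROR” and for your old Manila host name) can also be done in one pass. The sketch below is a stand-in for the manual search, not a replacement for checkold.rb: the helper name leftovers and the sample text are made up for illustration.

```ruby
# Hypothetical checker: report line numbers that still contain "ERROR"
# or the old Manila host name.
def leftovers(text, old_host)
  hits = []
  text.each_line.with_index(1) do |line, number|
    hits << [number, "ERROR"] if line.include?("ERROR")
    hits << [number, old_host] if line.include?(old_host)
  end
  hits
end

sample = "TITLE: About\nsee http://werehosed.editthispage.com/stats\nERROR shortcut\n"
leftovers(sample, "werehosed.editthispage.com").each do |number, what|
  puts "line #{number}: #{what}"
end
```

To run it over a real import file, replace the sample string with File.read("f3.txt") and your own host name.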
About Those Message Numbers
As written above, the scripts depend on predicting or forcing the new blog to accept import files with explicit message numbers. Forcing makes them dependent on the ccdl.php importer, which can specify the incoming message number. That sucks for a lot of “business” reasons, like I have to maintain and distribute that ccdl.php script. It could break with the next WP release, and that potential breakage would be repeated for every blog engine.
Version 2.6+ has provisions to use the out-of-box importers for WordPress and TextPattern and maybe Movable Type, IF YOU CAN PREDICT the next message number WordPress or TextPattern will create. You can find that out, but you can’t write or delete posts on the new blog until the import is completed, and that might take days.
WordPress 2.0.4, when freshly installed, has created messages 1, 2 and 3, and an import will start at 4. Obviously this will change across WP releases and with your own blogging behavior on the new site. Maybe you made some new posts to see what happens; your next message number would then be different. If you’re feeling lucky and willing to stop blogging on the new site until the import is done, you can predict. MvManila’s fixXml.tcl script now requires one of two switches. fixXml.tcl builds a map of old message numbers to new message numbers. You can specify the first number you want and all imports will increment by one from there, so a brand new WP blog could use ./fixXml.tcl -s 4 to start the imports at 4 and increment by one. Guessing wrong is going to hurt and you may not discover the error until much later.
You can also use ./fixXml.tcl -i 100 (or 5 or 400), but then you need a special importing script like ccdl.php that can insert its own message numbers. You can force message numbers or you can predict message numbers. Lots of ways to go wrong.
I used ./fixXml.tcl -s 4 with the standard out-of-the-box MT importer on a new WP blog. It worked fine, but I knew what to look for when looking for errors.
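To see why predicting with -s is more fragile than forcing with -i, it helps to look at what consecutive numbering does when Manila’s numbers have gaps. The sketch below is a simplified assumption about the idea, not the logic of fixXml.tcl: sequential_map is a made-up name, and it ignores comments, which have their own Manila numbers but get folded into their parent posts.

```ruby
# Sketch of the -s idea: surviving posts are numbered consecutively
# from the start value, in Manila order, regardless of gaps.
def sequential_map(manila_ids, start)
  manila_ids.sort.each_with_index.map { |id, i| [id, start + i] }.to_h
end

# Manila $1, $6 and $9 are posts; the numbers in between were pictures,
# comments or deleted messages.
map = sequential_map([1, 6, 9], 4)
map.each { |old_id, new_id| puts "Manila $#{old_id} -> new blog ##{new_id}" }
```

Because every new number depends on the one before it, a single unexpected post on the new blog shifts everything after it, which is the “guessing wrong is going to hurt” case.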
Notes About moving to TextPattern
I have imported the same Manila site into WordPress and TextPattern. TextPattern has some bugs in the importer script with categories and, as written (v4.0.4), you can’t “force” message numbers on import. You can predict the next message number. On a freshly installed TXP, use
./fixXml.tcl -s 2
You also have to add a switch for the fixurl.tcl command. The default is to produce new URLs that use the WP …../?p=nn form.
./fixurl.tcl -dp id
produces URL’s like …./?id=nn
Tested with TXP 4.0.4. Works, sort of. There’s a bug of theirs with categories and I don’t want to work that hard fixing their problem.
This might work for other blog engines.
Notes About Movable Type Format
It may not be obvious, but the last import file is in MT import format, and so are most of the intermediates. It’s the only format I know of that can export and import comments. I’d love to be wrong, but if you care about details like links that work and pictures and comments, MvManila is all you have and MT format is pretty much what I have to work with.
v2.6 of MvManila produces “more” correct MT import file syntax. It used to be that WordPress couldn’t handle correct syntax. It can now, so I fixed MvManila to do better too. TextPattern needed some of those fixes, but it’s not 100% with good syntax either.
If you wanted to, that MT format import file could probably get you an MT blog. I’d put in the patches out there to force the message number rather than guess. It might work. Let me know if you try it on MT. If it’s a static MT site, you probably need to fix a few things in fixurl.tcl.
One Last Thing
I’m really not trying to be vague or scare you. The code is there for you to study and play with and all the files are text files (except the pictures of course) so you can fix them or learn from them.
As of April 2007, version 3.0 is what I use. I think 3.0 is pretty damn nifty but it’s a lot more complicated, too clever for a simple doc file like this. Odds are high 2.7 will do right for you.
I’ll convert your Manila site for a very modest fee. If you decide to do it yourself with these scripts, please let me know how it went and how to improve the scripts or the documentation. It will make it easier for the next blog, whether you do it or I do it. You can always ask me questions: ccoupe@mvmanila.com