Migrate from Blogger to Jekyll
Published 2010-8-21
See the new article: Migrate from Blogger to Jekyll with Proper Redirects
Goal
Migrate from Blogspot to Jekyll.
TODO
- Index related posts with --lsi
- Parse formatted content (unlikely to be accomplished)
Pre-reqs
Turn on full RSS feeds for your blog:
- Navigate to your blog (http://yourblog.blogspot.com)
- Sign in
- Click the "edit" link for any post
- Click "settings"
- Click "site feed"
- Click "advanced mode"
- Select "full" for all options
- Save
Import
The caveat is that you lose either a lot of formatting or a lot of your time. You pick.
With vilcans' Jekyll rss_importer:
BLOGGER=coolaj86
git clone http://github.com/vilcans/jekyll.git
cd jekyll/
git branch -a
git checkout origin/rss_importer
git checkout -b rss_importer
git branch
mkdir -p _posts
sed -i "s/require \"YAML\"/require \"yaml\"/" ./lib/jekyll/converters/rss.rb
wget "http://${BLOGGER}.blogspot.com/feeds/posts/default?alt=rss" -O "${BLOGGER}.rss.xml"
ruby -r './lib/jekyll/converters/rss' -e 'Jekyll::RSS.process("'${BLOGGER}'.rss.xml")'
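As a quick sanity check after the import, you can compare the number of items in the downloaded feed with the number of files written to _posts. This is a minimal sketch: count_feed_items is a hypothetical helper (not part of the rss_importer branch), and it assumes each post appears as a plain <item> element, as in any RSS 2.0 feed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count <item> elements in an RSS file.
# grep -o prints one line per match, so this also works when the
# whole feed sits on a single line.
count_feed_items() {
  # $(( )) strips the padding some wc implementations add
  echo $(( $(grep -o '<item>' "$1" | wc -l) ))
}

BLOGGER=coolaj86
# Only report when both the feed and the _posts directory exist
if [ -f "${BLOGGER}.rss.xml" ] && [ -d _posts ]; then
  echo "items in feed: $(count_feed_items "${BLOGGER}.rss.xml")"
  echo "posts created: $(( $(ls _posts | wc -l) ))"
fi
```

If the second number is much lower than the first, or zero, check the errors listed under Possible Errors.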
Use the by-hand approach
BLOGGER=coolaj86
wget --convert-links --html-extension --mirror --random-wait --wait 3 http://${BLOGGER}.blogspot.com/
Essentially you would want to parse each page as follows:
- Use YYYY_MM_DD_title.html as the filename and name:
- Take <title> up to </title> and put it in title:
- Take <div class='post-body entry-content'> until <div class='post-footer'> and put it after the YAML front matter
- Take <h2 class='date-header'> up to </h2> and put it in as date: (it includes the time, which the filename doesn't)
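The steps above can be sketched as a pair of shell functions. This is a minimal sketch under stated assumptions, not a finished converter: post_filename and html_to_post are hypothetical names, each marker is assumed to start its own line, and each mirrored page is assumed to hold a single post. The body is kept from the post-body div up to (but not including) the post-footer div, exactly as described above, so any formatting inside it stays as HTML.

```shell
#!/usr/bin/env bash
# Minimal sketch of the by-hand conversion (hypothetical helpers).
# Assumes each marker starts its own line and one post per page.

# 2010_08_21_title.html -> 2010-08-21-title.html (Jekyll's YYYY-MM-DD-name form)
post_filename() {
  basename "$1" | sed 's/^\([0-9]\{4\}\)_\([0-9]\{2\}\)_\([0-9]\{2\}\)_/\1-\2-\3-/'
}

# Print a Jekyll post (YAML front matter + body HTML) for one mirrored page
html_to_post() {
  local src="$1" title date
  # <title> up to </title>, with any nested tags stripped
  title=$(sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p' "$src" | head -n 1 | sed 's/<[^>]*>//g')
  # <h2 class='date-header'> up to </h2>, with inner tags (e.g. <span>) stripped
  date=$(sed -n "s/.*<h2 class='date-header'>\(.*\)<\/h2>.*/\1/p" "$src" | head -n 1 | sed 's/<[^>]*>//g')
  echo "---"
  echo "layout: post"
  echo "title: \"${title}\""   # titles containing double quotes would need escaping
  echo "date: ${date}"
  echo "---"
  # From the post-body div through the line before the post-footer div.
  # The trailing '>' is left off both patterns so extra attributes still match.
  sed -n "/<div class='post-body entry-content'/,/<div class='post-footer'/p" "$src" | sed '$d'
}
```

For example, from your Jekyll directory: html_to_post 2010_08_21_title.html > "_posts/$(post_filename 2010_08_21_title.html)". Depending on your Jekyll version, a human-readable date taken from the date header may need to be rewritten as YYYY-MM-DD HH:MM:SS before Jekyll accepts it.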
If you write a script to strip out all of the garbage and keep the post + formatting, I'd love to hear about it.
Here's a post that will get you halfway to converting HTML to Markdown.
Categorize by Blog
I'm using Fastr as a template for my blog. Fastr supports categories with vanilla Jekyll.
Here's a script I used to go through one of my blogs, which was created back when there was no title field:
BLOG=coolaj86
ID=0 # Fastr doesn't allow posts of the same name, so each post gets a unique number
cd "${BLOG}_posts"
for POST in *; do
  sed -i "s/^title:/title: untitled ${ID}\ncategories: ${BLOG} uncategorized/" "${POST}"
  mv "${POST}" "$(basename "${POST}" .html)_${ID}.html"
  ID=$((ID + 1))
done
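Because sed -i rewrites files in place, it can help to preview the substitution on a sample front matter first. This sketch runs the same expression on made-up input; preview_title_fix is a hypothetical helper, and like the loop it relies on GNU sed, which turns \n in the replacement into a newline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show what the loop's sed would do to one
# untitled post's front matter, without touching any files.
preview_title_fix() {
  local blog="$1" id="$2"
  printf -- '---\nlayout: post\ntitle:\n---\n' \
    | sed "s/^title:/title: untitled ${id}\ncategories: ${blog} uncategorized/"
}

preview_title_fix coolaj86 0
```

With GNU sed, the title: line becomes two lines: "title: untitled 0" followed by "categories: coolaj86 uncategorized".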
And the other, which thankfully did have titles:
BLOG=thesystemisntdown
cd "${BLOG}_posts"
for POST in *; do
  sed -i "s/^\(title:.*\)/\1\ncategories: ${BLOG} uncategorized/" "${POST}"
done
And then, to give them the Fastr layout:
for P in *; do
  sed -i "s/layout: post/layout: article/" "${P}"
done
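To confirm the layout change reached every post, you can list the files that still lack it. A minimal sketch: posts_missing_layout is a hypothetical helper built on grep -L, which prints the names of files containing no matching line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the posts whose front matter
# does not yet use the Fastr "article" layout.
posts_missing_layout() {
  grep -L '^layout: article' -- "$@"
}
```

Run it as posts_missing_layout * inside the posts directory; empty output means every post was converted.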
Possible Errors
If you didn't enable full RSS feeds (and click save):
No content in RSS item '2006_03_01_archive'
Created 0 posts!
If you didn't replace "YAML" with "yaml":
/home/user/jekyll/lib/jekyll/converters/rss.rb:5:in `require': no such file to load -- YAML (LoadError)
from /home/user/jekyll/lib/jekyll/converters/rss.rb:5:in `<module:Jekyll>'
from /home/user/jekyll/lib/jekyll/converters/rss.rb:1:in `<top (required)>'
from ruby:0:in `require'
If you don't have a _posts directory:
http://coolaj86.blogspot.com/2010_05_01_archive.html#8976446356395410673 -> _posts/2010-05-06-2010_05_01_archive.html
/home/user/jekyll/lib/jekyll/converters/rss.rb:39:in `initialize': No such file or directory - _posts/2010-05-06-2010_05_01_archive.html (Errno::ENOENT)
from /home/user/jekyll/lib/jekyll/converters/rss.rb:39:in `open'
from /home/user/jekyll/lib/jekyll/converters/rss.rb:39:in `block in process'
from /usr/local/lib/ruby/1.9.1/rexml/element.rb:906:in `block in each'
from /usr/local/lib/ruby/1.9.1/rexml/xpath.rb:64:in `each'
from /usr/local/lib/ruby/1.9.1/rexml/xpath.rb:64:in `each'
from /usr/local/lib/ruby/1.9.1/rexml/element.rb:906:in `each'
from /home/user/jekyll/lib/jekyll/converters/rss.rb:16:in `process'
from -e:1:in `<main>'
By AJ ONeal