Import Facebook Into Wordpress

Introduction

It is possible to get your personal data (posts, images, etc.) out of Facebook as a zipfile. If you ask Facebook for this as JSON data, you can then write some Python to parse it and eventually upload it into Wordpress. Here are some collected notes and code snippets on how I did it.

This is not a howto with step by step instructions. You will have to modify this for your own specific needs. However, it should provide a good starting point. I’m using Python 3. If you’re following this, I’m assuming you know what JSON is and how Python works.

How to export your Facebook Data

This is all based on Facebook in 2021. Future updates to Facebook might change or move the location of this page, or the JSON data. It probably won’t be difficult to adapt to such changes though.

Getting your data out of Facebook

  1. Settings and Privacy
  2. Your Facebook information
  3. Download your information
  4. At the top change the format to “JSON”
  5. Tick to deselect items you don’t want (see below)
  6. Click the blue “Create file” button

It takes a few hours to process this request and you’ll be notified when it is complete.

Select the items to download carefully. You only really want posts, as the rest are reactions or extra Facebook specific items. It makes no sense, for example, to download your comments and reactions to other people’s content, as you can’t download the content they refer to.

Once the zipfile is ready, unpack it somewhere and have a look around to get a feel for the data and how it is organised.
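One quick way to get that feel is to list every JSON file in the unpacked export along with the shape of its top-level value. This is only a sketch: the function name `summarise_export` is my own placeholder, and it assumes the files can be read as UTF-8 (the exported JSON escapes non-ASCII text, so this should normally hold).

```python
import json
from pathlib import Path


def summarise_export(root):
    """Map each JSON file under root to a short description of its top level."""
    summary = {}
    for path in sorted(Path(root).rglob("*.json")):
        with open(path, encoding="utf-8") as handle:
            top = json.load(handle)
        # Lists report how many entries they hold, dicts report their keys.
        if isinstance(top, list):
            detail = f"list of {len(top)} entries"
        elif isinstance(top, dict):
            detail = "dict with keys: " + ", ".join(sorted(top))
        else:
            detail = type(top).__name__
        summary[path.relative_to(root).as_posix()] = detail
    return summary
```

Calling `summarise_export()` on the folder where you unpacked the zipfile and printing the result shows at a glance which files hold lists of posts and which hold a single dictionary.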

Facebook’s JSON is badly encoded and contains mojibake

If you look in the data, you’ll see all the Unicode content has been mangled. Specifically all emojis have been messed up. There is a StackOverflow post here that explains it with a bunch of things to try to fix it. There is also this blog post that explains it well too.

The quick way is to use a few command line tools to process the file like this

cat message_1.json | jq . | iconv -f utf8 -t latin1 > m1.json

replacing message_1.json with the file you’re trying to process. Only the JSON files containing your posts need processing.

I chose to write the loading and mojibake repair into the tool I was writing with this piece of code

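import json

def parse_obj(obj):
    """Recursively repair mojibake in every string inside a decoded JSON value."""
    if isinstance(obj, str):
        # The text is UTF-8 bytes that were misread as Latin-1 characters
        return obj.encode('latin_1').decode('utf-8')

    if isinstance(obj, list):
        return [parse_obj(o) for o in obj]

    if isinstance(obj, dict):
        return {key: parse_obj(item) for key, item in obj.items()}

    # Numbers, booleans and None pass through untouched
    return obj

with open('text-posts.json', encoding="latin1") as json_file:
    data = parse_obj(json.load(json_file))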
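To see why the encode/decode round trip works, here is a small worked example using the same escape sequences that appear in the sample status update further down. Each `\u00XX` escape is really one byte of a UTF-8 sequence that was written out as if it were a Latin-1 character. The variable names are just for illustration.

```python
# A fragment as json.load() returns it: three "characters" per real character.
mangled = "I don\u00e2\u0080\u0099t know\u00e2\u0080\u00a6"

# Latin-1 maps code points 0-255 straight to bytes, recovering the raw UTF-8.
raw_bytes = mangled.encode("latin_1")

# Decoding those bytes as UTF-8 yields the intended punctuation:
# a right single quotation mark (U+2019) and an ellipsis (U+2026).
repaired = raw_bytes.decode("utf-8")

print(repaired == "I don\u2019t know\u2026")  # True
```

One caveat: a string that is already correct and contains characters outside Latin-1 will raise `UnicodeEncodeError` on the `encode` step, so this repair should only be applied to data that is known to be mangled.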

The structure of the Facebook JSON

If you look at the data (a task you’ll spend a lot of your time doing - it’s important to understand what Facebook puts in their JSON) you’ll see a few patterns in the data.

Facebook status updates in JSON

This is what I called a regular text post

{
    "timestamp": 1627590606,
    "data": [
      {
        "post": "Reading a book. Cannot tell if the writer is trolling, being very specific, or getting mixed up with terminology, but I just read \u00e2\u0080\u009credundant RAID array\u00e2\u0080\u009d as a thing.\n\nI mean, it could be a thing\u00e2\u0080\u00a6 but I don\u00e2\u0080\u0099t know if it\u00e2\u0080\u0099s supposed to be.\n\nAlso, 96.415 Internet points if you can name the book *and provide zero spoilers because go away I\u00e2\u0080\u0099m still reading it*"
      }
    ],
    "title": "Some User updated his status."
  },

As you can see it is made up from some data items

  • A timestamp in Unix epoch time
  • A data array containing one item
  • A title entry

Facebook media posts in JSON

Here is what I ended up calling a media post

{
    "timestamp": 1626864708,
    "attachments": [
      {
        "data": [
          {
            "media": {
              "uri": "photos_and_videos/TimelinePhotos_yNnQBEb7og/218389920_10159614527606064_2929622297916153719_n_10159614527596064.jpg",
              "creation_timestamp": 1626864708,
              "media_metadata": {
                "photo_metadata": {
                  "exif_data": [
                    {
                      "taken_timestamp": 1626864708
                    }
                  ]
                }
              },
              "title": "Timeline Photos"
            }
          }
        ]
      }
    ],
    "data": [
      {
        "post": "Yeah maaan... Taste the rainbow."
      }
    ]
  },

These are a bit more complex to process and represent any post in Facebook where images have been attached. There can be multiple images, and each one may carry EXIF data, with the number of EXIF fields depending on your camera.
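A minimal sketch of pulling the image paths out of a media post, assuming the nesting shown in the example above (`attachments`, then `data`, then `media`, then `uri`). The function name `media_uris` is hypothetical, and anything that lacks a `media` entry or a `uri` is simply skipped.

```python
def media_uris(entry):
    """Return the uri of every media item attached to one exported post."""
    uris = []
    for attachment in entry.get("attachments", []):
        for item in attachment.get("data", []):
            media = item.get("media")
            # Skip anything that lacks a media entry or a uri.
            if media and "uri" in media:
                uris.append(media["uri"])
    return uris


example = {
    "timestamp": 1626864708,
    "attachments": [
        {"data": [{"media": {"uri": "photos_and_videos/album/one.jpg"}},
                  {"media": {"uri": "photos_and_videos/album/two.jpg"}}]}
    ],
    "data": [{"post": "Yeah maaan... Taste the rainbow."}],
}
print(media_uris(example))
```

Returning a list, even for a single image, means the calling code does not need a special case for posts with several pictures.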

Other JSON data

You will also encounter seemingly empty posts where there is nothing but the timestamp, or where it seems the main bulk of the data is missing. For example I found several that looked a bit like this

{
    "timestamp": 1627590606,
    "title": "Some User updated his status."
  },

Either the Facebook JSON exporting isn’t that great, or these are supposed to contain data that I didn’t export. Either way, one of the major tasks is to prune out junk you don’t want to import into your Wordpress blog.
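The three patterns described so far (text post, media post, seemingly empty entry) can be told apart with a few dictionary checks. This is a sketch based only on the examples in these notes; the function names are my own, and a real export may contain shapes not covered here.

```python
def post_text(entry):
    """Return the first "post" string in entry["data"], or "" if none."""
    for item in entry.get("data", []):
        if "post" in item:
            return item["post"]
    return ""


def classify(entry):
    """Label an entry "media", "text" or "empty"."""
    if entry.get("attachments"):
        return "media"
    if post_text(entry).strip():
        return "text"
    return "empty"


def prune(entries):
    """Drop entries that carry neither text nor attachments."""
    return [entry for entry in entries if classify(entry) != "empty"]
```

Running `prune()` over the loaded list before doing anything else keeps the timestamp-only entries out of every later step.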

Making sense of what to import into Wordpress

There is a trick here that’s important to understand - you need to decide what data is important to you. How it looks is also important.

For example the title data in all these entries is largely irrelevant or pointless - they simply say “Your Name updated their status”. The title seems to be that text which appears in notifications, it’s not an actual title to your messages. So if you wrote some code to bulk import all the data, using title as the post title in Wordpress, you’ll end up polluting your blog with thousands of posts all titled “Whoever updated their status” “Whoever uploaded a photo” etc.

And this is an important point…

You will have thousands of items to import. If it goes wrong or imports incorrectly, you can very easily fill your blog with thousands of junk posts which take hours to remove.

I chose to use the first 50 characters of each post for the title, and when uploading images I had to use some logic to work out what text to use instead. Think about how you use Facebook - sometimes you just post text. Sometimes that text is highly context sensitive based on current things happening in social media, and read out of context makes no sense. Do you want that importing? Is a one line post that says “Well that was a bad idea!” where the real ‘meat’ of the conversation happened in a thread below it worth capturing? Bear in mind that it’s not possible to extract all the comments on your posts.
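A rough sketch of the "first 50 characters" idea follows. It is a variation rather than the exact logic I used: it takes only the first non-empty line, cuts at a word boundary, and falls back to a generic title when there is no text at all. The name `make_title` and the default fallback string are placeholders.

```python
def make_title(text, fallback="Untitled post", limit=50):
    """Build a post title from roughly the first `limit` characters of text."""
    # Use only the first non-empty line so titles never contain newlines.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return fallback
    first = lines[0]
    if len(first) <= limit:
        return first
    # Cut at the last space inside the limit to avoid chopping a word in half.
    cut = first[:limit].rsplit(" ", 1)[0]
    return cut + "..."


print(make_title("Well that was a bad idea!"))
print(make_title("", fallback="Photo upload"))
```

The fallback argument is where any image-specific logic would plug in, for example passing the album name for a picture that was posted without a caption.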

There is also some content that makes no sense to import. Facebook likes to provide memories (posts that repeat content from “this time 6 years ago”). Well if you think about it, your Facebook data contains that original post, and that’ll be in your Wordpress blog once imported. Is it worth having a second post 6 years later that just repeats an older post? Remember, you don’t get any of the comments or social interaction that comes with it. Also Wordpress has its own “let’s look at the past” type plugins you can install.

I mean, this is all subjective. You might want to retain these posts if you wrote something meaningful above them. This is what I mean about deciding how the importing should work for you.

Sometimes I uploaded pictures and never wrote anything underneath them, so in those I had to make up a generic title myself.

Privacy notice!

Facebook does not export any privacy data about each post. If you selectively choose who to share content with, this does not get exported. Wrote something a bit private and marked it so your boss couldn’t see it? Yeah, that’s going straight in your public Wordpress blog to later get indexed by Google.

My importing code uploads all posts as drafts so you have to manually go through and post them for real, forcing a bit of quality control on everything. Use the search tool in Wordpress to look for key words you personally know are in your sensitive posts, and mark those ones private.
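As an alternative to searching inside Wordpress afterwards, the same keyword check can be done before upload. This is a sketch of that idea, not what my importer does: the function names are hypothetical, the keyword list is yours to supply, and it relies on "private" being one of the post statuses the Wordpress REST API accepts.

```python
def needs_review(text, keywords):
    """Return the keywords that occur in text, ignoring case."""
    lowered = text.lower()
    return [word for word in keywords if word.lower() in lowered]


def choose_status(text, keywords):
    """Pick "private" for flagged posts and "draft" for everything else."""
    return "private" if needs_review(text, keywords) else "draft"


# Example keywords only; use words you know appear in your sensitive posts.
sensitive = ["boss", "payrise"]
print(choose_status("My Boss has no idea what RAID is", sensitive))
print(choose_status("Reading a book.", sensitive))
```

Simple substring matching will over-match (for example "boss" inside "embossed"), which is the safer direction to be wrong in here.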

Same with photos. Those will get hoovered up by Google Image Search, so make sure you don’t upload any that you don’t want appearing to the public.

Think a bit though - you don’t need to hand process every post. Just go through your “current” information. Had a crappy job two years ago but you don’t work there? Well they probably don’t care if your barely read blog that is hard to find has a few sarcastic posts on it now. Your current employer… it’s worth being more careful.

Snippets of code that are useful for importing Facebook JSON into Wordpress

Again, this is not a howto, you’re expected to use this to help write an importer that works for your blog, fitting your personal preferences of what gets uploaded.

The date

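This is just a Unix formatted timestamp. It’s the number of seconds since the Unix epoch. Wordpress wants the date in a more specific format: YYYY-mm-ddTHH:MM:SS. Note that the REST API reads the “date” field in the site’s own timezone and the “date_gmt” field as UTC, so a UTC value belongs in “date_gmt”.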

You can do that with this type of Python

from datetime import datetime, timezone
ts = int("1284101485")

# if you encounter a "year is out of range" error the timestamp
# may be in milliseconds, try `ts /= 1000` in that case
# (utcfromtimestamp() is deprecated since Python 3.12, so pass the timezone explicitly)
print(datetime.fromtimestamp(ts, tz=timezone.utc).strftime('%Y-%m-%dT%H:%M:%S'))

Posting to Wordpress with Python

To post into Wordpress using Python you need to use the Wordpress REST API. The first step is to create an App Specific Password (Wordpress calls these Application Passwords). There are several guides online that mention installing a plugin to do this, but the feature has been built into Wordpress core since version 5.6.

Your app specific password in Wordpress looks like groups of characters. The user is your own admin’s username (or whoever generated the password).

import requests
import base64

url = "https://your.blog.url/wp-json/wp/v2/posts"
user = "james"
password = "ABCD abcd ABCD abcd ABCD abcd"

# HTTP Basic auth: base64 of "user:password"
credentials = user + ':' + password
token = base64.b64encode(credentials.encode())
header = {'Authorization': 'Basic ' + token.decode('utf-8')}

post = {
 'title'     : 'Hello World',
 'status'    : 'draft',
 'content'   : 'This is my first post created using rest API',
 'categories': [1], # list of category IDs
 'date'      : '2021-01-05T10:00:00' # site-local time; use 'date_gmt' for UTC
}

response = requests.post(url, headers=header, json=post)
# a 201 status code means the draft was created
print(response.status_code, response.text)
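Putting the pieces together, one exported entry can be turned into the dictionary that gets posted. This is a sketch under a few assumptions: the text has already had its mojibake repaired, the title is just the first 50 characters, and the timestamp is sent as UTC in the REST API’s `date_gmt` field. The name `build_payload` and the "Untitled" fallback are placeholders.

```python
from datetime import datetime, timezone


def build_payload(entry, category_id=1):
    """Turn one exported Facebook entry into a Wordpress REST API post dict."""
    text = ""
    for item in entry.get("data", []):
        if "post" in item:
            text = item["post"]
            break
    when = datetime.fromtimestamp(entry["timestamp"], tz=timezone.utc)
    return {
        "title": text[:50] or "Untitled",
        "status": "draft",  # never publish straight from the importer
        "content": text,
        "categories": [category_id],
        "date_gmt": when.strftime("%Y-%m-%dT%H:%M:%S"),
    }


entry = {"timestamp": 1627590606, "data": [{"post": "Reading a book."}]}
print(build_payload(entry))
```

The returned dictionary can be passed as the `json=` argument of the `requests.post()` call, in place of a hand-written post.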

You can upload images that way too, but you’re on your own working out how to link and embed them into the post body itself. The main difficulties with images are

  • They’re stored in folders that contain weird names
  • The URI in the JSON data references this name
  • Wordpress has no concept of galleries
  • Figuring out the URL to your image takes some trial and error
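For anyone who does want to push images straight into Wordpress, here is a sketch of what an upload to the REST API’s media endpoint (`/wp-json/wp/v2/media`) can look like. It is not what I ended up doing. It uses the standard library’s `urllib` rather than `requests` so that it stands alone, it only builds the request without sending it, and the function name `media_request` is hypothetical.

```python
import base64
import mimetypes
import urllib.request


def media_request(site, user, app_password, filename, image_bytes):
    """Build (but do not send) a POST to the Wordpress media endpoint."""
    token = base64.b64encode(f"{user}:{app_password}".encode()).decode("ascii")
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    return urllib.request.Request(
        f"{site}/wp-json/wp/v2/media",
        data=image_bytes,
        method="POST",
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": content_type,
            # Wordpress takes the stored file name from this header.
            "Content-Disposition": f'attachment; filename="{filename}"',
        },
    )
```

Sending the request with `urllib.request.urlopen()` should return a JSON description of the new media item, and its `source_url` field is the address to put in an image tag. Treat that as something to verify against your own site rather than a guarantee.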

What I did was use an external gallery on my server, Piwigo, and bulk import all my photo galleries by

  1. Renaming the gallery folders to cut off the text at the end
  2. Using my uploading script to
     a. Cut the filename off the URI
     b. Use Piwigo’s own REST API to query the URL of the image
     c. Insert that URL as an image tag into my post’s body text
  3. Uploading the post to Wordpress
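The folder renaming and the "cut the filename off the URI" step can be sketched as below. It assumes the gallery folder name ends in an underscore followed by a random-looking token, as in the media post example earlier, and that the token itself contains no underscore. The function name `split_media_uri` is my own, and the Piwigo lookup is left out because it depends on your gallery setup.

```python
import posixpath
import re


def split_media_uri(uri):
    """Split an export uri into (gallery name without suffix, file name)."""
    folder, filename = posixpath.split(uri)
    gallery = posixpath.basename(folder)
    # Assumption: the folder ends in "_" plus a random-looking token.
    gallery = re.sub(r"_[^_]+$", "", gallery)
    return gallery, filename


uri = ("photos_and_videos/TimelinePhotos_yNnQBEb7og/"
       "218389920_10159614527606064_2929622297916153719_n_10159614527596064.jpg")
print(split_media_uri(uri))
```

The gallery name and file name together are what you would then use to look up the hosted image URL in whichever gallery software you use.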

Example code

All the code I used, in the state it was in once I’d finished, is available on my github repository.

This is not a finished, working program. Use it as a starting point!


© Copyright 2021 NCoT Technology
