Development

Epic Level Data Migrations with South and Django

The workflow for a data migration in Django with South migrations is relatively simple, and fairly well-documented. If you have a model that you want to modify, you'll want to

  1. define your new fields and create a schemamigration;
  2. create a blank migration and access the ORM dictionary to write your data migration, which moves the data from the old fields to the new; and
  3. remove the old fields and create another schemamigration to say goodbye to those unsalted passwords forever.
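The three steps can be sketched in plain Python, with no Django or South involved. Each dict below stands in for a database row, and the `name` to `first_name`/`last_name` split is a hypothetical example chosen only for illustration; the function names are likewise made up for this sketch.

```python
# Plain-Python sketch of the three-step workflow; no Django or South required.
# Each dict stands in for one database row (hypothetical data).
rows = [
    {"id": 1, "name": "Ada Lovelace"},
    {"id": 2, "name": "Cher"},
]


def add_new_fields(rows):
    """Step 1 (schemamigration): add the new, initially empty columns."""
    for row in rows:
        row["first_name"] = ""
        row["last_name"] = ""


def migrate_data(rows):
    """Step 2 (datamigration): copy data from the old column into the new ones."""
    for row in rows:
        # partition() splits on the first space; a single-word name
        # leaves last_name empty.
        first, _, last = row["name"].partition(" ")
        row["first_name"] = first
        row["last_name"] = last


def remove_old_fields(rows):
    """Step 3 (schemamigration): drop the old column."""
    for row in rows:
        del row["name"]


add_new_fields(rows)
migrate_data(rows)
remove_old_fields(rows)
```

The order matters: the old column can only be dropped once the data has been copied out of it, which is why the data migration sits between the two schema migrations.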

The workflow is simple enough to understand, but if you want to do anything more complicated than break your names into first_name and last_name, you'll need some more tools. Recently, I ran into a situation where I needed to condense two entire models into a single super-model that contained all fields from both of the originals. To illustrate, I will first give a simple, silly example. If you are neither of these things, feel free to skip to the latter section in which I lay out how to write an epic-level data migration.

Silly Example: Hybridizing Animals

First, lay out the models. Ducks and beavers each get a name, a weight, a tail type, and a boolean for their bill (by default, beavers don't have one). For simplicity's sake, put both of these in an "animals" app, in animals/models.py:

from django.db import models

class Duck(models.Model):
  name = models.CharField(max_length=32)
  # DecimalField requires max_digits and decimal_places; these values are illustrative.
  weight = models.DecimalField(max_digits=6, decimal_places=2)
  tail = models.CharField(default="feathered", max_length=32)
  bill = models.BooleanField(default=True)

class Beaver(models.Model):
  name = models.CharField(max_length=32)
  weight = models.DecimalField(max_digits=6, decimal_places=2)
  tail = models.CharField(default="broad and featherless", max_length=32)
  bill = models.BooleanField(default=False)

With that taken care of, create the initial migration and apply it:

./manage.py schemamigration --initial animals
./manage.py migrate animals

Then, create some animals in the database. Registering the app in the Django admin makes creating animals easy.

Time to get hybridizing! The three steps are schemamigration, datamigration, schemamigration, so start by creating the hybrid animal class. This goes in animals/models.py with the other two. Give it the same fields as before, but do not specify defaults, because the values need to come from the parent animals, and the fields are all required by default anyway.

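class Platypus(models.Model):
  # 65 characters: two 32-character names joined by a hyphen.
  name = models.CharField(max_length=65)
  # DecimalField requires max_digits and decimal_places; these values are illustrative.
  weight = models.DecimalField(max_digits=6, decimal_places=2)
  tail = models.CharField(max_length=32)
  bill = models.BooleanField()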

New model added; run the schemamigration:

./manage.py schemamigration animals --auto

To set up the datamigration, begin by creating an empty migration. Don't forget to give it a name:

./manage.py datamigration animals hybridize_ducks_and_beavers

Inside the migration file, write a forwards function:

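def forwards(self, orm):
  # Platypus is imported directly from the app's models.
  from animals.models import Platypus
  for duck in orm['animals.duck'].objects.all():
    # Assumes a beaver with the same id exists for every duck.
    beaver = orm['animals.beaver'].objects.get(id=duck.id)
    platypus = Platypus(
      name=duck.name + "-" + beaver.name,
      weight=(duck.weight + beaver.weight) / 2,
      tail=beaver.tail,
      bill=duck.bill,
    )
    # Persist the new hybrid; without this, nothing is written to the database.
    platypus.save()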

A couple of things to note here:

  1. The script loops through every duck in the list of ducks. It matches every duck with a beaver by grabbing the beaver that has the same id as each duck. (It assumes, of course, that there is a matching beaver for each duck.)
  2. Since there are not currently any Platypuses registered, they do not appear in the ORM. Rather than referencing existing models – as done with ducks and beavers – the script needs to import Platypus from the animals models.py file, and create a new instance of the model each time it iterates through the loop.

The new platypuses have hyphenated names. Their weights are the average of their parents' weights, and they get their tails and bills from their beaver and duck parents, respectively.
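To make the pairing rule concrete, here is a minimal plain-Python sketch of the same logic, using dicts in place of model instances. The `hybridize` function and the sample animals are hypothetical. Unlike the migration, which assumes every duck has a matching beaver, this sketch simply skips any duck without one.

```python
from decimal import Decimal


def hybridize(ducks, beavers):
    """Pair each duck with the beaver sharing its id and build a platypus dict."""
    beavers_by_id = {beaver["id"]: beaver for beaver in beavers}
    platypuses = []
    for duck in ducks:
        beaver = beavers_by_id.get(duck["id"])
        if beaver is None:
            # No beaver with this id: skip the duck instead of raising.
            continue
        platypuses.append({
            "name": duck["name"] + "-" + beaver["name"],
            "weight": (duck["weight"] + beaver["weight"]) / 2,
            "tail": beaver["tail"],
            "bill": duck["bill"],
        })
    return platypuses


# Hypothetical sample data.
ducks = [
    {"id": 1, "name": "Donald", "weight": Decimal("1.5"),
     "tail": "feathered", "bill": True},
    {"id": 2, "name": "Daisy", "weight": Decimal("1.2"),
     "tail": "feathered", "bill": True},
]
beavers = [
    {"id": 1, "name": "Bucky", "weight": Decimal("20.5"),
     "tail": "broad and featherless", "bill": False},
]

platypuses = hybridize(ducks, beavers)
```

Only the first duck has a partner here, so the result holds a single platypus. Two 32-character names joined by a hyphen need at most 65 characters, which is why the Platypus name field is longer than its parents'.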

The genetic experimentation is complete; all that is left is to remove the old models. In animals/models.py, delete all the duck and beaver code, and run

./manage.py schemamigration animals --auto

Once that migration is applied with ./manage.py migrate animals, the old tables are gone, leaving only platypuses!


Serious Example: Merging Django’s auth.user Model With a Custom User Model

Django’s default user model automatically provides a variety of commonly-used fields, such as username, email, password, is_staff, last_login, and so on. With the release of Django 1.5, it is now relatively simple to write a user model that encapsulates these fields as well as any other custom information that needs to be stored about the user. However, prior to this, it was necessary to create a separate, custom table to contain any extra information, and link it via a one-to-one relationship to the auth.user table. This is the situation I was confronted with on a recent project, and when the time came to upgrade the project to Django 1.5, it made sense to combine the two user tables into one larger table to simplify storage and referencing. The procedure helped solidify my understanding of Django user models as well as South migrations, and I hope it helps you as well!

To begin, the auth_user table contained the columns: id, username, first_name, last_name, email, password, is_staff, is_active, is_superuser, last_login, and date_joined. Additionally, the auth_user model had many-to-many relationships with tables called “groups” and “user_permissions”. The custom user model was in an app called members. Thus, the members_user model contained the columns: user_ptr_id (the link to auth_user), user_type, birthdate, bio, email_prefs, hide_onboarding, cancel_state, cancel_reason, and photo. Additionally, the members_user model had three many-to-many fields: each user had favorite_comments, favorite_journal_entries, and favorite_videos.

Ultimately, I wanted all of this data to be encapsulated in a new model called “Profile” in the members app. First, I created the new Profile class in my members/models.py file. It was a duplicate of the existing members_user model, except that it also inherited from django.contrib.auth.models.AbstractUser. This is the abstract base class that the regular auth.user model itself extends, and it granted my Profile model all of the usual user fields (password, username, etc.). Then, I ran

./manage.py schemamigration members --auto

to generate the blank model, ready to be populated.

The tricky part is the data migration. In order to coerce the data into a single table, it is necessary to loop through every auth_user and, for each one:

  1. create a new profile object,
  2. insert the auth_user data,
  3. create new many-to-many tables from auth_user,
  4. insert the members_user data, and
  5. create new many-to-many tables from members_user.
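The five steps above can be sketched without Django, using dicts for rows and sets for many-to-many links. This is only an illustration of the merge logic under simplifying assumptions: the field lists are shortened, and the `merge_user` helper and the sample users are hypothetical.

```python
# Shortened, illustrative field lists; the real tables have more columns.
AUTH_FIELDS = ("id", "username", "email", "password")
MEMBER_FIELDS = ("user_type", "bio")


def merge_user(auth_row, member_rows_by_ptr):
    """Build one profile dict from an auth row and its optional member row."""
    # Steps 1 and 2: create the profile and copy the auth_user columns.
    profile = {field: auth_row[field] for field in AUTH_FIELDS}
    # Step 3: copy the many-to-many links held by auth_user.
    profile["groups"] = set(auth_row["groups"])

    # Not every auth user has an associated member row.
    member_row = member_rows_by_ptr.get(auth_row["id"])
    if member_row is not None:
        # Step 4: copy the members_user columns.
        for field in MEMBER_FIELDS:
            profile[field] = member_row[field]
        # Step 5: copy the many-to-many links held by members_user.
        profile["favorite_videos"] = set(member_row["favorite_videos"])
    return profile


# Hypothetical sample data.
auth_users = [
    {"id": 1, "username": "alice", "email": "alice@example.com",
     "password": "hash1", "groups": [10]},
    {"id": 2, "username": "bob", "email": "bob@example.com",
     "password": "hash2", "groups": []},
]
member_users = [
    {"user_ptr_id": 1, "user_type": "staff", "bio": "Hi",
     "favorite_videos": [7, 8]},
]

# Index the member rows by their link back to auth_user.
members_by_ptr = {row["user_ptr_id"]: row for row in member_users}
profiles = [merge_user(user, members_by_ptr) for user in auth_users]
```

A user with no member row still gets a profile; it just carries the auth data alone.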

First, run

./manage.py datamigration members migrate_userdata_to_profiledata

Next, the data migration forwards function: 

from south.v2 import DataMigration

class Migration(DataMigration):

  def forwards(self, orm):
    "Write your forwards methods here."
    # Note: Remember to use orm['appname.ModelName']
    # rather than "from appname.models..."

    for authuser in orm['auth.user'].objects.all():

      # Create a new members.Profile for every existing auth.User. I 
      # needed to import Profile in order to create new instances of it.
      from members.models import Profile
      memberprofile = Profile (
        id=authuser.id,
        password=authuser.password,
        last_login=authuser.last_login,
        is_superuser=authuser.is_superuser,
        username=authuser.username,
        first_name=authuser.first_name,
        last_name=authuser.last_name,
        email=authuser.email,
        is_staff=authuser.is_staff,
        is_active=authuser.is_active,
        date_joined=authuser.date_joined
      )

      # Transfer the many-to-many tables from auth_user
      for group in authuser.groups.all():
        memberprofile.groups.add(group.id)
      for permission in authuser.user_permissions.all():
        memberprofile.user_permissions.add(permission.id)

      try:
        # If there is an associated members.User,
        # add those fields to the members.Profile
        memberuser = orm['members.user'].objects.get(user_ptr_id=authuser.id)
        memberprofile.user_type = memberuser.user_type
        memberprofile.birthdate = memberuser.birthdate
        memberprofile.bio = memberuser.bio
        memberprofile.email_prefs = memberuser.email_prefs
        memberprofile.hide_onboarding = memberuser.hide_onboarding
        memberprofile.cancel_state = memberuser.cancel_state
        memberprofile.cancel_reason = memberuser.cancel_reason
        memberprofile.photo = memberuser.photo

        # Transfer the m2m fields from user to profile
        for comment in memberuser.favorite_comments.all():
          memberprofile.favorite_comments.add(comment.id)
        for journalentry in memberuser.favorite_journal_entries.all():
          memberprofile.favorite_journal_entries.add(journalentry.id)
        for video in memberuser.favorite_videos.all():
          memberprofile.favorite_videos.add(video.id)

      # In case there is a problem getting the related
      # members_user model, I used pdb to diagnose the issue.
      except orm['members.user'].DoesNotExist:
        pass
      except Exception as e:
        import pdb; pdb.set_trace()

      # All done! Save, and move on to the next user.
      memberprofile.save()

After performing a data migration this big, it’s important to check the actual data for consistency. Indeed, as I wrote this function, I performed the data migration, identified an error, and deleted the table data and migration many times.
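One way to make that consistency check repeatable is a small comparison helper. The sketch below is plain Python and purely illustrative: `find_mismatches` is a hypothetical function, and it assumes the old and new rows are available as lists of dicts (for example, something like the output of a queryset's `values()` call in a Django shell).

```python
def find_mismatches(old_rows, new_rows, fields):
    """Return (id, field, old_value, new_value) tuples for every disagreement."""
    new_by_id = {row["id"]: row for row in new_rows}
    mismatches = []
    for old in old_rows:
        new = new_by_id.get(old["id"])
        if new is None:
            # The whole row failed to migrate.
            mismatches.append((old["id"], None, "row present", "row missing"))
            continue
        for field in fields:
            if old[field] != new.get(field):
                mismatches.append((old["id"], field, old[field], new.get(field)))
    return mismatches


# Hypothetical sample data: one clean row, one wrong email, one missing row.
old_rows = [
    {"id": 1, "username": "alice", "email": "alice@example.com"},
    {"id": 2, "username": "bob", "email": "bob@example.com"},
    {"id": 3, "username": "carol", "email": "carol@example.com"},
]
new_rows = [
    {"id": 1, "username": "alice", "email": "alice@example.com"},
    {"id": 2, "username": "bob", "email": ""},
]

problems = find_mismatches(old_rows, new_rows, ["username", "email"])
```

An empty result means every checked field survived the move; anything else points at the exact row and column to investigate before deleting the old model.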

The last step was to delete the old members.user model and run

./manage.py schemamigration members --auto

Transition complete; all user data is in a single table!