Notes on Loading Data to Google App Engine
Google has fantastic documentation on this topic, but at the time I wrote this blog entry it covered downloading and uploading data with appcfg.py, not with bulkloader.py (there is also bulkload_client.py). So I decided to play around with the nifty bulkloader and keep notes on my findings.
Prepare the End Point for Loading Data
Loading data into the Data Store works by sending it to your application over HTTP, so your application needs a dedicated URI to receive it. Setting one up is just a matter of mapping that URL to a handler in the app.yaml config file; GAE takes care of the import logic with its own handler. There’s nothing special in this step, and the documentation covers it concisely.
Test Data for Demo Purposes
For this blog entry, I prepared a CSV file with four rows representing users. In reality a user would carry more information, but I kept things minimal. I saved this data as user.csv.
1, Daniel, Bernstein, xxxxxxx
2, Donald, Knuth, xxxxxxx
3, Bjarne, Stroustrup, xxxxxxx
4, Robert, Sedgewick, xxxxxxx
You can also represent your table in XML, but I chose CSV for its simplicity.
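To see how those rows break into fields, here is a small standalone sketch (plain Python 3, no App Engine SDK needed) that parses them with the standard csv module. It only illustrates the csv module's behaviour; how bulkloader.py configures its own CSV reader is an assumption I haven't covered here. Note the space after each comma: with default settings it stays in the field, and skipinitialspace=True drops it.

```python
import csv
import io

# The same four rows as user.csv, inlined so the sketch runs on its own.
USER_CSV = """1, Daniel, Bernstein, xxxxxxx
2, Donald, Knuth, xxxxxxx
3, Bjarne, Stroustrup, xxxxxxx
4, Robert, Sedgewick, xxxxxxx
"""


def read_rows(text, skip_initial_space=False):
    """Split CSV text into lists of string fields, one list per row."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=skip_initial_space)
    return [row for row in reader]


raw = read_rows(USER_CSV)
clean = read_rows(USER_CSV, skip_initial_space=True)

print(raw[0])    # ['1', ' Daniel', ' Bernstein', ' xxxxxxx']
print(clean[0])  # ['1', 'Daniel', 'Bernstein', 'xxxxxxx']
```

int() tolerates the leading space (int(' 2') is 2), but str() keeps it. If values like ' Daniel' ever show up in the Data Store, swapping str for a stripping converter such as lambda s: s.strip() in the loader's property list is one simple safeguard.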
Create a Bulk Loader Configuration File or Not
In addition to the CSV file, the bulk loader needs to know how each record in the CSV file maps to a Data Store entity. As far as I know, this mapping can be done in two ways. One is to write a loader class in Python that bulkloader.py can use. The other is to have bulkloader.py generate a configuration file (in YAML).
I decided to write my own Python class, since according to the documentation at the time of writing, the generated-configuration approach doesn’t work with the local development server.
With that in mind, here is my loader class. You would usually keep the data model definition (the User class) in a separate file, but for demo purposes I kept everything in one file.
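from google.appengine.ext import db
from google.appengine.tools import bulkloader


# Data Store model: one User entity per CSV row.
class User(db.Model):
    id = db.IntegerProperty()
    firstname = db.StringProperty()
    lastname = db.StringProperty()
    some_text = db.StringProperty()


# Maps the CSV columns, in order, to User properties.
# Each pair is (property name, function that converts the raw CSV string).
class UserLoader(bulkloader.Loader):
    def __init__(self):
        bulkloader.Loader.__init__(self, 'User',
                                   [('id', int),
                                    ('firstname', str),
                                    ('lastname', str),
                                    ('some_text', str)])


# bulkloader.py picks up the loader classes from this module-level list.
loaders = [UserLoader]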
What this class does is explained in the documentation. I saved this script as user_loader.py.
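Conceptually, the loader pairs each CSV column, in order, with a property name and a conversion function. The sketch below is a rough pure-Python model of that idea, not bulkloader's actual implementation: the helper row_to_properties is hypothetical, it returns a plain dict instead of a Data Store entity, and the column-count check is my own addition.

```python
# Same (property name, converter) pairs as in UserLoader.
USER_PROPERTIES = [
    ('id', int),
    ('firstname', str),
    ('lastname', str),
    ('some_text', str),
]


def row_to_properties(row, properties=USER_PROPERTIES):
    """Hypothetical model of a loader: convert one CSV row into property values.

    Column i is passed through the converter of the i-th (name, converter) pair.
    """
    if len(row) != len(properties):
        raise ValueError('expected %d columns, got %d' % (len(properties), len(row)))
    return dict((name, convert(value))
                for (name, convert), value in zip(properties, row))


print(row_to_properties(['2', 'Donald', 'Knuth', 'xxxxxxx']))
# {'id': 2, 'firstname': 'Donald', 'lastname': 'Knuth', 'some_text': 'xxxxxxx'}
```

In this model a converter that raises, for example int('abc') on a malformed id column, is what stops a bad row; the converters are ordinary callables, so a lambda works just as well as int or str.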
Load your Data to the Data Store
For demo purposes, I loaded the CSV file into my local development server running on port 8083. Once the application is running and the import endpoint is active, it’s just a matter of giving bulkloader.py the essential information. For the available options, I recommend reading the help output of ‘bulkloader.py -h’.
The following command loads entities of kind User from user.csv into the Data Store through the endpoint, using our loader class in user_loader.py.
$ bulkloader.py --filename=user.csv --config_file=user_loader.py \
    --kind=User --url=http://localhost:8083/import --app_id=your_app_id
Note that it’s essential to provide the --app_id option when uploading data to the local server. When asked for credentials, you can type anything you like. You only need to supply valid credentials when uploading to production.
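If you load several files or kinds, it can help to script this command. Below is a minimal sketch, assuming Python 3 and that bulkloader.py is executable and on your PATH; it uses only the flags from the command above, and the function names build_bulkload_command and run_bulkload are hypothetical.

```python
import subprocess


def build_bulkload_command(filename, config_file, kind, url, app_id):
    """Assemble the bulkloader.py invocation above as an argument list."""
    return [
        'bulkloader.py',
        '--filename=' + filename,
        '--config_file=' + config_file,
        '--kind=' + kind,
        '--url=' + url,
        '--app_id=' + app_id,
    ]


def run_bulkload(cmd):
    """Run the command; bulkloader.py still prompts for credentials."""
    subprocess.run(cmd, check=True)


cmd = build_bulkload_command('user.csv', 'user_loader.py', 'User',
                             'http://localhost:8083/import', 'your_app_id')
print(' '.join(cmd))
```

Passing an argument list rather than one shell string avoids quoting problems with URLs and file names. Call run_bulkload(cmd) only when the development server is up.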
Here’s the output from executing the above command.
[INFO ] Logging to bulkloader-log-20100615.213842
[INFO ] Throttling transfers:
[INFO ] Bandwidth: 250000 bytes/second
[INFO ] HTTP connections: 8/second
[INFO ] Entities inserted/fetched/modified: 20/second
[INFO ] Batch Size: 10
[INFO ] Opening database: bulkloader-progress-20100615.213842.sql3
Please enter login credentials for localhost
Email: foo
Password for foo:
[INFO ] Connecting to localhost:8083/import
[INFO ] Starting import; maximum 10 entities per post
[INFO ] 4 entites total, 0 previously transferred
[INFO ] 4 entities (933 bytes) transferred in 4.0 seconds
[INFO ] All entities successfully transferred
Success!
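When the upload is scripted, you may want to check the outcome without reading the output by eye. Here is a rough sketch, assuming the output is worded exactly as in the run above (other SDK versions may phrase things differently); summarize_bulkload_output is a hypothetical helper.

```python
import re

# Matches e.g. "4 entities (933 bytes) transferred in 4.0 seconds".
SUMMARY_RE = re.compile(r'(\d+) entities \((\d+) bytes\) transferred in ([\d.]+) seconds')


def summarize_bulkload_output(text):
    """Return (entities, bytes, seconds, succeeded) from bulkloader.py output.

    The three numbers are None when no summary line is found.
    """
    succeeded = 'All entities successfully transferred' in text
    match = SUMMARY_RE.search(text)
    if match is None:
        return None, None, None, succeeded
    return int(match.group(1)), int(match.group(2)), float(match.group(3)), succeeded


sample = """[INFO ] 4 entites total, 0 previously transferred
[INFO ] 4 entities (933 bytes) transferred in 4.0 seconds
[INFO ] All entities successfully transferred"""

print(summarize_bulkload_output(sample))  # (4, 933, 4.0, True)
```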
