generatedata 3.2.0

Posted on Jan 29, 2015 in generatedata.com, Open Source Projects | 1 comment

I just released generatedata.com 3.2.0, which includes a long-awaited feature: a REST API to allow programmatic generation of data sets. Yay! Great to see this sucker finally out the door.

Note: this new feature won’t be added to the public site – it’s intended for your own installations of the script only. The main generatedata.com site offers a service where you can donate and get an account. That service is really just intended as a quick convenience and a way for people to contribute to the script. The downloadable version has all the functionality – and now more.

You can find the latest 3.2.0 tag on the GitHub repo, and the new API functionality documented here. Enjoy! :)

generatedata API

Posted on Jan 18, 2015 in generatedata.com, Open Source Projects | 4 comments

Howdy all! A couple of weeks ago I decided to take on a feature for generatedata.com that I’ve been meaning to add for (literally) years: a REST API to allow programmatic generation of datasets, rather than forcing people to use the UI. This has always been something of a white whale for me. Initially I couldn’t see a way to solve it, then when I did, I always had a long list of work that needed to get done first.

But enough is enough! Today I got the core code working and a nice proof of concept in place: the Names Data Type is now working in conjunction with the JSON Export Type to generate random names in JSON via a REST endpoint. Pretty cool.

This should be a staggeringly useful feature and I think it’s conceptually pretty cool, so I’ll take the time to explain it here.

The problem

I designed the Data Generator to be modular so it could be used to generate any sort of data you want: text, numbers, strings, images, silly cat pictures – really anything. Check the developer section of the documentation for an explanation of all that. The problem with this design was that all the little pieces of the code were separate entities, each with its own configuration settings and each generating different things. Adding a REST API meant exposing all the functionality offered within the current UI to developers, and that meant making the options within the UI well-defined.

The solution

So here’s what I settled on. The two work-horse module types, Data Types (the type of data being generated) and Export Types (the format in which the data is generated), now define themselves via a JSON schema file (using the json-schema.org site and spec). That file lists the name and structure of all generation options for the module. It says which fields are optional and which are required, and what type (string, boolean, etc.) each setting is.

For example, let’s look at the Names Data Type. That module is used to generate human names – first names, last names, male, female names, initials, surnames etc. The schema file looks like this:

{
   "title": "Names",
   "$schema": "http://json-schema.org/draft-04/schema#",
   "type": "object",
   "properties": {
      "placeholder": {
         "type": "string"
      }
   },
   "required": ["placeholder"]
}

Kinda readable! Even coming in fresh. If you look at the generatedata UI and select “Names” for a row, this maps exactly to the options offered in the “Options” column. For this Data Type there’s only one option: a string that contains placeholders, which are switched out for names during data generation.
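To make the idea concrete, here is a minimal Python sketch of how settings could be checked against a schema like the one above. It is an illustration only, not the generatedata implementation (which is PHP) and not a full JSON Schema validator: the `check_settings` function and the `JSON_TYPES` table are hypothetical names, and only the `required` and `type` keywords are handled.

```python
import json

# Schema copied from the post; everything below it is a hand-rolled illustration.
NAMES_SCHEMA = json.loads("""
{
   "title": "Names",
   "$schema": "http://json-schema.org/draft-04/schema#",
   "type": "object",
   "properties": {
      "placeholder": {"type": "string"}
   },
   "required": ["placeholder"]
}
""")

# Hypothetical lookup from JSON Schema type names to Python types.
JSON_TYPES = {"string": str, "boolean": bool, "object": dict, "array": list}


def check_settings(schema, settings):
    """Return a list of problems; an empty list means the settings pass."""
    if not isinstance(settings, dict):
        return ["settings must be an object"]
    problems = []
    # Every field listed under "required" has to be present.
    for name in schema.get("required", []):
        if name not in settings:
            problems.append(f"missing required setting: {name}")
    # Fields that are present must match the declared type.
    for name, rule in schema.get("properties", {}).items():
        expected = JSON_TYPES.get(rule.get("type"))
        if name in settings and expected and not isinstance(settings[name], expected):
            problems.append(f"{name} should be of type {rule['type']}")
    return problems


print(check_settings(NAMES_SCHEMA, {"placeholder": "Name Initial. Surname"}))  # prints []
print(check_settings(NAMES_SCHEMA, {}))  # prints ['missing required setting: placeholder']
```

Because each module describes its own options this way, one generic check can cover every Data Type and Export Type, with no per-module validation code.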

Now, to use the API, developers need to POST JSON content to a specific endpoint (http://yoursite/generatedata/api/v1/data) in a particular format. Here’s a simple example that generates 100 names of the format “Beth R. Mackenzie”:

{
  "numRows": 100,
  "rows": [
    {
      "type": "Names",
      "title": "A full name",
      "settings": {
        "placeholder": "Name Initial. Surname"
      }
    }
  ],
  "export": {
    "type": "JSON",
    "settings": {
      "stripWhitespace": false,
      "dataStructureFormat": "simple"
    }
  }
}
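As a rough sketch of how a client might POST a request like the one above, here is a Python example using only the standard library. The endpoint path comes from this post; `http://yoursite/generatedata` is a placeholder for your own installation, the `Content-Type` header is my assumption, and `build_request` and `send` are hypothetical helper names. The response is returned as raw text, since its exact shape isn’t covered here.

```python
import json
import urllib.request

# The request body from the post, as a Python dict.
PAYLOAD = {
    "numRows": 100,
    "rows": [
        {
            "type": "Names",
            "title": "A full name",
            "settings": {"placeholder": "Name Initial. Surname"},
        }
    ],
    "export": {
        "type": "JSON",
        "settings": {"stripWhitespace": False, "dataStructureFormat": "simple"},
    },
}


def build_request(base_url, payload):
    """Build (but do not send) a POST request for the data endpoint."""
    endpoint = base_url.rstrip("/") + "/api/v1/data"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def send(request):
    """Send the request and return the raw response text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Placeholder base URL: swap in the address of your own installation.
request = build_request("http://yoursite/generatedata", PAYLOAD)
print(request.get_method(), request.full_url)
```

Calling `send(request)` against a real installation would perform the POST; it is left uncalled here so the example runs without a server.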

Conceptually it’s super simple. The settings object of each entry in the rows array always contains whatever settings are relevant for that Data Type. The same goes for the settings object in the export object below it. So for the rows property, you define an array, each element of which is a Data Type containing whatever settings you want. Here’s an example that generates two rows of Names, in different formats:

{
  // ...
  "rows": [
    {
      "type": "Names",
      "title": "A full name",
      "settings": { "placeholder": "Name Initial. Surname" }
    },
    {
      "type": "Names",
      "title": "A female name only",
      "settings": { "placeholder": "FemaleName" }
    }
  ],
  "export": {
    // ...
  }
}
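Since every row has the same type/title/settings shape, the request body is easy to assemble in code. Here is a small Python sketch; `names_row` and `build_payload` are hypothetical helpers of my own, and the default export settings are simply the ones from the earlier example.

```python
import json


def names_row(title, placeholder):
    """One entry for the "rows" array, using the Names Data Type."""
    return {
        "type": "Names",
        "title": title,
        "settings": {"placeholder": placeholder},
    }


def build_payload(num_rows, rows, export_type="JSON", export_settings=None):
    """Assemble a complete request body from a list of row definitions."""
    if export_settings is None:
        # Defaults borrowed from the first example in this post.
        export_settings = {"stripWhitespace": False, "dataStructureFormat": "simple"}
    return {
        "numRows": num_rows,
        "rows": list(rows),
        "export": {"type": export_type, "settings": export_settings},
    }


payload = build_payload(
    100,
    [
        names_row("A full name", "Name Initial. Surname"),
        names_row("A first name only", "Name"),
    ],
)
print(json.dumps(payload, indent=2))
```

Adding another field to the data set is then one more `names_row(...)` call, or an equivalent helper for a different Data Type.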

I’ve also added some nice clear error handling that tells you exactly what’s wrong, which should prove invaluable for debugging.

In the coming days I’ll be finishing off the code and working on the documentation. As I said, now that I’ve got the proof of concept going, it should be straightforward to complete from here on out. I’ll keep you posted. :)

2014 birding sum up

Posted on Jan 4, 2015 in Birding | 0 comments

Last year I ended up with a total of 262 BC birds, ranking me 15th in the province on the eBird list, and joint 80th in Canada as a whole. Not a bad year! This year I hope to get up to the Peace River region to catch the array of birds you can only see up there, and hopefully go on a few more pelagics (boat trips for birds). I had to miss out on 3 months of birding due to my wife getting sick (she’s a lot better now), so my numbers aren’t quite what I hoped. But all in all, a good year.

Bring on 2015.
