Howdy all! A couple of weeks ago I decided to take on a feature for generatedata.com that I’ve been meaning to add for (literally) years: a REST API to allow programmatic generation of datasets, rather than forcing people to use the UI. This has always been something of a white whale for me. Initially I couldn’t see a way to solve it, then when I did, I always had a long list of work that needed to get done first.

But enough is enough! Today I got the core code working and a nice proof of concept in place: the Names Data Type is now working in conjunction with the JSON Export Type to generate random names in JSON via a REST endpoint. Pretty cool.

This should be a staggeringly useful feature and I think it’s conceptually pretty cool, so I’ll take the time to explain it here.

The problem

I designed the Data Generator to be modular so it could be used to generate any sort of data you want: text, numbers, strings, images, silly cat pictures – really anything. Check the developer section of the documentation for an explanation of all that. The problem with this design was that all the little pieces of the code were separate entities, each with its own configuration settings and each generating different things. Adding a REST API meant exposing all the functionality offered within the current UI to developers, and that meant making the options within the UI well-defined.

The solution

So here’s what I settled on. The two work-horse module types, Data Types (the type of data being generated) and Export Types (the format in which the data is generated), now define themselves via a JSON schema file (following the spec at json-schema.org). That file lists the name and structure of all generation options for that module. It says which fields are optional and which are required, and what type (string, boolean, etc.) each setting is.

For example, let’s look at the Names Data Type. That module is used to generate human names: male and female first names, surnames, initials, etc. The schema file looks like this:

{
   "title": "Names",
   "$schema": "http://json-schema.org/draft-04/schema#",
   "type": "object",
   "properties": {
      "placeholder": {
         "type": "string"
      }
   },
   "required": ["placeholder"]
}

Kinda readable! Even coming in fresh. If you look at the generatedata UI and select “Names” for a row, this maps exactly to the options offered in the “Options” column. For this Data Type there’s only one option: a string that contains placeholders, which are switched out for names during data generation.
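
To make the idea concrete, here’s a rough Python sketch of how a settings object could be checked against that schema. This is not the code generatedata itself uses: check_settings is a hypothetical helper that only understands the "type", "properties" and "required" keywords, which happens to be everything the Names schema needs. A real implementation would more likely lean on a full JSON schema validator.

```python
import json

# The Names schema from above, embedded so the sketch is self-contained.
NAMES_SCHEMA = json.loads("""
{
   "title": "Names",
   "$schema": "http://json-schema.org/draft-04/schema#",
   "type": "object",
   "properties": {
      "placeholder": { "type": "string" }
   },
   "required": ["placeholder"]
}
""")

# Maps JSON schema type names to Python types (only the ones this sketch handles).
JSON_TYPES = {"string": str, "boolean": bool, "object": dict, "array": list}


def check_settings(schema, settings):
    """Return a list of problems found in settings; an empty list means OK.

    Hypothetical helper, not part of generatedata.
    """
    if not isinstance(settings, dict):
        return ["settings must be an object"]
    errors = []
    # Every field listed under "required" has to be present.
    for name in schema.get("required", []):
        if name not in settings:
            errors.append("missing required setting: " + name)
    # Every field that is present has to match its declared type.
    for name, rule in schema.get("properties", {}).items():
        expected = JSON_TYPES.get(rule.get("type"))
        if name in settings and expected and not isinstance(settings[name], expected):
            errors.append("setting %s must be of type %s" % (name, rule["type"]))
    return errors


print(check_settings(NAMES_SCHEMA, {"placeholder": "Name Initial. Surname"}))
print(check_settings(NAMES_SCHEMA, {}))
```

The first call prints an empty list; the second reports the missing placeholder setting.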

Now, to use the API, developers need to POST JSON content to a specific endpoint (http://yoursite/generatedata/api/v1/data) in a particular format. Here’s a simple example that generates 100 names of the format “Beth R. Mackenzie”:

{
  "numRows": 100,
  "rows": [
    {
      "type": "Names",
      "title": "A full name",
      "settings": {
        "placeholder": "Name Initial. Surname"
      }
    }
  ],
  "export": {
    "type": "JSON",
    "settings": {
      "stripWhitespace": false,
      "dataStructureFormat": "simple"
    }
  }
}
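
If you’d rather see that as code, here’s a minimal Python sketch that wraps the JSON above in a POST request using only the standard library. The endpoint is the one mentioned above, with "yoursite" standing in for your own install. The Content-Type header and the build_request helper are my assumptions for illustration, not something the API is confirmed to require.

```python
import json
import urllib.request

# Endpoint from the post; "yoursite" is a placeholder for your own install.
API_URL = "http://yoursite/generatedata/api/v1/data"

# Same request body as the JSON example above.
payload = {
    "numRows": 100,
    "rows": [
        {
            "type": "Names",
            "title": "A full name",
            "settings": {"placeholder": "Name Initial. Surname"},
        }
    ],
    "export": {
        "type": "JSON",
        "settings": {"stripWhitespace": False, "dataStructureFormat": "simple"},
    },
}


def build_request(url, body):
    """Serialize body to JSON and wrap it in a POST request (hypothetical helper)."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        # Assumption: the API expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(API_URL, payload)

# Uncomment to send it to a real install:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The actual send is commented out so the sketch runs without a server. Point API_URL at a real install before enabling it.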

Conceptually it’s super simple. The settings object in each entry of the rows array always contains whatever settings are relevant for that Data Type. The same goes for the settings object in the export object below it. So for the rows property, you define an array, each item of which is a Data Type along with whatever settings you want. Here’s an example that generates two rows of Names, in different formats:

{
  // ...
  "rows": [
    {
      "type": "Names",
      "title": "A full name",
      "settings": { "placeholder": "Name Initial. Surname" }
    },
    {
      "type": "Names",
      "title": "A female name only",
      "settings": { "placeholder": "FemaleName" }
    }
  ],
  "export": {
    // ...
  }
}
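
Since every entry in rows has the same shape, the request body is easy to build up programmatically. Here’s a small Python sketch of that. The names_row and build_payload helpers are made up for illustration, and the export settings are simply copied from the first example.

```python
import json


def names_row(title, placeholder):
    """Build one rows entry for the Names Data Type (hypothetical helper)."""
    return {
        "type": "Names",
        "title": title,
        "settings": {"placeholder": placeholder},
    }


def build_payload(num_rows, rows):
    """Combine row definitions with the JSON export settings from the first example."""
    return {
        "numRows": num_rows,
        "rows": rows,
        "export": {
            "type": "JSON",
            "settings": {"stripWhitespace": False, "dataStructureFormat": "simple"},
        },
    }


payload = build_payload(
    100,
    [
        names_row("A full name", "Name Initial. Surname"),
        names_row("A surname only", "Surname"),
    ],
)

# Print the body that would be POSTed to the endpoint.
print(json.dumps(payload, indent=2))
```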

I’ve also added some nice clear error handling that tells you exactly what’s wrong, which should prove invaluable for debugging.

In the coming days I’ll be finishing off the code and working on the documentation. As I said, now that I’ve gotten the proof of concept going, it should be straightforward to complete from here on out. I’ll keep you posted. :)