generatedata.com pending update

Posted on Dec 6, 2015 in generatedata.com, Open Source Projects | 0 comments

Just a heads up, I’ll be updating the public generatedata.com site soon. Hopefully tonight, but no promises: I don’t want to rush it. The live site has been running an old version of the script for some time now (3.1.4 from Sept of last year). There have been a lot of improvements since then, so this’ll mean a pretty big update.

[EDIT: Gah, abort! Bug found. I’m going to need to release a new generatedata build first before returning to the main website].

generatedata 3.2.2

Posted on Nov 15, 2015 in generatedata.com, Open Source Projects | 0 comments

Yesterday I released a new version of generatedata. It’s been in development off and on for several months, so it’s nice to see it launched.

Today I’m going to wade through the list of issues that have piled up and fix the most significant ones, so expect a bug fix release to come out pretty fast.

Then… definitely time to update the website! That’s gotten pretty out of date.

generatedata.com site bug

Posted on May 26, 2015 in generatedata.com, Open Source Projects | 4 comments

So this is strange. Over the last 24 hours or so I’ve noticed that the generatedata.com website sometimes fails to load. The JS errors don’t make much sense: it’s as if the JS isn’t fully loading in sequence. I haven’t changed the code in several months.

Seems like it’s local to Chrome only. Firefox seems fine. I’m going to monitor it – possibly it’s a Chrome issue that was introduced in 44.x.

generatedata 3.2.1

Posted on May 25, 2015 in generatedata.com, Open Source Projects | 2 comments

I just released a new version of generatedata. You can download the free standalone script from the github repo here. This new version fixes a few smaller issues that have been reported, but most significantly it includes a great new feature to back up your data set configurations. Now, any time you save a data set it automatically saves a new copy of the configuration. Loading it will always default to the latest copy, but in case you ever need an earlier one, you can click on the History link for that data set in the main dialog (found by clicking the Data Sets icon) and browse the history.

Every now and then I’ve heard of people running into problems when saving a data set. And if you’re dealing with really large data sets, clicking “save” and finding you’ve lost all your data is a pretty darn serious bug. But since I haven’t been able to reproduce it, I decided that this would be a nice interim fix – as well as providing some good additional functionality.

I won’t be updating the public website for a little while longer yet. I’d like this feature to be out in the wild for a few months to confirm everything works as expected.

Enjoy! :D

generatedata 3.2.0

Posted on Jan 29, 2015 in generatedata.com, Open Source Projects | 6 comments

I just released generatedata.com 3.2.0, which includes a long-awaited feature: a REST API to allow programmatic generation of data sets. Yay! Great to see this sucker finally out the door.

Note: this new feature won’t be added to the public site – it’s intended for your own installations of the script only. The main generatedata.com offers a service where you can donate and get an account on the site. This is really just intended as a quick convenience and a way for people to contribute to the script. The downloadable version has all the functionality – and now more.

You can find the latest 3.2.0 tag on the github repo, and the new API functionality documented here. Enjoy! :)

generatedata API

Posted on Jan 18, 2015 in generatedata.com, Open Source Projects | 4 comments

Howdy all! A couple of weeks ago I decided to take on a feature for generatedata.com that I’ve been meaning to add for (literally) years: a REST API to allow programmatic generation of datasets, rather than forcing people to use the UI. This has always been something of a white whale for me. Initially I couldn’t see a way to solve it, then when I did, I always had a long list of work that needed to get done first.

But enough is enough! Today I got the core code working and a nice proof of concept in place: the Names Data Type is now working in conjunction with the JSON Export Type to generate random names in JSON via a REST endpoint. Pretty cool.

This should be a staggeringly useful feature and I think it’s conceptually pretty cool, so I’ll take the time to explain it here.

The problem

I designed the Data Generator to be modular so it could be used to generate any sort of data you want: text, numbers, strings, images, silly cat pictures – really anything. Check the developer section of the documentation for an explanation of all that. The problem with this design was that all the little pieces of code were separate entities with their own configuration settings, each generating different things. To add a REST API meant exposing all the functionality offered within the current UI to developers, and that meant making the options within the UI well-defined.

The solution

So here’s what I settled on. The two work-horse module types, Data Types (the type of data being generated) and Export Types (the format in which the data is generated), now define themselves via a JSON schema file (following the json-schema.org spec). The schema lists the name and structure of every generation option for that module: which fields are optional, which are required, and what type (string, boolean, etc.) each setting is.

For example, let’s look at the Names Data Type. That module is used to generate human names – first names, last names, male, female names, initials, surnames etc. The schema file looks like this:

{
   "title": "Names",
   "$schema": "http://json-schema.org/draft-04/schema#",
   "type": "object",
   "properties": {
      "placeholder": {
         "type": "string"
      }
   },
   "required": ["placeholder"]
}

Kinda readable! Even coming in fresh. If you look at the generatedata UI and select “Names” for a row, this maps exactly to the options offered in the “Options” column. For this Data Type there’s only one option: a string that contains placeholders, which are switched out for names during data generation.
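To show how a schema like this can drive validation, here is a minimal Python sketch. It is not the script's own code and not a full JSON Schema validator: the `check_settings` helper and the `JSON_TYPES` map are hypothetical names, and the helper only understands `required` and per-property `type`.

```python
import json

# Schema copied from the Names Data Type example above.
NAMES_SCHEMA = json.loads("""
{
   "title": "Names",
   "$schema": "http://json-schema.org/draft-04/schema#",
   "type": "object",
   "properties": {
      "placeholder": {
         "type": "string"
      }
   },
   "required": ["placeholder"]
}
""")

# JSON Schema type names mapped to Python types (only those this sketch handles).
JSON_TYPES = {"string": str, "boolean": bool, "object": dict, "array": list}


def check_settings(schema, settings):
    """Return a list of problems; an empty list means the settings pass.

    Hypothetical helper: checks only "required" and per-property "type".
    """
    errors = []
    for name in schema.get("required", []):
        if name not in settings:
            errors.append("missing required setting: " + name)
    for name, rule in schema.get("properties", {}).items():
        expected = JSON_TYPES.get(rule.get("type"))
        if name in settings and expected and not isinstance(settings[name], expected):
            errors.append("setting %s should be of type %s" % (name, rule["type"]))
    return errors


print(check_settings(NAMES_SCHEMA, {"placeholder": "Name Initial. Surname"}))
print(check_settings(NAMES_SCHEMA, {}))
print(check_settings(NAMES_SCHEMA, {"placeholder": 42}))
```

The first call prints an empty list; the other two each report one problem (a missing setting and a wrong type, respectively).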

Now, to use the API, developers need to POST JSON content to a specific endpoint (http://yoursite/generatedata/api/v1/data) in a particular format. Here’s a simple example that generates 100 names of the format “Beth R. Mackenzie”:

{
  "numRows": 100,
  "rows": [
    {
      "type": "Names",
      "title": "A full name",
      "settings": {
        "placeholder": "Name Initial. Surname"
      }
    }
  ],
  "export": {
    "type": "JSON",
    "settings": {
      "stripWhitespace": false,
      "dataStructureFormat": "simple"
    }
  }
}

Conceptually it’s super simple. The settings object for each entry in the rows array always contains whatever settings are relevant for that Data Type, and likewise for the settings object in the export object below it. So for the rows property, you define an array, each item of which is a Data Type with whatever settings you want. Here’s an example that generates two rows of Names, in different formats:

{
  // ...
  "rows": [
    {
      "type": "Names",
      "title": "A full name",
      "settings": { "placeholder": "Name Initial. Surname" }
    },
    {
      "type": "Names",
      "title": "A female name only",
      "settings": { "placeholder": "FemaleName" }
    }
  ],
  "export": {
    // ...
  }
}
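For anyone who would rather build the request in code, here is a rough Python sketch that assembles a body in the format above and wraps it in a POST request without sending it. The `names_row` and `build_request` helpers are hypothetical, `yoursite` is the same placeholder host used earlier, and the `Content-Type` header is an assumption rather than something confirmed here.

```python
import json
import urllib.request

# Endpoint path from the post; "yoursite" stands in for your own installation.
API_URL = "http://yoursite/generatedata/api/v1/data"


def names_row(title, placeholder):
    """Build one entry for the "rows" array using the Names Data Type."""
    return {"type": "Names", "title": title, "settings": {"placeholder": placeholder}}


def build_request(num_rows, rows, url=API_URL):
    """Wrap the rows in the body format shown above; returns an unsent POST request."""
    body = {
        "numRows": num_rows,
        "rows": rows,
        "export": {
            "type": "JSON",
            "settings": {"stripWhitespace": False, "dataStructureFormat": "simple"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


request = build_request(100, [
    names_row("A full name", "Name Initial. Surname"),
    names_row("A surname only", "Surname"),
])
print(request.get_method(), request.full_url)

# To send it against your own installation:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Keeping the request construction separate from the send makes it easy to inspect the JSON body before pointing it at a real installation.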

I’ve also added some nice clear error handling that tells you exactly what’s wrong, which should prove invaluable for debugging.

In the coming days I’ll be finishing off the code and working on the documentation. As I said, now that I’ve gotten the proof of concept going it should be straightforward to complete from here on out. Keep you posted. :)

generatedata.com back up again

Posted on Jul 19, 2014 in generatedata.com, Open Source Projects | 6 comments

There was a giant CPU spike yesterday when generatedata.com was linked to on reddit, causing servers to melt, kittens to die, etc. etc. I’ve moved the script over to a dedicated host with about 5x the resources, so the site is already loading a lot snappier. But this is just a temporary fix: I need to properly scale it out and work on improving the efficiency of the code – something I’ve meant to do for a while.

I’ll cut back on my birding this weekend and start now. :)

generatedata.com down

Posted on Jul 18, 2014 in generatedata.com, Open Source Projects | 2 comments

I’m really sorry for the inconvenience, but generatedata.com is currently offline. There was a giant spike in the CPU today and my hosting company felt they had to take it down. I’m moving the site to a dedicated server and will hopefully have it back up shortly.

Apologies!

generatedata 3.1.2

Posted on Jul 12, 2014 in generatedata.com, Open Source Projects | 1 comment

I’ve been under the weather for the last couple of weeks, so I didn’t manage to get much work done. Boo. But today I put out a maintenance release for generatedata.com, which contains a number of bug fixes.

Website: http://www.generatedata.com
Code: https://github.com/benkeen/generatedata
Documentation: http://benkeen.github.io/generatedata/
Bug fixes in this release: https://github.com/benkeen/generatedata/issues?milestone=12&page=1&state=closed
