Make your own data platform for the Internet of Things using Node.js and Express.js

Prologue

I recently wrote a tutorial explaining how to make a connected barometer, in which I used ThingSpeak as an endpoint for the data. With the current buzz around the Internet of Things, a lot of similar services have popped up: Plotly, the Wolfram Data Drop, Xively and even IBM Cloud, to name a few.

What I find problematic with these services is that you lose control over your data. What is it used for? What happens if the company closes? You don’t want to lose your preciously collected data.

One solution is to create your own data platform. This way you keep full control over your database. You can set up backups and be sure that your digital property won’t be used for commercial purposes behind your back.

We will thus be making IoT.me, a web application using Node.js as a server, Express.js for the framework and MongoDB as the database (the MEAN stack, without the A).

IoT.me : application requirements

Let’s take a minute to think about our needs. First, we want to be able to authenticate the user. IoT.me is built as a single-user application, but the backend is ready to support multiple users. We need a way to create new collections of data, which we will call Datasets, and to customize the fields (variables) of each collection. Each dataset will be issued a read API key and a write API key that can be distributed to authenticate requests sending data in or requesting data out. We also need a way to easily regenerate these keys if they end up being compromised.

How does that translate in terms of views? Well, first we need a setup screen so that the user can register. Once logged in, he will need some kind of dashboard giving a quick overview of his datasets. He needs to be able to create new datasets, and to edit, delete and view them. Finally we will also provide a simple ‘settings’ page in case he ever wants to change his username or password.

Table of contents

This tutorial is organized in the following way: first we will take care of installing all the prerequisites. We will then make our way through the MVC (Model View Controller) pattern in a slightly different order, starting with the models defining the structure of our data, then the controllers (called routes in Express.js) and finally the corresponding views. Each step is associated with a specific commit holding our progress thus far.

Initial set-up

Note: this tutorial assumes that you are running a Linux system. I’m personally working on Ubuntu 15.04.

Server dependencies

The following tools will be needed to run our application. Start by installing them if you haven’t already:

  1. Node.js: the JavaScript runtime that runs our server. If you are more used to that world, think of it as filling the role of Apache, but in JavaScript. It comes with npm, an awesome package manager that will make our life much easier.
  2. MongoDB: a very versatile database system. I like it because of its object-like structure and the fact that ‘tables’ (collections) can be created on the fly even if they were not previously declared.
  3. Express-generator: an npm package used to generate an Express.js skeleton with a simple command. Express.js is a lightweight framework for Node.js. Unlike ‘traditional’ frameworks such as Laravel on PHP or Django on Python, it doesn’t come with a clearly defined structure, which can be a bit confusing at first (especially when you’re reading different tutorials that each have their own way of doing things). The upside is that it gives the developer a great deal of flexibility in how to structure the application.
  4. Nodemon: a utility that monitors your source for changes and automatically restarts your server. We will use it instead of node to run our application, so that we won’t need to restart the server after each modification (pretty handy, eh!).

Express.js skeleton

Navigate to the directory of your choice and generate the skeleton of our application. After that, install the default dependencies. Finally, a quick test will ensure that everything is running properly.

The app is being served on port 3000. To check if everything is going well, fire up your browser at the following address : http://localhost:3000. You should receive a warm welcome from Express.

Corresponding code

Application dependencies

We’re now ready to add the packages required for our application :

  1. Mongoose : our ODM (Object Document Mapper).
  2. Connect-mongo : used to store the user’s session information in the database.
  3. Express-session : a standard session manager for Express.js.
  4. Method-override : this package will be used in order to pass PUT and DELETE requests through POST requests.
  5. Bcrypt : for password hashing.
  6. Hat : to generate unique API keys.

All these packages will be installed with the --save argument so that they are automatically added to the package.json file.

Setting up the app

We need to make a few changes to the app.js file. First let’s import the packages we just installed :

Then add a connection to the database (which we name ‘iotdb’):

We need to add two middlewares, the first being express-session. As I said previously, it is used to store the user’s session info, which is pretty handy if you want to keep him logged in between views (and to ensure that a user has been authenticated before accessing protected resources). We will also use it to pass messages between the views. Connect-mongo is used to store the session information in the database.

Finally we will configure method-override to look for a hidden field called ‘_method’ in our POST requests. This way we will be able to post PUT and DELETE requests as well.
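To make the idea concrete, here is a minimal hand-rolled sketch of what such an override does. It is not the method-override package itself, only an illustration of the trick: the function name `overrideFromBody` and the `originalMethod` property are my own choices, and it assumes the form body has already been parsed into `req.body`.

```javascript
// Hand-rolled illustration of the "_method" trick (NOT the method-override package).
// Assumes a body parser has already populated req.body.
function overrideFromBody(req, res, next) {
  const allowed = ['PUT', 'DELETE'];
  if (req.method === 'POST' && req.body && typeof req.body._method === 'string') {
    const wanted = req.body._method.toUpperCase();
    if (allowed.includes(wanted)) {
      req.originalMethod = req.method; // keep a trace of the real HTTP verb
      req.method = wanted;             // routes will now see PUT or DELETE
      delete req.body._method;         // the hidden field is not part of the payload
    }
  }
  next();
}
```

A form containing a hidden input named `_method` with the value `DELETE` would then reach the DELETE route even though the browser only sent a POST.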

That’s it ! We’re ready to define our first models.
Corresponding code

The models

Create a new folder in your application directory called models. We need to create two models: one to store the user’s information and the other for the datasets.

Users

Create a new file called users.js in the models folder. This model is very simple : we want to store a username and a password, as well as the timestamp of creation.

But this is not all. We also need to ensure that the password is stored as a hash so that it never falls into the wrong hands if your database is ever compromised. This will be done by adding a middleware relying on bcrypt to hash the password value before adding the user to the database. Don’t forget to import bcrypt and to set the salt factor.

Finally, we need to add a method to the schema that will enable us to compare passwords based on their hash. This is done so that we can authenticate users later on.

Datasets

The dataset model will be a bit more involved than the last one. It will hold the read and write API keys, the name of the dataset, a unique index used to identify the dataset (we won’t use the _id of the dataset because we don’t want to end up with a url looking like /datasets/de1eee192869b90f993acf3857691f07, so we’ll generate a shorter one), the username of the owner of the dataset, a public option which authorizes non-authenticated users to access the dataset view, two fields holding the timestamps of creation and of the last data entry, the number of data entries and finally the data.

Create a new file called datasets.js filled with the following code :

Don’t forget to import the models in app.js :

Corresponding code

The routes

Now that we know which data to store, it’s time to see how we will handle it. This is where things get trickier (well, only a little). We will need 3 categories of controllers :

  1. One for the user handling, which will be limited to a CRUD (Create Read Update Delete) API that will allow us to act on our users schema. This is what I meant when I said that our application will be ready to serve multiple users even if this version is not made for it : you will be able to rely on this API.
  2. One to deal with the datasets. It will control the new, edit and show views, as well as provide yet another CRUD API allowing us to act on the datasets model. This is where we will handle the requests pushing new data points to the dataset and asking for the existing data.
  3. Last but not least a more general controller will handle the remaining views and requests : setup, login, dashboard (index), settings and logout.

An authentication middleware

But before jumping into the routes we need to make a small detour. We don’t want just anyone to be able to use the APIs we are going to make. Only an authenticated user should be able to modify his account, right? Same thing with generating or modifying API keys, as well as accessing the dashboard or a dataset if it is private.

In order to do that, we will create a middleware that will be called before each route for which authentication is needed. Create a new file called utils.js in the root directory. This is where we are going to put our middleware, which simply ensures that the current user in session has been authenticated. Since we pass the user’s information to the session manager after each login request, this is what we will be looking for. If the user is indeed authenticated, we pass the request on to the route. Otherwise we redirect to the login page with an error message. Pretty straightforward.
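In case it helps to picture it, such a guard could be sketched as follows. The name `requireAuthentication`, the `req.session.user` and `req.session.error` properties and the `/login` path are assumptions made for this illustration, so adapt them to whatever your login route actually stores.

```javascript
// Sketch of an authentication guard for Express-style routes.
// Assumed names: req.session.user (set at login), req.session.error, '/login'.
function requireAuthentication(req, res, next) {
  if (req.session && req.session.user) {
    return next(); // a user was stored in the session at login: let the request through
  }
  if (req.session) {
    // Leave a message that the login view can display.
    req.session.error = 'Please log in to access this page.';
  }
  res.redirect('/login');
}
```

It would then be passed as an extra argument in front of any protected handler, for example `router.get('/settings', requireAuthentication, handler)`.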

Users CRUD API

You’ll notice that the routes folder is already included in our skeleton. Isn’t that convenient? We will progress step by step from now on. Start by importing the User schema and the authentication middleware we just made.

Then we’ll create the first method of the API: POST, used to create a new user. We first check that the user is authenticated before letting him perform the operation. The controller simply takes the username and password provided, calls the create method on the user schema and responds according to the result. Remember the hashing middleware we wrote earlier? It will be called automatically so that the password is stored as a hash. The newly created user is returned as JSON.

The GET request is even simpler: it looks for a user matching the id provided in the url and returns it as JSON.

The PUT request is used to update a user’s information. We ask the user to confirm the new password when it changes, so that he doesn’t lock himself out of the app by mistake. So we check that both password fields match, build the update according to which new information was provided and update the user document. You’ll notice that we use the save method instead of update. This is so that the hashing middleware is called, which wouldn’t be the case with update. If an error occurs, we redirect to the settings page with an error message; otherwise we redirect to the index with a success message and update the current session with the new user’s info.

As for the DELETE request, it fetches the user matching the id provided in the url and deletes it as you would expect it to. An error/success message is passed to the session according to the result.

Corresponding code

Datasets

This one was not included by default, so create a new datasets.js file in the routes folder and prepare it with the required imports.

CRUD API

We’ll set the GET request aside as it involves dealing with the read API key.

Let’s start with the POST. We retrieve the user’s name from the current session information. Then we extract the dataset name and the public option values from the body of the POST and delete these keys from the object. Why? Because we then need to iterate through the remaining keys to retrieve the information concerning the variables to include in our dataset. The dataset is then created and the user is redirected to the index, which is served with an appropriate message to display.
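As a rough sketch of that extraction step (not the code from the repository), the body of the form could be split like this. The helper name `parseDatasetForm` is hypothetical, and so are two assumptions about the form: that each remaining key is a variable tag whose value is the display name of the variable, and that a ticked checkbox arrives as the string 'on'.

```javascript
// Hypothetical helper: split a submitted form body into dataset fields and variables.
// Assumes body looks like { name: 'My dataset', public: 'on', var1: 'Variable A', ... }.
function parseDatasetForm(body) {
  // Rest destructuring plays the role of "read the two keys, then delete them".
  const { name, public: isPublic, ...fields } = body;
  const data = {};
  for (const [tag, label] of Object.entries(fields)) {
    // One nested document per variable, with an empty list of [value, timestamp] pairs.
    data[tag] = { name: label, values: [] };
  }
  return { name, public: isPublic === 'on' || isPublic === true, data };
}
```

The object returned has the same shape as the dataset described in the models section, minus the keys, index and owner, which the route adds itself.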

Hat is used to generate unique read and write keys and a unique index of 9 characters is generated by…what’s that ? A new helper function we didn’t create yet ?! Let’s repair that! Go back to the utils.js file and add the following function :
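If you want something to compare against, here is one possible way to write such a helper. It is a sketch under my own assumptions: the name `uniqueIndex` and the alphabet (uppercase letters and digits, which matches the look of an index such as 3IEUDLFEZ) are illustrative, and it relies on `crypto.randomInt`, which only exists in recent versions of Node.js.

```javascript
const crypto = require('crypto');

// Assumed alphabet: uppercase letters and digits.
const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

// Build a short random identifier, 9 characters long by default.
function uniqueIndex(length = 9) {
  let index = '';
  for (let i = 0; i < length; i++) {
    // randomInt(max) returns an integer in [0, max)
    index += ALPHABET[crypto.randomInt(ALPHABET.length)];
  }
  return index;
}
```

Note that randomness alone does not guarantee uniqueness: a collision is very unlikely with 9 characters, but a careful implementation would still check the database before accepting an index.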

The PUT request follows a logic similar to the POST, except that we have to be a bit smart about how we handle the variables: if a variable is found in the request but not in the existing dataset, we add it to a list of variables to be set. Conversely, if it is absent from the request but present in the dataset, it means that it has to be removed, so it is added to a list of variables to be unset. Once again, we then redirect to the index page with the appropriate message.
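The comparison between the request and the stored dataset can be sketched as a small pure function. The name `diffVariables` and the dotted `data.<tag>` paths are assumptions of mine; the two objects it returns are meant to be usable as the arguments of MongoDB’s $set and $unset operators.

```javascript
// Hypothetical helper comparing the variables of a request with those already stored.
// existingData: the dataset's current `data` sub-document, keyed by variable tag.
// requested:    { tag: displayName } pairs read from the submitted form.
function diffVariables(existingData, requested) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const toSet = {};
  const toUnset = {};
  for (const tag of Object.keys(requested)) {
    if (!has(existingData, tag)) {
      // In the request but not in the dataset: create the variable.
      toSet['data.' + tag] = { name: requested[tag], values: [] };
    }
  }
  for (const tag of Object.keys(existingData)) {
    if (!has(requested, tag)) {
      // In the dataset but not in the request: remove the variable.
      toUnset['data.' + tag] = 1;
    }
  }
  return { toSet, toUnset };
}
```

Variables present on both sides are left untouched, so their stored values survive the edit.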

The DELETE request shouldn’t be a mystery by now:

API keys

We will now take care of 3 special cases we haven’t handled yet: how to regenerate an API key, how to push data and how to get data by providing the appropriate API keys.

Updating an API key won’t be difficult. A POST request is made to a specific route, providing the _id of the dataset that needs to be updated and the name of the key (read or write). We store the redirect url as we can’t hardcode it this time: it depends on the index of the dataset being updated.

The request to add new entries to the dataset will look like this: /datasets/update?key=WRITE_APIKEY&var1=value&var2=value etc. Each variable must have been registered beforehand for the value to be stored. Variables are selected by their tag, which is normalized, contrary to their name, which is chosen by the user. We retrieve the write API key from the GET request arguments and delete the key from the query object so that we can then iterate through the variables and their values. The dataset is fetched according to the key provided. Before inserting the values, we check that the variables have been previously registered. The values are added to the data document with the $push operator. An appropriate response code is then sent depending on the result.
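To illustrate the filtering step, here is a sketch of how the query could be turned into the argument of a $push. Everything named here is an assumption for the example: the helper `buildPushUpdate`, the `data.<tag>.values` paths and the choice of converting every value to a number.

```javascript
// Hypothetical helper: turn ?key=...&var1=1&var2=2 into a $push document.
// registeredData is the dataset's `data` sub-document, keyed by variable tag.
function buildPushUpdate(query, registeredData, now = Date.now()) {
  // Separate the API key from the variables (same effect as deleting the key).
  const { key, ...values } = query;
  const push = {};
  for (const [tag, raw] of Object.entries(values)) {
    // Silently ignore variables that were never registered on the dataset.
    if (Object.prototype.hasOwnProperty.call(registeredData, tag)) {
      // Each entry is a [value, timestamp] pair appended to the variable's values.
      push['data.' + tag + '.values'] = [Number(raw), now];
    }
  }
  return { key, push };
}
```

Because $push appends its argument as a single element, passing the pair as an array yields the nested `[[value, timestamp], ...]` structure.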

Here is what the structure of data looks like:

"data": {
  "var1": {"name": "Variable A", "values": [[1, 1442867230518]]},
  "var2": {"name": "Variable B", "values": [[2, 1442867230518]]}
}

As you can see, it is composed of nested documents, each representing a variable. Each variable document is identified by its tag. It holds its name (if defined) and an array of values. Each entry in that array is composed of the value itself and the timestamp of its insertion.

To request a dataset, one needs to provide the correct read API key. A new object representing the dataset, stripped of sensitive information, is created and returned as JSON. Here is what you can expect as a response:

{
  "owner_name": "Rocky",
  "name": "My dataset",
  "index": "3IEUDLFEZ",
  "public": true,
  "created_at": "2015-09-21T20:18:46.478Z",
  "last_entry_at": "2015-09-21T20:27:10.518Z",
  "entries_number": 1,
  "data": {
    "var1": {"name": "Variable A", "values": [[1, 1442867230518]]},
    "var2": {"name": "Variable B", "values": [[2, 1442867230518]]}
  }
}
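One way to build such a stripped object is to copy only the fields that are safe to expose, as in this sketch. The helper name `toPublicDataset` is hypothetical, and so are the `read_key` and `write_key` field names used in the usage note below; the copied fields are the ones visible in the response above.

```javascript
// Hypothetical helper: keep only the fields that may leave the server.
function toPublicDataset(dataset) {
  const {
    owner_name, name, index, public: isPublic,
    created_at, last_entry_at, entries_number, data
  } = dataset;
  // Anything not listed here (API keys, _id, ...) is dropped.
  return { owner_name, name, index, public: isPublic, created_at, last_entry_at, entries_number, data };
}
```

Listing the fields to keep, rather than deleting `read_key` and `write_key`, means that a field added to the model later stays private until you decide otherwise.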

These two routes must be placed before the others for them to work properly.

Views controllers

Now for the easy part. Here we simply control which information is sent to our views.

We don’t need anything to create a new dataset, only to render the proper view :

To edit a dataset, we simply need to retrieve it according to its index and pass this information to the view:

The show view is a bit tricky. We retrieve the dataset according to its index and serve the view with a stripped version of the document (for the reasons already mentioned). The tricky part is that we check the public option. If it is set to true, anybody will be able to access this page. Otherwise we call our authentication middleware to ensure that only an authenticated user can access it.

Finally import and register the new route in app.js:

Corresponding code

Index

You should have got the hang of it by now, but let’s repeat it one more time. Start by including the required models and our helper:

Requests controllers

The first thing our user needs to be able to do is to POST a setup request in order to create his account. This request will redirect him to the index page where he will be able to create his first dataset.

If he comes back, he’ll probably want to log in. In order to do that, we find the user registered under the login he provided and, thanks to the helper we created earlier, compare the password he entered with the one stored for that user. If no user is found under this username or if the passwords don’t match, we redirect him to the login page with an appropriate message. Otherwise we store his information in the current session and redirect him to his dashboard.

Finally in case of logout we create a new session and redirect to the login page, serving the messages in the process if need be:

Views controllers

The first view to take care of is the landing page. What happens there? We first check if a user has been registered. If none has, it means that our visitor needs to complete the setup process (we thus render the setup view). If a user is present in the database but not in the current session, he needs to log in. Otherwise he can be redirected to the index.
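Those three branches boil down to a very small decision function. In this sketch the name `landingTarget` is hypothetical, `userCount` stands for the number of users found in the database, and `session.user` is assumed to be where the login route stores the authenticated user.

```javascript
// Hypothetical helper: decide where a visitor of the landing page should go.
function landingTarget(userCount, session) {
  if (userCount === 0) return 'setup';            // nobody registered yet
  if (!session || !session.user) return 'login';  // registered, but not logged in
  return 'index';                                 // logged in: go to the dashboard
}
```

The route itself would then render the setup or login view, or redirect to the index, depending on the value returned.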

The controller for the setup page simply serves messages if need be and renders the setup view.

For the index (or dashboard, or watchamacallit), we retrieve every dataset owned by the user and pass them to the index view, along with the messages in the session:

As for the settings view, we pass it the current user in session, along with the mes…you got it by now right ?

Corresponding code
And that’s it for the controllers! If you’re still reading, I’m sending you a warm virtual tap on the shoulder. We’re almost there!

The views

We will rely on the Jade templating engine to create our views. You are of course free to use anything else, but I personally find it convenient. The first thing you need to do is to add our front-end dependencies: Bootstrap with the Glyphicons fonts, jQuery, Chart.js to handle our plots, a logo and a favicon. You can also delete everything in style.css. Since we now have a favicon, we can uncomment and modify the following line in app.js:

Corresponding code

Landing page

Let’s put the common elements between the setup and login page in a new landing.jade file that you will create in the views folder:

The first page on which the user will arrive is the setup page, composed of a logo and a form for opening a new account. Create a new file called setup.jade containing:

If a user is already registered, he will arrive on the rather similar (if not almost identical) login page. The key difference is that the route called by the form request is not the same. Once again, create a file called login.jade with the following content:

Corresponding code

General views

We need to take care of the layout that will be used as the base of every view that follows. It will include a navbar and import the scripts shared by the views.

The index will be composed of a table presenting an overview of the existing datasets, as well as an option to create new ones:

The settings view requires a new settings.jade file. It is simply a form for updating the account’s information:

Corresponding code

Dataset views

Start by creating a new subfolder datasets in the views.

The new.jade view (a new file that you just created without having to be asked to because you are so ahead of things) is yet again composed of a form. We use jQuery in order to dynamically add or remove variable fields.

edit.jade is extremely similar, except for the fact that we display the variables already registered and provide a way to regenerate the dataset’s API keys.

As for show.jade, it relies on Chart.js to generate a line chart for each variable. If you’re looking for more elaborate options, have a look at the D3.js and C3.js libraries. I chose Chart.js because it is very easy to implement and it gives fairly good looking results.
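Whatever charting library you pick, the stored [value, timestamp] pairs first have to be split into a list of labels and a list of points. Here is a minimal sketch of that preparation step; the name `toChartSeries` and the choice of ISO date strings as labels are mine, and the result is a plain object that still has to be handed to the library in whatever format it expects.

```javascript
// Hypothetical helper: split one variable document into labels and points.
// variable looks like { name: 'Variable A', values: [[1, 1442867230518], ...] }.
function toChartSeries(variable) {
  return {
    label: variable.name,
    // x axis: insertion time of each entry, as an ISO 8601 string
    labels: variable.values.map(([, timestamp]) => new Date(timestamp).toISOString()),
    // y axis: the measured values
    points: variable.values.map(([value]) => value)
  };
}
```

Calling it once per variable of the dataset gives one series per line chart.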

Corresponding code.

And there you go! Your very own data platform is up and running. You can congratulate yourself, it was not such an easy project to build. And there is still much to be done. You can customize the front-end to match your preferences, add layout options to your plots, open your app to multiple users (setting a password to the database would probably be a good idea too 😉 ).

I would be very pleased to see what you will make out of it, so don’t hesitate to drop me a line in the comments!

21 Comments

  1. Another great article 😉

    This is a topic I resurrect from time to time in my mind.
    The first cloud platform (for sensor/data) storage that I used is (now) Xively, former Cosm, former Pachube back in 2011.
    I also use some other platforms including Node-Red in the last year, but your solution seems to be in the right direction for someone who wants to own their data, manage the upload frequency and regain all the freedom that seems to be much farther away each day regarding our data (whatever it is).
    I know just a little about Node.JS and MongoDB and nothing about Express.JS, but I am confident that with your article I will learn a lot and get to own my data back.
    Thanks for sharing 😉

    1. Thanks for the kind words. Don’t hesitate to contact me if you have any questions (a comment would be the best as it could be beneficial to others).

    1. See, I didn’t even know there was a limit on the size of the documents.

      I ran some back of envelope calculations. Let’s take a dataset named ‘My dataset’ with 5 variables each named ‘Variable A’, ‘Variable B’ etc. With the following parameters, the approximated formula for the weight of the document W with V the number of variables and x the total number of data points is :
      W = 32*V +26*V + 26*x + 250
      32*V is the cost of storing the name of the variable (in bytes)
      26*V is the cost of creating the ‘data’ subdocument (in bytes)
      26*x is the cost of pushing an integer to the ‘data’ array (in bytes)
      250 is the general overhead related to the dataset in itself (api keys, name etc.)(in bytes).

      So, with 5 variables this gives us :
      W = 58V + 26x + 250 = 26x + 540

      if max(W) = 16 MB ≈ 16 000 000 bytes
      16 000 000 – 540 = 26x
      15999460/26 = x
      x = 615364

      It’s an approximation but it gives a good idea. About 600k data points can be stored with the maximum document size. It’s not enough for intensive applications but it should cover the needs of most hobbyists.

      Nevertheless I’ll work on a solution in order to circumvent this problem, thanks again for mentioning it!

  2. Thank you for your calculation. For hobbyists it will be enough, but in this case it needs a process to remove old data, because if you reach the limit you will get an error message and you can’t save the new values. But you don’t know how many variables will be in a channel. So it’s not easy.

    I think, you should store data of channels in a separate collection and only place an ObjectID to link to the channel collection.

    1. That’s a perfectly valid solution. With a new collection containing 1 document per variable it would allow 600k data points / variable to be stored. The capacity could probably be a little less than doubled by splitting the [value, timestamp] pair into two collections as well.

      I’ll implement a fix shortly!

  3. Hi, Thanks for the tutorial. I tried to run your GitHub package after npm install and node app.js. I am getting the error below:

    IoT.me/node_modules/mongoose/node_modules/mongodb/lib/server.js:242
    process.nextTick(function() { throw err; })

    1. Hi,

      Googling your error code, I found several references to similar situations.

      https://github.com/Automattic/mongoose/issues/2861
      http://stackoverflow.com/questions/29857572/mongodb-cryptic-error-process-nexttickfunction-throw-err-error-at-obj
      https://www.reddit.com/r/learnprogramming/comments/4c46ro/expressjs_mongooseconnect_crashes_app/

      According to these discussions, it seems to me that you are experiencing one of these situations:
      1 – you are not providing the right address to mongoose for it to connect to your mongodb instance (so wrong url or ip) or your mongodb instance is protected and you don’t provide the password.
      2 – Mongodb is not running because it wasn’t started (in that case type “mongod” in the shell to start it, or “mongo” to see if you can have access to the db shell at least).
      3 – Mongodb is not running because it’s having some issue (sometimes the log file is not found or the data folder provided in the path was not created stuff like that).

      Anyway, have a look at the links and let me know if it helps!

      1. Hi, Thanks. Yes, I did not have my MongoDb started. Are you planning to use the site with Particle IoT devices?

        1. I had it set-up with a simple weather station I made based on the ESP-8266 but it would work very well with the Photon (I’ve been playing with the Electron lately too – GSM enabled microcontroller which comes with its own battery and a bunch of IOs). I’ve used some of this code as a base for other applications, one of which involved a Photon and I had to modify the payload system a bit to work with web hooks events: when some data is ready to be sent, I use Particle.publish() to publish an event. I have a callback registered on the cloud (I don’t know if you are familiar with their cloud system or not but if not it’s all in their doc) which looks like that:

          {
          “eventName”: “some_event_name”,
          “url”: “http://52.xx.xxx.xxx:3000/payload”,
          “requestType”: “POST”,
          “json”: {
          “event”: “{{SPARK_EVENT_NAME}}”,
          “published_at”: “{{SPARK_PUBLISHED_AT}}”,
          “coreid”: “{{SPARK_CORE_ID}}”,
          “category”: 0
          },
          “mydevices”: true
          }

          so when an event called some_event_name is published by my photon it triggers this which in turn makes a POST request from the cloud to my server.

          Do you already have an application in mind?

          1. hi, yes. I just want to log temperature and events with a Photon. I am assuming you are using ParticleJS SDK and HTTP/SClient library to post data to /update?api_keys from the device.

            I’d like to use mysql mainly because I am not too familiar with MongoDb and its document size limitations you mentioned.

          2. Thanks. I just saw your detailed reply below about Particle.publish and web hooks. I like Particle because of their simple programming tools but I don’t want to rely on them too much with the device events and my data.

          3. yeah I get that. In that case you’re right using an http library would be the way to go.

  4. what a great project 🙂 how do you make it so there can be more users ? cant seem to find it anywhere, and the code for ESP is the same you would use for thingspeak ?

      1. So here is another problem, when i try to send something to the server, it responds like this

        New entry for dataset with API key: 6aaf688d4480470efba4eeffa8bb2c8a
        _http_server.js:192
        throw new RangeError(Invalid status code: ${statusCode});
        ^

        RangeError: Invalid status code: 1
        at ServerResponse.writeHead (_http_server.js:192:11)
        at ServerResponse.writeHead (/root/IoT.me/node_modules/on-headers/index.js:55:19)
        at ServerResponse.writeHead (/root/IoT.me/node_modules/on-headers/index.js:55:19)
        at ServerResponse._implicitHeader (_http_server.js:157:8)
        at ServerResponse.OutgoingMessage.write (_http_outgoing.js:446:10)
        at writetop (/root/IoT.me/node_modules/express-session/index.js:290:26)
        at ServerResponse.end (/root/IoT.me/node_modules/express-session/index.js:338:16)
        at ServerResponse.send (/root/IoT.me/node_modules/express/lib/response.js:204:10)
        at ServerResponse.sendStatus (/root/IoT.me/node_modules/express/lib/response.js:341:15)
        at /root/IoT.me/routes/datasets.js:40:25
        at Query.callback (/root/IoT.me/node_modules/mongoose/lib/query.js:2129:9)
        at /root/IoT.me/node_modules/kareem/index.js:259:21
        at /root/IoT.me/node_modules/kareem/index.js:127:16
        at _combinedTickCallback (internal/process/next_tick.js:67:7)
        at process._tickCallback (internal/process/next_tick.js:98:9)
        [nodemon] app crashed – waiting for file changes before starting…

  5. Hi Alexis,

    As a node newbie trying to catch my arduino data into a DB and view it through a browser, I really appreciate this tutorial/application! It appears you slowed down on development about a year ago, but if you have a few minutes, there are a couple of tweaks that would improve this tutorial for others:

    (1) More incremental testing and verification at each coding step would help cement the functionality and structure as well as validate that there aren’t any editing mistakes,
    (2) more explicit target filenames … half the time I had to go to your github resource to figure out which file I was creating/editing,
    (3) when I got to the end, I had made many mistakes – rather than go through it all again, I cloned your repository. To get it working, I had to run “npm install” – duh, but when you’re brain-dead at the end of the day after implementing this tutorial, a little hand-holding goes a long way :-).

    “Create User” is not working so I will have to go back and figure out why but I am sure there is a step I missed when I cloned the repository and it will work once I do.

    Last thing – at the top of app.js, you have a semicolon at the end of the first line. I’m not sure why it seems to work but I thought, as a list of variables, it should be a comma like the following lines.

    Thanks!
    Dave

    1. Hello David,

      I am glad that you found the tutorial helpful and greatly appreciate your comment and suggestions as I am always trying to improve on my pedagogy.

      Unfortunately I won’t be able to make any modification for the time being. I pivoted my activities from programming and engineering to cycling around the world, which doesn’t leave me much time to take care of the blog (@wherethefuckisalexis on instagram if you’re into that sort of thing).

      That being said, I wish you best of luck in your arduino endeavours. I hope that it will bring you as much enjoyment as it did for me.

    2. Hello David,

      Thank you for your comment and suggestions. I am always trying to improve on my pedagogy and am still learning myself how to disseminate my knowledge in the most clear and accessible way.

      That being said I won’t be able to update the blog for the time being as I am currently engaged in a bicycle tour around the world that might keep me busy for a year or two (@wherethefuckisalexis on instagram if you are into that sort of things).

      Best of luck in your arduino endeavours, I hope it will bring you as much enjoyment as it did for me.
