Idea
I wanted my tracker to:
- Not track users --- Google Analytics can be used to track people who, e.g., dislike being tracked by Google, even if they never signed up for any Google service; most other free solutions also have really abusive ToS;
- Be free (as in beer) --- the lowest tier of Matomo Cloud costs 7.5 EUR, which is more than the cheapest PHP hosting (and if I self-hosted Matomo, I would end up owning my users' data);
This weekend I wanted to teach myself a little bit of Go, and I was wondering "what piece of Go can I write?", so I thought "let's write a Lambda function that will be my blog counter".
Overall architecture
Normal trackers work like this:
- You inject some JavaScript from the tracker website.
- The JavaScript collects as much information as it needs and calls back to the tracker website.
I didn't want to use this method, as:
- I don't need a tracker (I don't want to track people);
- I don't want to complicate stuff;
- I would like to count even people who have JavaScript disabled.
So here is an alternate architecture:
- The Lambda function is a normal HTTP endpoint that returns some resource (an empty JS script, an empty CSS stylesheet, a transparent pixel image --- you get the idea);
- When the browser downloads the URL, we store all request headers in DynamoDB.
Note
A few words on choosing AWS instead of other providers: without searching, I knew that Google and AWS provided stateless services (serverless functions), but Google didn't allow their functions to be written in Go (well, I guess I could have hacked something together, but I wanted to have something working by the afternoon, and cutting down on uncertainty was a good thing).
How it works specifically
The web page contains the following link:
<link rel="stylesheet" href="https://o5h84vss67.execute-api.eu-central-1.amazonaws.com/default/LambdaCtr" media="none">
One hack here is that I added an invalid media query, media="none", so browsers won't block rendering of the page while waiting for the CSS (but they still download it).
The Lambda function returns a random ETag and Cache-Control: no-cache, no-store, must-revalidate, then dumps all request headers to DynamoDB.
I tried to make my Lambda return an empty PNG image, but it seems that Lambda functions don't like binary response content (hinted at, e.g., here).
There is no script to analyze the data (yet); I'll try to figure something out once I have some data.
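Generating those response headers might look like this. It's a sketch, not the code from the repository: the random ETag is 16 bytes from crypto/rand, hex-encoded and quoted, and the headers are returned as a map[string]string because that is the shape of the headers object in an API Gateway proxy response. The Content-Type value is an assumption for the empty stylesheet.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// randomETag returns a quoted, random entity tag. Because it changes on
// every response, a cached copy can never be revalidated as "not modified".
func randomETag() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", fmt.Errorf("reading random bytes: %w", err)
	}
	// ETag values are quoted strings in HTTP.
	return `"` + hex.EncodeToString(b) + `"`, nil
}

// trackerHeaders builds the response headers described above as a plain
// map[string]string (the shape an API Gateway proxy response uses).
func trackerHeaders() (map[string]string, error) {
	etag, err := randomETag()
	if err != nil {
		return nil, err
	}
	return map[string]string{
		"Content-Type":  "text/css", // assumed: matches the empty stylesheet
		"Cache-Control": "no-cache, no-store, must-revalidate",
		"ETag":          etag,
	}, nil
}

func main() {
	h, err := trackerHeaders()
	if err != nil {
		panic(err)
	}
	fmt.Println(h["Cache-Control"], h["ETag"])
}
```

A fresh ETag plus no-store means every page view should produce a new request to the function, which is exactly what a counter wants.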
Golang part
Go was surprisingly easy; I didn't run into any non-obvious problems with the language itself. golint and go vet (and the Go compiler) caught all my errors and gave sensible messages.
Even explicit error handling (which seemed counterintuitive at first) turned out to be obvious and not really problematic.
The only issue I bumped into was with dependencies; it looks like right now there are too many ways to download the dependencies of your program:
- go get --- part of the standard toolchain, but it only downloads the master version of a dependency. I didn't want this, as it would bite me when (if ever) I wanted to tweak the Lambda function.
- glide --- the first result my search engine found on dependency management in Go. No support in GoLand (the JetBrains IDE for Go).
- dep --- a dependency manager that has been used for a while, but is now being sunsetted as the standard vgo emerges.
- vgo --- a new dependency manager for Go, which is now being standardized (it is part of the standard toolchain in the newest Go release).
In the end I used dep for my dependencies, and it was cool.
The repository for the Go code is here.
Note
Since this project is dead simple, I didn't get a chance to use goroutines and channels, which are the "new" thing in Go, so I didn't have much opportunity to make mistakes.
AWS Part
I didn't expect that working on the AWS part of the project would be so irritating. Tutorials were nonexistent or contradictory, there were three ways to do everything, and I still don't understand the difference between a user and a role.
Also, the documentation for the AWS Go SDK is almost nonexistent; the documentation for the DynamoDB marshalling package consists of a couple of sentences and one example.
So, to set up my Lambda tracker, I needed to:
Create a DynamoDB table;
DynamoDB requires your items to have a primary key, which is used to partition the data across storage nodes. In real life it is very important that this partition key has a good distribution of values (essentially, that it behaves like a good hash).
Since the request headers I store are sent to me by an untrusted party (that is, you), I used a synthetic random partition key.
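The post doesn't say how the key is generated; one reasonable option, assumed here, is a random version-4 UUID built from crypto/rand. Because the key never depends on request data, a visitor cannot steer writes toward one partition.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newPartitionKey returns a random RFC 4122 version-4 UUID string.
// Keys are uniformly random, so writes spread evenly across partitions
// no matter what the (untrusted) request headers contain.
func newPartitionKey() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", fmt.Errorf("generating partition key: %w", err)
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	key, err := newPartitionKey()
	if err != nil {
		panic(err)
	}
	fmt.Println(key)
}
```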
Note
If someone manages to control the partition keys of your data in DynamoDB, they can direct all reads/writes to a single partition, which hurts performance a lot.
Create a policy that lets my Lambda access said table (and nothing more);
This policy looks like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "dynamodb:BatchWriteItem",
        "dynamodb:PutItem"
      ],
      "Resource": "<insert arn here>"
    }
  ]
}
To get your table ARN, visit the table's main page; it should display the ARN.
Attach this policy to an IAM role (not a user). You should also attach AWSLambdaBasicExecutionRole so your function can write logs;
Create a new Lambda function using the AWS web console;
Attach an API Gateway endpoint as a trigger for your Lambda;
Write down the endpoint URL for this trigger;
Prepare the Go code for upload. To do this you'll need to:
- Build your package so you have an executable. In my case the executable was named lambdactr.
- Zip the executable; the lambdactr file must sit at the root of the zip (you might end up with, e.g., build/lambdactr inside the zip file, so double-check).
- Upload the zip to your function.
Set the handler in your Lambda to match the executable you uploaded (lambdactr);
You can probably test everything now.
Future work
I wanted to have automatic deployment of this Lambda function, but in the end it turned out to be too much of a hassle.
I got distracted by the fact that there are two ways of creating/updating a Lambda:
- aws lambda create-function --- this is the way I should have used;
- CloudFormation stuff --- this is the way I ended up researching;
Anyway, thanks for reading, and (once again) if you want to browse the code, here it is.