Raghotham S

Technology enthusiast | Databases | Machine Learning | Mobile apps | Analytics

Read this first

Parsing JSON in Scala

Introduction

I started a side project in Scala with a group of friends (all of us Scala noobs). We chose Scala because it is well known for type safety and for functional programming with support for OOP.
One of the important parts of the project was talking to a REST API that returned JSON responses.

We began our hunt for efficient JSON parsers for Scala and were soon flooded with libraries:

  • spray-json
  • jerkson
  • jackson
  • json4s
  • jacksMapper

With so many options, we were confused! Thanks to the Ooyala Engineering team for this wonderful post with a nice comparison of the libraries. Finally, we decided to go with json4s because we found it handy for extracting objects out of JSON, and because it supports Jackson (for faster parsing).

Problem

The problem with most of the libraries listed above, especially json4s, is the poor documentation. The examples given are straightforward...

Continue reading →


Markers with D3

Every time I look at the examples page of D3, I’m simply go…
Mind Blown
@mbostock has transformed how visualizations are created for the web.

Today I learnt how to use SVG markers with D3. I was using the force layout to analyze graphs, just like this example. But I wanted a directed graph!

Later, I came across another example which had direction. I was happy because a ready-made solution solved the problem. But I soon ran into a problem: I wanted a custom tree-like structure with every path directed, i.e. I wanted arrow markers at the end of each path.

I went back to the ready-made solution and looked at the part of the code that generates the arrows.

// build the arrow.
svg.append("svg:defs").selectAll("marker")
    // Different link/path types can be defined here
    .data(["end"])
    // This section adds in the arrows
    .enter().append("svg:marker")
    // this sets the id to 'end', coming from the data
    .attr("id", String)
...

Continue reading →


DLNA on Raspberry Pi

I always wanted to set up a media server at home for the following reasons:

  1. Reduce redundancy - avoid keeping multiple copies of media for different devices like phone, tablet, smart TV etc.
  2. Ease of use - no need to copy files to and from devices to play media (mostly Floyd and movies)
  3. One-stop shop with Transmission integration - download files on the Raspberry Pi and they appear on the media server

The easiest solution was to turn my Raspberry Pi into a DLNA server. For this I needed to install a few basic packages and configure each of them.

It was a bit hard to find all of this in a single post, hence I’m writing this one.

Packages required

  • samba
  • nginx (for transmission)
  • nfs
  • ntfs (optional, to support ntfs file system)
  • transmission-daemon
  • minidlna

    sudo apt-get install samba samba-common-bin
    sudo apt-get install nginx
    sudo apt-get install nfs-kernel-server nfs-common portmap
    sudo apt-get install
    ...

Continue reading →


Text Search on PostgreSQL

PostgreSQL has out-of-the-box support for text search.

Assume we have a table of documents:

CREATE TABLE documents
(
  id serial NOT NULL,
  doc text
);

-- String literals take single quotes; double quotes denote identifiers
INSERT INTO documents(doc)
VALUES ('Lorem ipsum .....');

INSERT INTO documents(doc)
VALUES ('Quick brown fox .....');

------------------------------------
id       | doc
------------------------------------
1        | "Lorem ipsum ....."
2        | "Quick brown fox ..."

A simple text search is a basic requirement in many systems. This can be done using the tsvector and tsquery types in PostgreSQL.

[tsvector](www.postgresql.org/docs/9.1/static/datatype-textsearch.html) gives us the list of lexemes for any given text.

[tsquery](www.postgresql.org/docs/9.1/static/datatype-textsearch.html) facilitates the search by creating lexemes for the search terms, combining them, and comparing them against a tsvector to produce the result.

The to_tsvector...

Continue reading →


Machine Learning

I had zero knowledge about this topic but wanted to explore it. I took Large Scale Hierarchical
Text Classification (LSHTC) as my MS project, so that I would have a good scenario for getting
started with Machine Learning.

The first thing I wanted to know was the format of the data provided by LSHTC. It turned out to
be the SVM format. The training data and test data had the following format:

label,label,label… feature:value feature:value

The labels indicate the categories the document belongs to.

Each feature:value pair represents a word and its weight (term frequency, TF) in the document.
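
To make the format concrete, here is a minimal Python sketch of parsing one such line. It assumes that labels and feature ids are integers and that labels may be separated by commas with optional spaces; the function name `parse_line` and the sample line are made up for illustration and are not taken from the LSHTC data.

```python
from typing import Dict, List, Tuple


def parse_line(line: str) -> Tuple[List[int], Dict[int, float]]:
    """Split one 'label,label feature:value ...' line into labels and a sparse vector."""
    labels: List[int] = []
    features: Dict[int, float] = {}
    for token in line.split():
        if ":" in token:
            # A feature:value pair, e.g. "18:2"
            index, value = token.split(":", 1)
            features[int(index)] = float(value)
        else:
            # One or more comma-separated labels, e.g. "545," or "545,32"
            labels.extend(int(part) for part in token.split(",") if part)
    return labels, features


# Hypothetical sample line, not real LSHTC data
print(parse_line("545,32 8:1 18:2"))  # → ([545, 32], {8: 1.0, 18: 2.0})
```

Keeping the features in a dictionary keeps the vector sparse: only the words that actually occur in a document take up space.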

Choice of programming language

I had to make a choice between Java and Python.

I chose Python for the following reasons:

  1. Huge set of Machine Learning libraries - given that I was a beginner, this made a big difference. More libraries, more documentation, more examples => more experiments and better understanding
  2. Most of the Machine Learning this...

Continue reading →


Database Triggers

A database trigger is a procedure that runs automatically before or after a statement is executed or a row is inserted / modified / deleted.
It can be used to perform any task before or after a certain event occurs in the database.

I had been curious about this concept for a very long time and wanted to check it out.

I wanted to try automating something by creating a trigger function.

A trigger function in PostgreSQL is a kind of function to which special variables are passed - NEW, OLD etc. More on trigger functions - here

NEW - variable passed to a row-level trigger function when the trigger fires on INSERT / UPDATE. It contains the new row to be inserted / updated

OLD - variable passed to a row-level trigger function when the trigger fires on UPDATE / DELETE. It contains the existing row being updated / deleted
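
To see NEW and OLD in action without a PostgreSQL server, here is a small self-contained sketch that uses SQLite through Python's built-in sqlite3 module instead. This is a swapped-in stand-in, not PostgreSQL: SQLite writes the trigger body inline rather than in a separate trigger function, but it exposes the row through NEW and OLD in the same spirit. The tables `items` and `item_log` are hypothetical and are not the tables used in this post.

```python
import sqlite3

# In-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item_log (action TEXT, item_name TEXT);

-- NEW holds the row that was just inserted
CREATE TRIGGER log_insert AFTER INSERT ON items
BEGIN
    INSERT INTO item_log VALUES ('insert', NEW.name);
END;

-- OLD holds the row that was just deleted
CREATE TRIGGER log_delete AFTER DELETE ON items
BEGIN
    INSERT INTO item_log VALUES ('delete', OLD.name);
END;
""")

# Each statement fires the matching trigger, which writes to item_log
conn.execute("INSERT INTO items (name) VALUES ('first')")
conn.execute("DELETE FROM items WHERE name = 'first'")

log = conn.execute(
    "SELECT action, item_name FROM item_log ORDER BY rowid"
).fetchall()
print(log)  # → [('insert', 'first'), ('delete', 'first')]
```

In PostgreSQL the same idea would be split in two: a trigger function written in a procedural language such as PL/pgSQL, and a CREATE TRIGGER statement that attaches it to the table.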

To try out trigger functions I created three tables: posts, groups and user_posts.

I wanted to try an insert automation -...

Continue reading →