<p>Raghotham S, raghothams.svbtle.com</p>
<h1>Parsing JSON in Scala</h1>
<p><em>2015-04-26</em></p>
<h2 id="introduction_2">Introduction <a class="head_anchor" href="#introduction_2">#</a>
</h2>
<p>I started a side project in Scala with a group of friends (all of us Scala noobs). We chose Scala because it is well known for type safety and for functional programming with OOP support.<br>
One of the important parts of the project was talking to a REST API that returned JSON responses.</p>
<p>We began our hunt for an efficient JSON parser for Scala and were soon flooded with libraries:</p>
<ul>
<li>spray-json</li>
<li>jerkson</li>
<li>jackson</li>
<li>json4s</li>
<li>jacksMapper</li>
</ul>
<p>With so many options, we were confused! Thanks to the Ooyala Engineering team for this <a href="http://engineering.ooyala.com/blog/comparing-scala-json-libraries">wonderful post</a> comparing the libraries. Finally, we decided to go ahead with <strong>json4s</strong> because we found it handy for extracting objects out of JSON, and because of its support for Jackson (faster parsing).</p>
<h2 id="problem_2">Problem <a class="head_anchor" href="#problem_2">#</a>
</h2>
<p>The problem with most of the libraries listed above, especially json4s, is the poor documentation. The examples given are straightforward cases where the structure of the JSON response and the object model are exactly the same.</p>
<pre><code class="prettyprint">scala> import org.json4s._
scala> import org.json4s.jackson.JsonMethods._
scala> implicit val formats = DefaultFormats // Brings in default date formats etc.
scala> case class Child(name: String, age: Int, birthdate: Option[java.util.Date])
scala> case class Address(street: String, city: String)
scala> case class Person(name: String, address: Address, children: List[Child])
scala> val json = parse("""
  { "name": "joe",
    "address": {
      "street": "Bulevard",
      "city": "Helsinki"
    },
    "children": [
      {
        "name": "Mary",
        "age": 5,
        "birthdate": "2004-09-04T18:06:22Z"
      },
      {
        "name": "Mazy",
        "age": 3
      }
    ]
  }
""")
scala> json.extract[Person]
res0: Person = Person(joe,Address(Bulevard,Helsinki),List(Child(Mary,5,Some(Sat Sep 04 18:06:22 EEST 2004)), Child(Mazy,3,None)))
</code></pre>
<h2 id="what-if-we-want-to-convert-part-of-the-json-i_2">What if we want to convert part of the JSON into an object? <a class="head_anchor" href="#what-if-we-want-to-convert-part-of-the-json-i_2">#</a>
</h2>
<p>From the above example, what if we want to convert only the address information into an object? There is little to no documentation that guides beginners through such a task.<br>
<img src="http://i.imgur.com/PGMDnU8.png?1" alt="tweet"><br>
<a href="https://twitter.com/shrayasr/status/590936104716488704">Link to tweet</a></p>
<h2 id="solution_2">Solution <a class="head_anchor" href="#solution_2">#</a>
</h2>
<p>We can traverse the JSON by giving it a path expression. In the above example, we can traverse to the <code class="prettyprint">address</code> object by giving its path from the root, which is <code class="prettyprint">"address"</code></p>
<pre><code class="prettyprint">scala> json \ "address"
</code></pre>
<p>The above statement does the traversal and returns a <code class="prettyprint">JValue</code>. Once we have the <code class="prettyprint">JValue</code> for the address, we can convert it into an <code class="prettyprint">Address</code> object using the <code class="prettyprint">extract</code> method</p>
<pre><code class="prettyprint">scala> case class Child(name: String, age: Int, birthdate: Option[java.util.Date])
scala> case class Address(street: String, city: String)
scala> val json = parse("""
  { "name": "joe",
    "address": {
      "street": "Bulevard",
      "city": "Helsinki"
    },
    "children": [
      {
        "name": "Mary",
        "age": 5,
        "birthdate": "2004-09-04T18:06:22Z"
      },
      {
        "name": "Mazy",
        "age": 3
      }
    ]
  }
""")
scala> val addressJson = json \ "address" // Extract address object
scala> val addressObj = addressJson.extract[Address]
addressObj: Address = Address(Bulevard,Helsinki)
</code></pre>
<p><strong>BOOM!</strong> You have extracted an object of type <code class="prettyprint">Address</code> from the JSON.</p>
<pre><code class="prettyprint">scala> val children = (json \ "children").extract[List[Child]] // Extract list of objects
children: List[Child] = List(Child(Mary,5,Some(Sat Sep 04 23:36:22 IST 2004)), Child(Mazy,3,None))
</code></pre>
<p>Now you have created a List of type <code class="prettyprint">Child</code></p>
<p>The general trend I see is that the <strong>Getting started</strong> or <strong>Usage</strong> guides available for various libraries do not help beginners get started quickly on a given problem. We need better beginner docs that showcase examples close to real-world scenarios.</p>
<h1>Markers with D3</h1>
<p><em>2015-01-22</em></p>
<p>Every time I look at the examples page of D3, I’m simply go…<br>
<a href="http://i.imgur.com/bhLuxln.gif"><img src="http://i.imgur.com/bhLuxln.gif" alt="Mind Blown"></a><br>
<a href="https://twitter.com/mbostock">@mbostock</a> has transformed how visualizations are created for the web.</p>
<p>Today I learnt how to use svg markers with D3. I was using force layout to analyze graphs, just like this <a href="http://bl.ocks.org/mbostock/4062045">example</a>. But I wanted a directed graph!<br><br>
<a href="http://i.imgur.com/t9aydD3.png"><img src="http://i.imgur.com/t9aydD3.png" alt="yuno-meme"></a></p>
<p>Later, I came across another <a href="http://bl.ocks.org/d3noob/5141278">example</a> which had direction. I was happy because a ready-made solution solved the problem. But I soon ran into a problem, as I wanted a custom tree-like structure with every path directed, i.e. arrow markers at the end of each path.</p>
<p>I went back to the ready-made solution and had a look at the part of the code that generates the arrows.</p>
<pre><code class="prettyprint">// build the arrow.
svg.append("svg:defs").selectAll("marker")
    // Different link/path types can be defined here
    .data(["end"])
    // This section adds in the arrows
  .enter().append("svg:marker")
    // this makes the id 'end', coming from the data
    .attr("id", String)
    .attr("viewBox", "0 -5 10 10")
    .attr("refX", 15)
    .attr("refY", -1.5)
    .attr("markerWidth", 6)
    .attr("markerHeight", 6)
    .attr("orient", "auto")
  .append("svg:path")
    .attr("d", "M0,-5L10,0L0,5");
<p><em>Thanks to <a href="https://gist.github.com/d3noob">d3noob</a> for adding comments to the code</em></p>
<p>The above code just creates an arrow definition. It can be attached to any element later by adding the code below as the element’s attribute.</p>
<pre><code class="prettyprint">svgElement.attr("marker-end", "url(#end)");
</code></pre>
<p>So what is the magic happening here? Let’s look closely at what we are doing while building the arrow.</p>
<pre><code class="prettyprint">svg.append("svg:defs").selectAll("marker")
    .data(["end"])
  .enter().append("svg:marker")
    .attr("id", String)
    .attr("viewBox", "0 -5 10 10")
    .attr("refX", 15)
    .attr("refY", -1.5)
    .attr("markerWidth", 6)
    .attr("markerHeight", 6)
    .attr("orient", "auto")
  .append("svg:path")
    .attr("d", "M0,-5L10,0L0,5");
<p>We are creating an <strong>SVG def</strong> which will generate the arrowhead. SVG defs are a way of defining graphical objects that can be applied to elements. The definition can be anything, like a marker or a gradient, as shown in the <a href="https://developer.mozilla.org/en-US/docs/Web/SVG/Element/defs">MDN example</a>. Now that we have created a definition, it can easily be applied to any element.</p>
<p>We will use the defined marker and apply it to every path we have by altering the path’s attribute</p>
<pre><code class="prettyprint">svg.selectAll(".link")
.attr("marker-end", "url(#end)");
</code></pre>
<p>We use the <code class="prettyprint">marker-end</code> attribute and assign the definition’s id as its value (in our case it is #end). The <a href="https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/marker-end">marker-end</a> attribute adds an arrowhead or any other object at the final vertex of the path.</p>
<p>Now that we have added arrows to the paths, let’s see the output<br>
<a href="http://i.imgur.com/kRSOQs9.png"><img src="http://i.imgur.com/kRSOQs9.png" alt="output"></a></p>
<p>A peek into the DOM<br>
<a href="http://i.imgur.com/mlB3Spz.png"><img src="http://i.imgur.com/mlB3Spz.png" alt="dom-svg"></a></p>
<p><a href="https://c1.staticflickr.com/3/2724/4416219525_eebf385a7d_z.jpg?zz=1"><img src="https://c1.staticflickr.com/3/2724/4416219525_eebf385a7d_z.jpg?zz=1" alt="thats all folks"></a></p>
<p><em>Thanks to mbostock, d3noob, MDN</em></p>
<h1>DLNA on Raspberry Pi</h1>
<p><em>2014-09-07</em></p>
<p>I always wanted to set up a media server at home, for the following reasons:</p>
<ol>
<li>Reduce redundancy - no more keeping multiple copies of media for different devices like phone, tablet, smart TV etc.</li>
<li>Ease of use - no need to copy files to and from devices to play media (mostly <em>Floyd</em> and movies)</li>
<li>One-stop shop with Transmission integration - download files on the rpi and they appear on the media server</li>
</ol>
<p>The easiest solution was to turn my <strong>RaspberryPi</strong> into a DLNA server. For this I needed a few basic packages and had to configure each of them.</p>
<p>It was a bit hard to find all of these steps in a single place, hence I’m writing this post.</p>
<h2 id="packages-required_2">Packages required <a class="head_anchor" href="#packages-required_2">#</a>
</h2>
<ul>
<li>samba</li>
<li>nginx (for transmission)</li>
<li>nfs</li>
<li>ntfs (optional, to support ntfs file system)</li>
<li>transmission-daemon</li>
<li>
<p>minidlna</p>
<pre><code class="prettyprint">sudo apt-get install samba samba-common-bin
sudo apt-get install nginx
sudo apt-get install nfs-kernel-server nfs-common portmap
sudo apt-get install ntfs-3g # if you want ntfs
sudo apt-get install transmission-daemon
sudo apt-get install minidlna
</code></pre>
</li>
</ul>
<h2 id="samba_2">samba <a class="head_anchor" href="#samba_2">#</a>
</h2>
<p>Append the following to /etc/samba/smb.conf</p>
<pre><code class="prettyprint">[public]
path = /path/to/public/folder
browseable = yes
writeable = yes
guest ok = no
read only = no
</code></pre>
<h2 id="minidlna_2">minidlna <a class="head_anchor" href="#minidlna_2">#</a>
</h2>
<p>Edit /etc/minidlna.conf</p>
<pre><code class="prettyprint">media_dir=/path/to/public/folder
media_dir=V,/path/to/public/videos/folder
media_dir=A,/path/to/public/music/folder
media_dir=P,/path/to/public/pictures/folder
friendly_name=rpi
</code></pre>
<h2 id="transmissiondaemon_2">transmission-daemon <a class="head_anchor" href="#transmissiondaemon_2">#</a>
</h2>
<pre><code class="prettyprint">sudo mkdir -p /opt/torr
sudo chown -R debian-transmission /opt/torr
sudo cp /etc/transmission-daemon/settings.json /etc/transmission-daemon/settings_template.json
</code></pre>
<p>Edit /etc/transmission-daemon/settings.json</p>
<p>Change the value of the <em>download-dir</em> field to /opt/torr</p>
<pre><code class="prettyprint">{
..
"download-dir": "/opt/torr",
..
}
</code></pre>
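<p>Since settings.json is plain JSON, the edit can also be scripted. Below is a minimal Python sketch (the helper name is mine, not part of transmission). Stop transmission-daemon before running it: the daemon rewrites settings.json on shutdown and will overwrite manual edits.</p>
<pre><code class="prettyprint">import json

def set_download_dir(settings_path, new_dir):
    """Rewrite the download-dir field of transmission's settings.json,
    leaving every other setting untouched."""
    with open(settings_path) as f:
        settings = json.load(f)
    settings["download-dir"] = new_dir  # only this field changes
    with open(settings_path, "w") as f:
        json.dump(settings, f, indent=4, sort_keys=True)
    return settings
</code></pre>
<p>For example, <code class="prettyprint">set_download_dir("/etc/transmission-daemon/settings.json", "/opt/torr")</code>, run with the daemon stopped.</p>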
<h2 id="time-to-test_2">Time to test! <a class="head_anchor" href="#time-to-test_2">#</a>
</h2>
<pre><code class="prettyprint">sudo service samba stop
sudo service samba start
sudo service minidlna stop
sudo service minidlna start
sudo service transmission-daemon stop
sudo service transmission-daemon start
</code></pre>
<p>To test if transmission daemon is running, open <a href="http://rpi_ip_addr:9091/transmission/web/">http://rpi_ip_addr:9091/transmission/web/</a></p>
<p>Device IP addresses keep changing, which makes accessing the rpi by IP address inconvenient.<br>
We can solve this problem by using the <code class="prettyprint">.local</code> domain. For this we need the avahi-daemon and a tweak to the hosts file</p>
<h2 id="avahidaemon_2">avahi-daemon <a class="head_anchor" href="#avahidaemon_2">#</a>
</h2>
<pre><code class="prettyprint">sudo apt-get install avahi-daemon
</code></pre>
<p>Edit /etc/hosts (the hostname itself is set in /etc/hostname)</p>
<p>Change</p>
<pre><code class="prettyprint"> 127.0.1.1 raspberrypi
</code></pre>
<p>to </p>
<pre><code class="prettyprint"> 127.0.1.1 [new name here]
</code></pre>
<p>Reboot rpi</p>
<p>Now you should be able to access your raspberrypi using the URL <a href="http://host_name.local">http://host_name.local</a></p>
<p>For example, <a href="http://raspberrypi.local">http://raspberrypi.local</a></p>
<hr>
<p>PS: Most of the time <em>minidlna</em> does not refresh the collection in the specified folders. We need to explicitly run the following commands</p>
<pre><code class="prettyprint">sudo minidlna -R
sudo service minidlna restart
</code></pre>
<p>This problem might be caused by the inotify functionality of the Linux kernel, which has to be enabled in the kernel. A solution is posted <a href="http://stackoverflow.com/questions/5180409/why-minidlna-not-refreshing-database">here</a></p>
<h3 id="courtesy_3">Courtesy <a class="head_anchor" href="#courtesy_3">#</a>
</h3>
<ul>
<li><a href="http://www.naspberrypi.com">NASpberrypi</a></li>
<li><a href="http://www.ryukent.com/2013/09/a-local-url-instead-of-an-ip-address-for-your-raspberry-pi/">Avahi stuff</a></li>
</ul>
<h1>Text Search on PostgreSQL</h1>
<p><em>2014-05-31</em></p>
<p>PostgreSQL has out-of-the-box support for text search.</p>
<p>Assume we have a table of documents: </p>
<pre><code class="prettyprint">CREATE TABLE documents
(
  id serial NOT NULL,
  doc text
);

INSERT INTO documents(doc)
VALUES ('Lorem ipsum .....');

INSERT INTO documents(doc)
VALUES ('Quick brown fox .....');

------------------------------------
 id | doc
------------------------------------
  1 | Lorem ipsum .....
  2 | Quick brown fox ...
<p>A simple text search is a basic requirement in any system. This can be done using <code class="prettyprint">tsvector</code> and <code class="prettyprint">tsquery</code> types in PostgreSQL.</p>
<p><a href="http://www.postgresql.org/docs/9.1/static/datatype-textsearch.html"><strong>tsvector</strong></a> gives us the list of lexemes for any given text.<br><br>
<a href="http://www.postgresql.org/docs/9.1/static/datatype-textsearch.html"><strong>tsquery</strong></a> facilitates the search by creating lexemes for the search terms, combining search terms / lexemes, and comparing them against a tsvector for the result.</p>
<p>The <code class="prettyprint">to_tsvector</code> method processes text by removing stop words and by stemming and normalizing words, so that different variants of a word can be matched.<br><br>
For example, <strong>precision</strong> becomes <strong>precis</strong> and <strong>running</strong> becomes <strong>run</strong></p>
<p>On every insert of a document, we need to compute the normalized text of the document and store it in a dedicated column. For this we need to create a new column of type tsvector.</p>
<pre><code class="prettyprint">ALTER TABLE documents ADD COLUMN tsv tsvector;
</code></pre>
<p>Next, we need to create a trigger function that will update the <strong>tsv</strong> column on every insert</p>
<pre><code class="prettyprint">CREATE TRIGGER tsvupdate
BEFORE INSERT OR UPDATE
ON documents
FOR EACH ROW
EXECUTE PROCEDURE tsvector_update_trigger(tsv, 'pg_catalog.english', doc);
</code></pre>
<p><code class="prettyprint">tsvector_update_trigger()</code> is a built-in function that takes the following arguments:</p>
<ol>
<li>the column in which to store the normalized text<br>
</li>
<li>the text search configuration for the language (removing stop words and stemming are language specific)<br>
</li>
<li>one or more columns to read the text from</li>
</ol>
<p>With data populated inside the documents table, we can perform a simple text search using the query:</p>
<pre><code class="prettyprint">WITH q AS (SELECT to_tsquery('brown:*') AS query)
SELECT id, doc, tsv FROM documents, q WHERE q.query @@ documents.tsv;
</code></pre>
<p>The <code class="prettyprint">to_tsquery</code> function converts the input text to the tsquery type, which supports the logical operators <code class="prettyprint">& (AND), | (OR), ! (NOT)</code> on lexemes and <strong>prefix matching</strong> using <code class="prettyprint">":*"</code></p>
<p>The <code class="prettyprint">@@</code> operator checks whether the tsvector matches the tsquery</p>
<p>So the above query returns the documents that contain “brown”<br><br>
<img src="http://imgur.com/fiEP4ot.png" alt="example-image"></p>
<h2 id="limitation_2">Limitation: <a class="head_anchor" href="#limitation_2">#</a>
</h2>
<p><code class="prettyprint">tsvector</code> and <code class="prettyprint">tsquery</code> only help us find whole words in a given text, <strong>not substrings</strong>.<br>
For substring matching we have to use the <code class="prettyprint">pg_trgm</code> extension (trigram-based text search), which can speed up <code class="prettyprint">LIKE</code> operations on text fields.</p>
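<p>To get an intuition for what pg_trgm does, here is a rough Python sketch of trigram matching. This is an illustration of the idea only, not the actual implementation: the real extension also pads each word with spaces before extracting trigrams, and the function names below are mine.</p>
<pre><code class="prettyprint">def trigrams(text):
    """Set of 3-character substrings of text (simplified pg_trgm)."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}

def similarity(a, b):
    """Overlap of the two trigram sets, roughly what
    pg_trgm's similarity() computes."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta.intersection(tb)) / len(ta.union(tb))
</code></pre>
<p>Because most trigrams of a substring also appear among the trigrams of the full string, an index over trigrams lets the database narrow down candidate rows for a <code class="prettyprint">LIKE '%brown%'</code> style query.</p>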
<h1>Machine Learning</h1>
<p><em>2014-05-13</em></p>
<p>I had zero knowledge about this topic but wanted to explore it. I took <a href="http://lshtc.iit.demokritos.gr/">Large Scale Hierarchical Text Classification</a> (LSHTC) as my MS project, so that I would have a good scenario for starting Machine Learning</p>
<p>The first thing I wanted to know was the format of the data provided by LSHTC. It turned out to be the SVM format. The training data and test data had the following format:</p>
<p><code class="prettyprint">label,label,label… feature:value feature:value</code></p>
<p>The <strong><u>label</u></strong> indicates the category the document belongs to.</p>
<p>The <strong><u>feature:value</u></strong> vector represents a word and its weight (<strong>TF</strong>) in the document.</p>
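<p>As an illustration, one line of this format can be parsed with a few lines of Python. This is a sketch under the assumptions shown above (comma-separated integer labels in the first token, then space-separated feature:value pairs); the helper name is made up:</p>
<pre><code class="prettyprint">def parse_svm_line(line):
    """Parse 'label,label feature:value feature:value ...' into
    a list of label ints and a {feature_id: weight} dict."""
    head, *pairs = line.split()          # first token holds the labels
    labels = [int(label) for label in head.split(",")]
    features = {}
    for pair in pairs:
        feature, value = pair.split(":")
        features[int(feature)] = float(value)
    return labels, features
</code></pre>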
<h2 id="choice-of-programming-language_2">Choice of programming language <a class="head_anchor" href="#choice-of-programming-language_2">#</a>
</h2>
<p>I had to make a choice between <code class="prettyprint">Java</code> and <code class="prettyprint">Python</code></p>
<p>I chose <code class="prettyprint">Python</code> for the following reasons:</p>
<ol>
<li>
<strong>Huge set of Machine Learning libraries</strong> - given that I was a beginner, this made a lot of
impact. More libraries, more documentation, more examples => more experiments and
better understanding</li>
<li>Most of the Machine Learning these days is done with Python</li>
<li>Less cumbersome to try out a scenario - given that Python is more of a scripting
language, experiments can be run quickly, especially with <strong>IPython</strong>
</li>
<li>Also the hype around it these days :)</li>
</ol>
<h2 id="libraries_2">Libraries <a class="head_anchor" href="#libraries_2">#</a>
</h2>
<ol>
<li><p><a href="http://scikit-learn.org/stable/">scikit-learn</a> - massive collection of algorithms for regression, classification,<br>
clustering, dimensionality reduction, model selection, pipelining etc</p></li>
<li><p><a href="http://mlpy.sourceforge.net/">mlpy</a> - similar to scikit-learn but offers a smaller set</p></li>
<li><p><a href="http://graphlab.com/">graphlab</a> - more of a recommendation engine</p></li>
<li><p><a href="http://spark.apache.org/">Spark</a> - very good parallel ML framework but still in its early stage. Does not offer many<br>
algorithms</p></li>
</ol>
<p>I started off with sci-kit. It offers a huge range of libraries & algorithms. I then had to do a lot of<br>
reading about the basics of classification: hyperplanes, linear and non-linear<br>
classification, k-Nearest Neighbours, and Support Vector Machines (SVM) - what an SVM is and<br>
why it is used.</p>
<p><a href="http://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf">The Stanford NLP</a> book helped me a lot in understanding the basics of Classification</p>
<h2 id="algorithms_2">Algorithms <a class="head_anchor" href="#algorithms_2">#</a>
</h2>
<p>I’m an absolute beginner in Machine Learning, and every algorithm I look at seems like the right one.<br>
But only after experimenting with each of them do you know which is the best fit and why.</p>
<p>The problem I was solving had medium-scale data: 250,000 records of test data and 2<br>
million records of training data. Both training and test data have a large number of features.</p>
<h2 id="k-nearest-neighbour_2">K Nearest Neighbour <a class="head_anchor" href="#k-nearest-neighbour_2">#</a>
</h2>
<p>I started off the first trial using the k-nearest neighbour algorithm. It turns out this is a very good<br>
algorithm, but it doesn’t scale well to larger datasets. There are a number of flavours of kNN<br>
that speed up the neighbour search with index structures like KD-trees and ball trees. But they still don’t help<br>
much when running datasets with more than 10,000 records.<br>
Also, I frequently got “Core dumped” errors when I tried plain kNN and kNN with <strong><u>chi2</u></strong><br>
feature selection. I’m still figuring out the reason; my feeling is that it doesn’t scale to larger datasets. But I get the<br>
same error for smaller datasets of 100 records, which is weird and hints that I might be doing<br>
something wrong!<br>
After reading a few articles I came to the conclusion that it is better to use SVM for large datasets.</p>
<h2 id="support-vector-machines-svm_2">Support Vector Machines (SVM) <a class="head_anchor" href="#support-vector-machines-svm_2">#</a>
</h2>
<p>The Support Vector Machine is one of the fast and efficient learning algorithms for classification and<br>
regression, and it works well on large datasets. A linear SVM does linear classification, and we can define<br>
custom kernels for SVMs. The SVM library in sci-kit offers commonly used kernels like</p>
<ul>
<li>linear</li>
<li>polynomial</li>
<li>Radial Basis Function (rbf)</li>
<li>sigmoid</li>
</ul>
<p>The results with the RBF kernel turned out to be pretty bad: I got the same<br>
label prediction for most of the test data.</p>
<p>I switched to linear SVM and the results turned out to be quite decent.</p>
<h2 id="scaling_2">Scaling <a class="head_anchor" href="#scaling_2">#</a>
</h2>
<p>Given the problem is about Large Scale classification, scaling the algorithm to cater to large<br>
datasets is very important!</p>
<p>The algorithms in the sci-kit library are <code class="prettyprint">in-core</code>, meaning they run all the tasks on a single core.<br>
This turns out to be bad when running prediction on large datasets.</p>
<p>The way out is multicore processing: we can divide the task into subtasks<br>
and run them on different cores. In my case, I split the test data into smaller subsets and predict<br>
them as different jobs, utilizing multiple cores. Sci-kit also provides a job-processing library<br>
called <code class="prettyprint">joblib</code> which enables the above-mentioned process.</p>
<p>Soon we run into the problem of having multiple copies of the training data in each job doing the<br>
prediction. To overcome this, joblib provides <strong>memory caching</strong> of functions. This lets us avoid<br>
creating copies and instead share the memory across all jobs. The problem seems to be solved, but this<br>
will not work when we have a dataset large enough that it needs to be run on different machines!</p>
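<p>The split-and-predict idea can be sketched with the standard library alone. Here <code class="prettyprint">predict</code> is a stand-in for a real model’s predict method, and a thread pool is used so the sketch runs anywhere; for CPU-bound scikit predictions you would use processes (ProcessPoolExecutor or joblib) to actually occupy multiple cores:</p>
<pre><code class="prettyprint">from concurrent.futures import ThreadPoolExecutor

def predict(chunk):
    """Stand-in for a trained model's predict(): labels each
    record by the sign of its feature sum."""
    return [1 if sum(features) > 0 else 0 for features in chunk]

def chunked(records, size):
    """Split records into consecutive chunks of at most `size` items."""
    return [records[i:i + size] for i in range(0, len(records), size)]

def parallel_predict(records, size=1000):
    """Predict each chunk as a separate job and concatenate
    the per-chunk results in their original order."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(predict, chunked(records, size))
    return [label for chunk in results for label in chunk]
</code></pre>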
<h1>Database Triggers</h1>
<p><em>2014-05-13</em></p>
<p>A database trigger fires after or before a statement is executed or a row is modified / inserted / deleted.<br>
It can be used to perform any task before or after the occurrence of a certain event in the database.</p>
<p>I had been curious about this concept for a very long time and wanted to check it out.</p>
<p>I wanted to try an automation by creating a trigger function.</p>
<p><strong>Trigger function</strong> in <strong>PostgreSQL</strong> is a kind of function to which special variables are passed - NEW, OLD etc. More on trigger functions - <a href="http://www.postgresql.org/docs/9.3/static/plpgsql-trigger.html">here</a></p>
<p><code class="prettyprint">NEW</code> - variable sent to trigger function when the trigger is INSERT / UPDATE. This variable will contain the new row to be inserted / updated</p>
<p><code class="prettyprint">OLD</code> - variable sent to trigger function when the trigger is DELETE. This variable will contain the row to be deleted</p>
<p>To try out trigger functions I created three tables: <code class="prettyprint">posts</code>, <code class="prettyprint">user_groups</code> and <code class="prettyprint">user_posts</code></p>
<p>I wanted to try an insert automation - on inserting a row into the posts table, I wanted the DB to automatically insert rows into the user_posts table. For this we need a trigger function like this:</p>
<pre><code class="prettyprint">CREATE FUNCTION update_user_post_association() RETURNS trigger
    LANGUAGE plpgsql
    AS $_$
DECLARE
    row user_groups%rowtype;
BEGIN
    FOR row IN SELECT * FROM public.user_groups WHERE group_id = NEW.group_id
    LOOP
        EXECUTE 'INSERT INTO public.user_posts VALUES ($1, $2, 1)' USING row.user_id, NEW.id;
    END LOOP;
    RETURN NEW;
END;
$_$;
</code></pre>
<p>In the example we do the following steps:</p>
<ul>
<li>get all the users of the group</li>
<li>loop on the users</li>
<li>perform an insert using the available data</li>
</ul>
<p>It is important to note that the trigger function here uses a dynamic SQL statement, which is slightly different from a normal SQL statement.<br>
When using variables in an SQL statement it is always good to use placeholders like $1, $2 and to pass the variables with the <code class="prettyprint">USING</code> keyword.</p>
<p>Now that we have the trigger function, we need to tell the DB when to run this function. For this, we create a trigger</p>
<pre><code class="prettyprint">CREATE TRIGGER populate_users
AFTER INSERT OR UPDATE ON posts
FOR EACH ROW
EXECUTE PROCEDURE update_user_post_association();
</code></pre>
<p>Now the DB executes the trigger function on every insert into the posts table.</p>
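<p>The same fan-out can be sketched end to end in SQLite through Python’s sqlite3 module. This is a different engine with a simpler trigger syntax, shown only as an illustration; the table and column names mirror the example above. It also shows that the whole loop can be replaced by a single set-based INSERT ... SELECT, a rewrite that would work in PostgreSQL too:</p>
<pre><code class="prettyprint">import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_groups (user_id INTEGER, group_id INTEGER);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, group_id INTEGER, body TEXT);
    CREATE TABLE user_posts (user_id INTEGER, post_id INTEGER, unread INTEGER);

    -- Fan each new post out to every member of its group,
    -- mirroring update_user_post_association() above.
    CREATE TRIGGER populate_users AFTER INSERT ON posts
    BEGIN
        INSERT INTO user_posts (user_id, post_id, unread)
        SELECT user_id, NEW.id, 1 FROM user_groups
        WHERE group_id = NEW.group_id;
    END;
""")

conn.executemany("INSERT INTO user_groups VALUES (?, ?)",
                 [(1, 10), (2, 10), (3, 99)])
conn.execute("INSERT INTO posts (group_id, body) VALUES (10, 'hello')")

# The trigger has populated user_posts for users 1 and 2 only.
rows = conn.execute(
    "SELECT user_id, post_id, unread FROM user_posts ORDER BY user_id"
).fetchall()
</code></pre>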
<p>Note: it is not a good idea to perform inserts inside a trigger function, because the extra inserts may reduce the efficiency of the DB. When the DB is under heavy load, the multiple inserts inside the trigger function might become slow, and the DB might start to queue connections.</p>