Geo-Spatial reporting with R

In analytics we often need to plot geo-spatial data. Information such as sales per region or income distribution makes more sense when it is plotted on a map. We can do this quite easily in R. Let us see it in action.

First of all, we need data about the map. There are many sources from which we can download this data for personal use. Here we will use data from http://gadm.org/, which offers data at different levels of detail for most countries. Let us use the data for India. To load the downloaded data into R, first open R in R Commander and change your working directory to where you saved the file:

setwd("/home/soumyanath/Downloads/R_Maps")

and then read the data into a variable with

ind1 <- readRDS("IND_adm1.rds")

Let us check what kind of data has been loaded with

class(ind1)

It will show

[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"

As it is a SpatialPolygonsDataFrame from the sp package, let us load that package:

library(sp, pos=4)

library(methods, pos=4)

Now, the question is, how do we see this data? There is a function to plot spatial data, so we use that:

spplot(ind1, "NAME_1", scales=list(draw=TRUE), colorkey=FALSE, main="India")

This will show a map. Actually I do not like it, as it shows a truncated view of Kashmir, but then we are using data from a US repository and I have no means to influence them. We shall revisit this part at a later stage to see how to correct the maps, but for now, let us make use of what we have. To manipulate the data we need to know the properties of the data that we have. We can look into the loaded data with the names function. It shows:

> names(ind1)

 [1] "OBJECTID"  "ID_0"      "ISO"       "NAME_0"    "ID_1"      "NAME_1"    "HASC_1"    "CCN_1"     "CCA_1"
[10] "TYPE_1"    "ENGTYPE_1" "NL_NAME_1" "VARNAME_1"


We can also check the properties of the data by using

summary(ind1)

This will show various properties of the loaded data. Right now we are interested in knowing the IDs of the states so that we can use them to color the map with our data. We can use print(ind1) to view the complete data, but in this case it will be a huge print. In this example we will use the state ID "HASC_1" to plot our data. We can see the values with:

print(ind1$HASC_1)

Right now we do not have any data, so we populate an Excel sheet with the state IDs, fill in some sales data and assign a color value based on the sales amount. In reality we will probably use a database to get this data. We save the data in CSV format and read it into R with:

pdata = read.csv("filename")

Confirm that the data has been read correctly.
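The color column can also be derived in R itself instead of being typed into the sheet. What follows is only a minimal sketch of one way to do it, assuming three sales bands; the sales figures, the break points and the palette are all made up for illustration and are not from the original sheet.

```r
# Hypothetical sales figures, one per state (illustrative values only)
sales <- c(120, 340, 90, 510)

# Assumed palette: one color per sales band, from low to high
palette <- c("lightyellow", "orange", "red")

# cut() assigns each amount to a band: (0,100], (100,300], (300,Inf]
# labels = FALSE returns the band number instead of a factor label
bins <- cut(sales, breaks = c(0, 100, 300, Inf), labels = FALSE)

# Look up the color for each band number
color <- palette[bins]
print(data.frame(sales, bins, color))
```

The resulting vector can be stored as the third column of the data frame before it is joined to the map.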

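We add a new property color.data to the data frame ind1, based on the color values in the third column of the CSV file. The rows are matched on state ID, so the CSV file does not have to be in the same order as the map:

ind1$color.data <- as.character(pdata[match(ind1$HASC_1, pdata[[1]]), 3])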
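Copying the colors into the map object only works if each state picks up its own row from the CSV file. Here is a small sketch of that alignment in base R using match(). The vector map_ids stands in for ind1$HASC_1, and the codes, sales figures and colors are illustrative assumptions, not the real data.

```r
# Stand-in for ind1$HASC_1: state codes in the order the map stores them
map_ids <- c("IN.AP", "IN.AR", "IN.AS", "IN.BR")

# Stand-in for the CSV: different row order, and one state missing
pdata <- data.frame(
  state_id = c("IN.BR", "IN.AP", "IN.AS"),
  sales    = c(120, 340, 90),
  color    = c("orange", "red", "yellow"),
  stringsAsFactors = FALSE
)

# For every map state, find the row of pdata with the same ID (NA if absent)
idx <- match(map_ids, pdata$state_id)

# Pull the colors in map order; states without data get a neutral grey
color_data <- pdata$color[idx]
color_data[is.na(color_data)] <- "grey"
print(color_data)
```

Filling the gaps with grey is a design choice for this sketch; leaving them as NA would also make missing states easy to spot.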

Now we plot the map with these colors. The command is

spplot(ind1, "NAME_1", col.regions=ind1$color.data, colorkey=TRUE, main="Indian States")

We have the result here

The same concept can be extended to district level for more granular analysis.

In our next blog, we shall see how to link this map plot with a database to get real-time analysis.

A Window for Ubuntu

I replaced my Windows 7 OS with Ubuntu. This has given me many new things that were missing from my life so far. The biggest change is the way I use my laptop. Earlier, with the 10 min boot time of Windows 7, I kept the laptop in suspend mode most of the time. Now Ubuntu takes about 1 min 48 sec to boot and around 40 sec to shut down, so there is no excuse to keep the laptop running. Now I can afford to switch it on just for some quick personal work.

The next change is the way I install software. Most programs that I use were either installed by default or installed from the Ubuntu repository. Installing from the Ubuntu repository is as easy as typing the name and hitting the install key. Earlier I used to keep all program installers in a folder, nicely categorized and archived. That headache is mostly gone. I still need to do it for some programs, but the size is less than 1 GB.

There is one problem that came up after this change: the baggage of my old files. I have some files that need a Windows program to open. On Ubuntu, there are two ways to run Windows programs. One is Wine, a compatibility layer that runs Windows programs directly. The other is to run Windows in a virtual machine.

I installed Wine. This is a simple step: open the installer and search for Wine. It gives me three options: Wine, Wine loader and Wine tools. Install all three. It takes about 10 min and that's it. You are good to run many Windows programs. I had purchased a uC kit that requires a special Windows program loader, picpgm. I opened a console with Ctrl-Alt-T, typed "wine picpgm" and there it goes. A Windows program, running nicely on Ubuntu.


Next was to do the same with my other programmer, pickit2. I tried the same magic command again: wine pickit2. Uh uh! The same magic does not work. Now it complains about missing dot net. Well, dot net is a big package that is proprietary to MS. I don't expect Wine to emulate it. I knew I had to install Windows on a virtual machine.

Like the choice of coffee in a Cub Foods store, Ubuntu has a number of virtual machine programs. The choice is endless and can be quite confusing to a newcomer. After a few months of experimentation, I knew how to choose coffee: go for the one you like. For me it is only one flavor, medium roast Colombian. I follow the same principle in Linux: choose one program that works for me. For virtual machines, I use one that is easy to use, VirtualBox.

VirtualBox is readily available in the Ubuntu repository. Installing it is as easy as typing "virtualbox" and clicking the install button. Once VirtualBox was installed I popped in the Windows recovery disk from my laptop. I thought it had a copy of the Windows 7 that I own. The disk whirred and copied some files and then asked for the last copy of the recovery disk to partition my hard disk. Basically it was trying to get my machine back to factory condition rather than just install Windows. This is not what I intended, so I cancelled the job. Next I thought I should download a fresh copy of Windows from Microsoft. I had the product key and entered it in the software download page, and Microsoft says: This product is a third party supplied product. Contact the vendor.

Well, so much for the product key and ownership and software rights. Basically it means you can't install factory supplied Windows on a virtual machine. Good! Don't get fooled with factory supplied Windows. If you need Windows, buy hardware without preinstalled Windows and then buy the OS separately. It is worth those few extra dollars.

Fortunately I had a copy of an XP disk that was not factory installed. I changed the virtual machine from Windows 7 to XP and put in the installer disk. Here you see Windows XP getting installed. This installation process is just like installing it on a new PC. With VirtualBox you have basically created a new "virtual" computer within your computer. It has its own virtual memory and virtual hard disk.




20 min later Windows XP is installed. I spent some time setting up Windows. Here you see Windows running in the smaller screen inside the virtual machine. This way, one can run multiple operating systems at the same time on the same hardware.

 

Now it is time to do some serious stuff with Windows. I have my favorite game, Zeus: Master of Olympus. I put in the game installer CD, everything goes nicely and now I have Zeus running on my Linux machine. That's called peaceful co-existence!


So far my Windows was isolated. I need to connect Windows to the internet. In this world, hardly anything works without an internet connection. Windows authentication is one need. Then there are some programs and updates that are needed from the internet. After a few false tries, I found good instructions that worked for me to connect Windows to the net. Here is the link. Credit goes to Irfan for cool and correct instructions. Now I can authenticate and pacify the creaking Windows about its genuineness. Now you can see me exploring the internet, doing some drawing and listing files in a transparent Linux console.

 

 

Done with Windows

Since last week I had been getting a SMART error from my laptop. This means some sectors of my drive have gone bad and the drive is going to fail in the near future. A nice little warning means a big job. The recovery disks I made for this laptop were about four years old, so I did not rely on them. I burnt a new set of recovery disks, backed up my data and set out to replace my hard drive.

 

This is an HP Pavilion dm4 model, with an i5 dual core CPU. It is old, but I never faced much of a performance issue. The drive I had was 600GB. This time I purchased a 1TB drive. The idea was to put some version of Linux on it in dual boot mode. I use a lot of Mukt (Free) software, most of which is ported from Linux, so using it with Linux makes better sense. After some contemplation, I settled for Ubuntu. I had an Ubuntu installer disk, but that was around two years old. I decided to download the latest Desktop version, Ubuntu 14.10. Ubuntu comes in four major flavors:

  1. Desktop: comes with a GUI and window manager
  2. Server: sets up a Linux web server but does not have a GUI
  3. Kylin: with Chinese support
  4. Cloud: for building an OpenStack cloud


Ubuntu disk images are available for free. The Desktop version is around 1.5GB. It is better to use BitTorrent to download such big files. There is no point in straining the central server and fetching the data from a far away location. To download Ubuntu, I installed a BitTorrent client and selected the alternate download location. I let the download run through the night. In the morning, I had the disk image waiting for me to be burnt onto a fresh DVD.

Replacing the disk was a 10 min job. Thanks to HP, all you need to do is take out 2 screws and the HDD is accessible. I knew recovery would take time. I had to feed in 6 disks one after another while the computer copied all the files. The manual on the HP site says it will take 2 to 3 hours. I started the work at 11:00 AM on Saturday, expecting it to finish by 2:00 PM. The work actually completed only at 8:00 PM. To be fair, a couple of times I went out and the computer was waiting for a disk change, but an increase by a factor of 3 was not expected.

With that work done and a factory fresh setup, I went on to install Ubuntu. The first thing for this is to change the boot setting to allow the computer to boot from DVD. Next is to make some room for the new OS. So I started the Windows Disk Manager. Oh, it shows me 4 partitions. I knew there would be a partition for recovery files. It was there. Then there is a tiny 200 MB boot partition; well, I grant that it makes things safe. Then there is the main partition, a nice 970GB of it, with around 70 GB occupied by Windows 7. There is another partition called HP tools, which is not something I bargained for. Four primary partitions are basically aimed at ensuring that I can not install another OS alongside. The MBR partition table used on IBM PC compatible hardware has this restriction of 4 primary partitions. I take this as nothing but pure mischief. As it is, I am now having second thoughts about HP products. HP has a policy of push advertisement. In the name of assistance it keeps bugging me to buy this or that. Now this restriction pushed my patience with HP to its limit. I scrapped the dual boot idea and decided on an all Linux machine.

Once the decision is made, the rest of the work becomes easy. I popped the Linux DVD into the drive and rebooted the machine. Within a minute the machine booted from the DVD and asked for the install option. It was simple: install over the entire drive. The next screens were name, password, location etc. It asked me if I wanted to update the software and I selected yes. That's it. For about two hours software was installed and then I got a screen to log in.

Ubuntu comes with most of the required software pre-installed. Out of the box you get the full LibreOffice suite. A C++ compiler is part of core Linux. For programming, a feature rich text editor is required. Gedit does the job to a certain extent. But there is nothing to beat Emacs. Emacs is not packaged with Ubuntu. To install it, open Ubuntu Software Center, type emacs and click on the install button. That's it. After a few seconds Emacs is installed. Here you go:
 

The biggest problem one faces when shifting OS is the collection of programs that one has built up. I also have a good number of Windows programs that I can't live without. With office and text editors taken care of, I now want some comfort. I do not want to search for a replacement for each and every Windows program. There are two ways to run Windows programs within Ubuntu. The first and easiest is Wine. No, it is not the drink. Wine is a compatibility layer that lets Windows programs run on Linux. I use Software Center again and install Wine. For my microcontroller programming, I use a program called picpgm. I don't know of any Ubuntu alternative for it, nor do I care much. I run this one with Wine. For this I select the picpgm executable in the file browser, right click and select the run with Wine program loader option. That's it!
 
 


Other programs that I use frequently are a password saver (KeePassX), messenger (Pidgin), SIP client (Ekiga), voice recorder (Audacity) and photo editor (GIMP). All of these are available with Ubuntu. Some of these are preinstalled, the rest I installed from Software Center. Ubuntu comes with a screen capture tool, but I wanted a tool that would allow me to capture a part of the screen, like I am using here. Shutter is the software of my choice. For typing in Bengali, Avro is again readily available, but the LibreOffice language pack needs to be installed to use it. That takes some work, around 30 min for me. That's it! After about 4 hrs, I am done with almost all the functionality that I had in Windows.
 
There is one additional feature that is lacking in Windows: multiple desktops.
Woops!!!
Some 8 hrs of work later, I am back in business with all the tools that I had earlier and some more. Now let us take a peek at my disk:
 
Just 15GB. I could have carried the OS on an SD card!!!

A Personal Website

A personal website is like owning your own house in cyberspace. Like a house, it comes in all shapes, sizes, features and flavors. It also has its own advantages and issues. The craze for personal websites started with GeoCities and AltaVista making it easy to create a website of your own. While it satisfied the narcissist within us, we soon realized its limitations.

One puts up a website for various reasons. The most important ones that come to my mind are:

Web presence: You may like to grab a piece of cyber reality and have your identity known around the cyber world. At times, you feel comfortable having a billboard that people may give more attention to than they give to your card.

Information dissemination: You may like to tell those things that no one seems to listen to. Putting them in a nice font and format may look more promising than standing on the podium at the market corner.

Collaboration: You may need to share information that requires a little more structure than what is available in simple mail.

Consolidation of information: You may use a web mashup or pool information from other web pages for analysis. Facebook, Orkut, Google News, iGoogle and My Yahoo are good examples of these.

Showcasing talent: You may want to showcase your skill in photography or performing arts on Picasa, Flickr, YouTube etc.

E-Commerce: You may be interested in buying and/or selling things online.

Data Repository: Some may use cloud computing and shift their information repository online.

Today there are a number of options for hosting a website. You can use a free site like Google Sites, take a shared webhost service costing less than $10 per year, or choose to host it in a data center. It all depends on what kind of traffic you expect to get and what kind of material you want to distribute.

If your page is like Richard Stallman's, simple with only text, then you will be able to manage even a very high hit rate with a relatively low end server. On the other hand, a website heavy with Flash and graphics, say like J K Rowling's, will need a high end server even for a modest number of hits.

Apart from cost, there are other considerations such as service reliability and connection speed. In such cases, many free sites that piggyback on big players offer an advantage.

Web space requirement is the next important factor; it is essential but can be managed. Once I had a request from a professional society to make them a Content Management System. Now a CMS like Joomla or Drupal takes almost 5MB of space just for the core software, and your own stuff comes over and above that. But with careful redesign, it is possible to reduce even a CMS to less than 1 MB. tinyCMS was the result of that effort. Along similar lines you have simpleBlog and TigerWiki.

This is not a problem if you plan to set up the server at your own home. If you have a broadband connection and are willing to keep your machine always on and connected to the internet, then setting up a website on your home desktop is a simple thing. (More on that in a later blog.)

One thing a personal website on its own setup suffers from is ranking in search engines. These give a higher rating to known popular sites than to a low traffic personal site. As an example, my blog http://soumyanath.blogger.com has about ten times more traffic than my earlier blog http://www.soumya.name/blog

In the current context, perhaps it is not going to be a choice of setting everything up at one nice little monolithic site. The current age is one of cloud computing and mashups. So put your stuff in various places, mix them and present them at a site with consolidation. Parts of the stuff may come from various other sites which may or may not be owned by you.

Let’s Linux

Linux is an operating system developed by people who love computers. An operating system is a collection of programs that tells your computer how to store pieces of information on your hard disk, how to connect to the internet and so many other things that you normally take for granted. Most first time computer users are forced to use an operating system developed by Microsoft Corp. called Windows. Microsoft charges about $200 for the software, of which a substantial portion goes as commission to your dealer. No wonder, then, that he does not tell you about a similar (actually somewhat better) software available to you free of cost.

There are deals to make that $200 vanish. The vanishing trick is called pre-loaded OS. Do not fall for it, compare prices of hardware with similar spec and you will discover the OS difference.

Some not so ethical vendors will also try to give you a deal: they will offer to load a pirated version of Windows free of cost. DO NOT FALL for it. They do it so that they can sell you another useless piece of software called antivirus. Not only that, once you use it, you are legally in the same boat as a thief. Would you like to have such a distinction?

But you have a choice. A choice to dump all that pricey OS and anti-virus and so on. Why bother with an OS that could not close its security holes against viruses even after 20 years? Go for Linux. It is virtually free from virus attacks. As a result, your machine has all its power to work on the task rather than checking for viruses. It comes with all the goodies like Office, torrent client, compilers etc. and it is available free. Free to install and free to distribute.

You can get a copy of Linux from your friendly neighborhood LUG (aka. Linux User Group) or search around for some freebie site.

Now that you have read this page: go the right way – USE Linux. Use ethical software without fear; with full freedom.

Linux is free software. It comes with the freedom to use, distribute, view and modify the code.