Monday, July 23, 2018

Hemachandra numbers

As I was watching this interview of Prof. Manjul Bhargava, I came across this nice relation between लघु, गुरू syllables of Sanskrut and Hemchandra numbers. Prof. Manjul Bhargava asks a simple question as follows -

If लघु = 1 beat and गुरू = 2 beats, in how many ways can you fill up 8 beats? This also related to theory of percussion instruments.

This question can be answered in two ways. Before trying that, let's count the number of ways for some small number of beats say 0,1,2,3,4.
  • If number of bits = 0, ways to do it = {}, i.e. 0 ways
  • If number of bits = 1, ways to do it = {1}, i.e. 1 ways
  • If number of bits = 2, ways to do it = {11, 2}, i.e. 2 ways
  • If number of bits = 3, ways to do it = {111, 12, 21} i.e. 3 ways
  • If number of bits = 4, ways to do it = {1111, 112, 121, 211, 22} i.e. 5 ways
First way - the hard way!

If we want to fill up N bits, try to count the number of combinations - 
If N = 8. There are following combinations.
  • Number of लघु = 0, number of गुरू = 4
    • Number of ways to arrange these = 4C0 = 1
  • Number of लघु = 2, number of गुरू = 3
    • Number of ways to arrange these = 5C2 = 10
  • Number of लघु = 4, number of गुरू = 2
    • Number of ways to arrange these = 6C2 = 15
  • Number of लघु = 6, number of गुरू = 1
    • Number of ways to arrange these = 7C1 = 7
  • Number of लघु = 8, number of गुरू = 0
    • Number of ways to arrange these = 8C0 = 1
Total number of ways = 34. This is indeed 8th Hemchandra number as told in the lecture. But, this does not reveal the why this is 8th Hemchandra number. Next method reveals that.
 
Second way - the elegant way!

How can we fill in N beats if we know how to fill in lesser number of bits? Say, we know following ways to fill in 4 and 5 beats ->
4 : 121
5 : 1112
Then, to fill 6 beats, we just to need extend any way of filling 4 beats and append a गुरू e.g. from above 1212. We can also extend any way of filling 5 beats and append a लघु e.g. from above 11121.

Now, the connection with Hemchandra numbers is clear.
If F(N) is the number of ways to fill in a pattern of N beats then -
F(N) = F(N-1) + F(N-2)
Since as shown above, any pattern of N-1 beats can be extended by appending a लघु and any pattern of N-2 beats can be extended by appending a गुरू. So, F(N) is just the addition of those.

One might wonder why do we not need to place गुरू at all possible places in the pattern of F(N-2) (and similarly in F(N-1) for लघु). Here is the reason. Pick any pattern in F(N-2) say, 121. We need to put 2 now. If we put it in all places possible, here is what we get : 2121 1221 1221 1212. As we can see here, first patterns have patterns of 5 i.e. F(N-1) as prefixes so they are already being counted in F(N-1). So, it suffices to append it in the end only.

Monday, February 28, 2011

lex_ing and regex_ing

During engineering, this topic came and went.. w/o bothering me much.. just a small assignment on parser and done! But.. no.. the practical exam threw on my face a problem on designing parser in C (not with lex/yacc or generated one) and then I realized - well.. I had not understood it!

This was like 6+ years back.. and now I am here.. writing regex and grammar to generate parser with lex-yacc! That's life! :D ... And... it's very good!

Just came across a nice tutorial on lex yacc. Find it here. Even if you keep technical details apart and just glance through the tutorial, you would get a new outlook towards lex-yacc - specially regex - writing them and debugging them when they don't behave the way you expect them to..!

A very good and important thing I realized during this exercise is as follows. Parser design can easily be segregated in lex and yacc when you know your application well. Taking some reference from the tutorial link posted, for example - a simplified regex could serve your purpose given that you want to capture occurrences of a pattern; and you need a bit complex regex when you want to validate inputs. (Again from the tutorial,) For example, if you want to list down html tags in the browsed web crawl, all you need is a simple pattern that captures anything between a "<" and a ">". You need not __validate__ that __anything__. The basic assumption here - that you are expecting to get positive (meaning correctly matching with pattern) cases more in such scenario inherently comes from the application - i.e. capturing tags from web crawl. In an application of say - writing a SQL kind of language, you expect to get negative cases as probably as positive ones and just capturing __anything__ would not work. Again, I would say that it depends on how you look at it. You could always write a light weight regex and a comparatively heavy code in back end to validate your verbose input. It's up to you! But, understanding internals of how a regex engine works (given in the tutorial, for many cases), and mainly understanding its complexity and contrasting it with the complexity you would introduce by writing validation in code - is definitely important.

Indeed, a nice bunch of information!

Friday, September 24, 2010

mplayer - not playing files with _space_ characters

Recently I upgraded all packages on Ubuntu and somehow since then I could not play video files with vlcplayer! Not just that, somehow Realplayer and default movie player lost the codecs they used to previously use! After this miracle, the only option left with me was to try mplayer. I had heard that it supports many common formats and so on. I tried that and faced another issue. It magically could not play files having space characters in it!!

Initially I used to manually rename the file from browser before double clicking it for mplayer. Later on I found this very hectic when following happened. I renamed a long file name replacing some 7-8 spaces by underscores (Note: spaces, it converts to %20 and cannot play such file, underscore works!) and soon realized that I wanted to see another episode (with similar long file name)!!! So, I got irritated a lot and said to my friend that I will soon write a script to rename all files such the spaces are replaced by underscores. Now, I realized that all main / important video files (movies, acad lectures and so on) were on portable hard disk and the list will get augmented day after day as I get something good from some source, I will have a copy. Thus, this method sounded a bit overkill - to rename all file names just like that!

I then did following.
  1. I wrote following lines in a file ~/scripts/mplayer_helper.sh
    #!/bin/bash
    fname20=`echo $1 | sed 's/ /\\ /g'`
    fname=`echo $1 | sed 's/ /_/g'`
    mv "$fname20" "$fname"
    mplayer $fname
    The idea here is simple. Just create a file name (do some trials on command line first as to how exactly "mv" expects the input.) replacing space by underscore and replace file using "mv" command. Once that is done, invoke mplayer on renamed file.
  2. Then I made it executable using
    chmod +x ~/scripts/mplayer_helper.sh
  3. Then I created a soft link to this executable inside /usr/bin/ as follows:
    cd /usr/bin
    sudo ln -s ~/scripts/mplayer_helper.h sudomplayer
  4. Now, when I want to open video from browser, I simply right-click it and say "Open with". Then, I expand the option "Use custom command" and write "/usr/bin/sudomplayer" and say "Open".
  5. This not only opens the video by run-time substituting spaces with underscores but also once opened, adds this sudo application into the list of applications. Additionally, sudomplayer will be the default application shown for that file format then onwards.
Easy! Isn't it!

But, there is a glitch, what if not only the file to be played but also the recursive folders it is contained in have spaces. mplayer fails then as well! But, the solution is simple! Modify the script to take care of that as follows.
#!/bin/bash
xmessage -center -nearmouse Note that this tries to rename the files so this might fail if the file path does not have write permissions
i=2
token=`echo $1 | cut -d"/" -f$i`
more=`echo $1 | cut -d"/" -f$i | grep -i ^$ | wc -l`
currDir=""
while [[ $more -eq 0 ]]
do
        currDir=$currDir/
        token=`echo $1 | cut -d"/" -f$i`
        name20=`echo $token | sed 's/ /\\ /g'`
        dir20=$currDir$name20
        name=`echo $token | sed 's/ /_/g'`
        dir=$currDir$name
        if [[ $dir20 != $dir ]]
        then
                if [[ -a $dir ]]
                then
                        xmessage -center -nearmouse $dir File exists! Aborting!!
                exit
                fi
                mv "$dir20" "$dir"
        fi
        currDir=$dir
        i=`expr $i + 1`
        more=`echo $1 | cut -d"/" -f$i | grep ^$ | wc -l`
done
mplayer $currDir
The idea is here, to go one directory at a time. Try to rename it and then keep track of current directory such that now next level directory can be renamed. But, WARNING!! What if the current directory is inside "/home/xyz/abc/"? One might not have permission to replace "xyz" with "xyz" inside /home/. There are two ways to look at it. First, mostly the filenames with space character won't appear in such so_called __system_prohibited__ paths and even if they appear, it's better not to mess with it (at least automatically!) and thus, the initial "xmessage" serves the purpose. Another WARNING! is - what if we are trying to rename directory "dir with spaces" and there already exists a directory called "dir_with_spaces". To prevent such collisions, we will terminate the script once we find such case. Work around to this problem is to use some weird set of characters instead of "_", say like "_-_" or so; but, then if this script is run quite frequently, then your filesystem (portable hdd in my case) will be full of such weird names! ;)

Anyways, another (probably best?) way is - just Google around! Then will tell you some ways to get around to this problem of mplayer not playing files with space characters. But, then you will miss all the scripting_fun! :D :P

    Saturday, May 29, 2010

    Rhythmbox Music player - password window popping issue!

    I have been using Rhythmbox music player since almost 3 years now. Suddenly one day, it started popping a Give_ur_password type of window at every track. I used to close the box, then it would pop up again for thrice or so. It looked like the following.


    I tried Google-ing of course, but could not find anything. There were some posts regarding why / how some plug-ins for Rhythmbox keep popping windows. I did not find anything related to password window specifically. Still, I opened Edit->Plug-ins and observed that few plug-ins were enabled. For no apparent reason, I disabled the plug-in for Cover art (as shown below) and never got those disturbing pop up windows again!



    What I don't understand though is, that plug in was always enabled afaik, because I never explicitly enabled / disabled it in past, then why suddenly after almost 2.5 years, it started popping up windows?! Another thing is, as shown in first image, there is something called "xml.amazon-proxy caching server" - Have no clue how it could be related to cover art of an album!

    Anyways, will definitely post the explanation if I can find one, for time being let me enjoy the pop-up_window_free Rhythmbox!

    Thursday, April 15, 2010

    Approximate if you can do it in polytime!

    The main idea behind approximation algorithms is as follows. When one has to solve a problem which is proved to be NP-Complete, one has two choices.
    1. Use usual algorithm and get answer in non-polynomial time (most of the times exponential) and be ready to wait until eternity OR
    2. Compromise on the quality of answer but get things done faster
    As suggested by 2 above, approximation algorithms have the same aim. Instead of finding the correct, optimal, best possible solution to the given problem, they try to find an approximated one; but they guarantee to find it in polynomial time so that they run faster. Now the question  is - even if they are faster, how much does the quality of solution degrades? This is measured by Approximation factor or Approximation ratio.  This factor or ratio basically captures how far we are from doing the best i.e. from the optimal. Consider the optimization in terms of minimization. Suppose a minimization problem is NP-Complete and suppose the optimal solution has the value 5; then if any proposed approximation algorithm, that is the algorithm that solves the problem correctly in polynomial time gives the solution whose value is 7.5 then we will say that we have got 1.5 times more than the optimal one; i.e. we are doing 1.5 times worse. So, here the approximation factor = approximation ratio = 7.5/5 = 1.5. Note that in case of minimization problem, if we call OPT as value of optimal solution and V as value of approximate solution then V >= OPT always; i.e. V can be equal or worse (here more, being minimization problem) than OPT and thus approximation ratio for minimization problem is always greater than or equal to 1. Similarly for maximization problem, V <= OPT and thus approximation ratio is always less than or equal to 1.

    Thus, one comes up with an (approximation) algorithm A such that
    • It solves the problem in polynomial time
    • One can give proof of correctness for A
    • It does not degrade the quality of solution too much; i.e. one can come up with the reasonable approximation ratio for A and can prove that for the general case
    There exists many ways to come up with approximation algorithms and to prove that the have a reasonable approximation ratio. One such recipe is as follows.
    • Suppose it is the maximization problem so that the value V <= OPT where V is the value for the solution returned by any approximation algorithm. So we have to come up with a ratio k s.t. V is at most k times worse than OPT.
    • In order to prove that V is at most k times worse than OPT, we have to know what is OPT which is not possible because that itself will take non-polynomial time in general case.
    • Suppose we know some number M such that the optimal solution has maximum cost of M then OPT <= M considering all possible cases.
    • Thus, if we can prove that V is at most k times worse than M, it is guaranteed that V is at most k times worse than OPT as well. Because, given that V >= k*M and M >= OPT, we get V >= k*OPT
    • For example, if the OPT is 100, and we know that M is 120. Then suppose V is 80 (the best we can achieve with approximation algorithm) then k = V / M = 80 / 120 = 2/3 meaning we can achieve 2/3 rd of the maximum possible solution value. We note that V / OPT is 80 / 100 = 4/5 which is always more than 2/3. Thus considering the upper bound M is enough.
    • For minimization problem as we shall see below, we have to consider lower bound.
    I came across a nice lecture by Prof. Abhiram Ranade (IIT Bombay). In this lecture, in addition to introducing the basics of approximation algorithms and defining above mentioned notions, Sir also illustrates the approximation algorithms designed for two of the NP-Complete problems. One of them (my favorite and the easier one :D) is briefly explained below.
    (Note: I have tried to explain the problem and its approximation algorithm the way I understood it, so it is recommended to go through the original lecture by Ranade Sir)

    Metric TSP (Traveling Salesperson Problem)
    Input: A graph G with N cities, a matrix D giving distance between a pair of cities s.t. D is a metric.
    A distance D measure is a metric if
    1. D[i,i] = 0 for all i
    2. D[i,j] = D[j,i] for all i,j (D is Symmetric)
    3. D[i,j] <= D[i,k] + D[k,j] for all i, j and k (D follows Triangle Inequality)
    Output: A cycle in G that goes through every city exactly once and has shortest length (in terms of distance values in D)

    Recipe:
    This will follow the recipe mentioned above. This being the minimization problem, we will come up with a lower bound. For the given graph G, suppose we know the optimal tour and we remove an edge from it. As the required output is the cycle containing all vertexes in G, the edge_removed_tour is the spanning path in G (spanning = containing all vertexes). Thus this is the spanning tree of G because it contains all vertexes and no cycle. Consider the minimum spanning tree T in G which has cost M. Now as any other spanning tree in G is as expensive as T so the spanning tree that we came up with, by removing an edge from the optimal tour can have length no less than that of MST, which we called as M. So, length of edge_removed_tour >= M.
    As, length of (optimal) tour > length of edge_removed_tour (obvious), so we get
    length of (optimal) tour >= M
    Here, LHS is nothing but OPT and thus M is the lower bound on OPT.
    So, we come up with an algorithm that has solution with value V no more than twice of M i.e. V <= 2 * M.
    The algorithm and its correctness is described in the lecture.  The main idea used is triangle inequality property of D being a metric.

    Monday, January 11, 2010

    Precision-recall trade off ~ misclassification cost

    While I was reading Pattern recognition and ML by Bishop, I realized a beautiful relationship between misclassification cost and precision recall. This might be well-known of course, but it's important for me as I myself realized it. :)

    We know that precision-recall concept comes from Information retrieval kind-of a setting while misclassification cost comes from decision theory problems. In IR, given a set of true results i.e. T, and a set of retrieved results say R, precision is simply the fraction of true results in the set R. i.e. precision = #intersection(R,T) / #R. For example, suppose we know *all* the web pages that exist in the Web; then the set of pages which should be retrieved by a search engine are say T of size |T|. The search engine may retrieve a set R with size |R|. Then, if only 10 of these |R| pages are in T, then R's precision is 10 / |R|. Intuitively, R is more precise / exact if it has less not_required results. One may think of |R| >> |T| and that R contains all pages from T and many more extra results. Here, one says that R could "recall" everything in T but at the same time, it also made lots of mistakes. It is like, it took many trials on a puzzle and solved a puzzle 100 times correct while num_trials were 1000+. But in general if it could retrieve 10 pages which are in T, we say that it's recall is 10 / |T| i.e. out of |T| correct pages, it could retrieve 10. This also shows that there is a trade off between precision and recall because as num_trials increase, recall improves at the cost of precision.

    Little off-topic but relevant: It's said that "practice makes man perfect". If we consider practice being making many trials, then as num_trials increase, due to practice one gets better results.

    While I was reading the book, on page 41, I came across the misclassification cost involved in the decision theory. It says- We account for the misclassification cost because we don't want to miss on important predictions. Like in the example given for the two-class classification about whether a person is healthy or has cancer, it's important that we make no mistakes when the true class is "cancer-patient" as the consequences of these mistakes are dangerous. It also says that misclassification costs are such that we don't mind if a healthy person is predicted to be having cancer in order to not miss any patient which has cancer in reality. This is basically a trade-off between decisions. By this, we are looking for more recall even though we lose the precision.

    Thursday, December 10, 2009

    Ph.D. info

    Well, this is a collection of articles / write ups etc. which I found / got from friends while gathering information about Ph.D. Hope this will be useful for others.

    You and your research - Must read

    The Ph.D. experience - Again, a must read

    Another nice talk