Friday, September 24, 2010

mplayer - not playing files with _space_ characters

Recently I upgraded all packages on Ubuntu, and somehow since then I could not play video files with VLC player! Not just that, somehow RealPlayer and the default movie player lost the codecs they previously used! After this miracle, the only option left for me was to try mplayer. I had heard that it supports many common formats and so on. I tried it and faced another issue: it magically could not play files having space characters in their names!!

Initially I used to manually rename the file from the browser before double-clicking it for mplayer. Later I found this very hectic when the following happened. I renamed a long file name, replacing some 7-8 spaces with underscores (note: for spaces, the browser passes %20 and mplayer cannot play such a file; underscores work!), and soon realized that I wanted to see another episode (with a similarly long file name)!!! So, I got irritated a lot and said to my friend that I would soon write a script to rename all files such that the spaces are replaced by underscores. Then I realized that all the main / important video files (movies, acad lectures and so on) were on a portable hard disk, and the list would get augmented day after day: as I get something good from some source, I keep a copy. Thus, this method sounded a bit overkill - to rename all the file names just like that!

I then did the following.
  1. I wrote the following lines in a file ~/scripts/mplayer_helper.sh
    #!/bin/bash
    # Build a new name by replacing every space with an underscore,
    # rename the file, then play the renamed file.
    fname=`echo "$1" | sed 's/ /_/g'`
    mv "$1" "$fname"
    mplayer "$fname"
    The idea here is simple. Just build a new file name by replacing each space with an underscore and rename the file using "mv" (do some trials on the command line first as to how exactly "mv" expects its input quoted). Once that is done, invoke mplayer on the renamed file. A quick demonstration follows after this list.
  2. Then I made it executable using
    chmod +x ~/scripts/mplayer_helper.sh
  3. Then I created a soft link to this executable inside /usr/bin/ as follows:
    cd /usr/bin
    sudo ln -s ~/scripts/mplayer_helper.sh sudomplayer
  4. Now, when I want to open a video from the browser, I simply right-click it and choose "Open with". Then I expand the option "Use custom command", write "/usr/bin/sudomplayer" and click "Open".
  5. This not only opens the video, substituting spaces with underscores at run time, but also adds this application to the list of applications once opened. Additionally, sudomplayer will be the default application shown for that file format from then onwards.
Easy, isn't it?!
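
For example, here is roughly how the helper behaves from a terminal (the directory and file names below are made up for illustration):
    $ ls /media/disk/lectures
    lecture 01 intro.avi
    $ sudomplayer "/media/disk/lectures/lecture 01 intro.avi"
    $ ls /media/disk/lectures
    lecture_01_intro.avi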

But, there is a glitch: what if not only the file to be played but also the directories it is nested in have spaces in their names? mplayer fails then as well! But the solution is simple! Modify the script to take care of that, as follows.
#!/bin/bash
# Walks the given path one component at a time and renames every
# component that contains spaces, so each level must be writable.
xmessage -nearmouse "Note that this tries to rename the files, so this might fail if the file path does not have write permissions"
i=2
currDir=""
token=`echo "$1" | cut -d"/" -f$i`
while [[ -n $token ]]
do
        currDir="$currDir/"
        name=`echo "$token" | sed 's/ /_/g'`
        old="$currDir$token"
        new="$currDir$name"
        if [[ "$old" != "$new" ]]
        then
                # Never overwrite something that already exists.
                if [[ -e "$new" ]]
                then
                        xmessage -nearmouse "$new exists! Aborting!!"
                        exit 1
                fi
                mv "$old" "$new"
        fi
        currDir="$new"
        i=$((i + 1))
        token=`echo "$1" | cut -d"/" -f$i`
done
mplayer "$currDir"
The idea here is to go one directory at a time: try to rename it, and keep track of the current (renamed) directory so that the next level can be renamed relative to it. But, WARNING!! What if the path is something like "/home/xy z/abc/"? One might not have permission to rename "xy z" to "xy_z" inside /home/. There are two ways to look at it. First, mostly the filenames with space characters won't appear in such so_called __system_prohibited__ paths, and even if they appear, it's better not to mess with them (at least automatically!); thus, the initial "xmessage" serves the purpose. Another WARNING! is - what if we are trying to rename directory "dir with spaces" and there already exists a directory called "dir_with_spaces"? To prevent such collisions, we terminate the script once we find such a case. A workaround to this problem is to use some weird set of characters instead of "_", say "_-_" or so; but then, if this script is run quite frequently, your filesystem (portable hdd in my case) will be full of such weird names! ;)
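
To make this concrete, here is roughly the sequence of operations the script performs for one run (the paths below are made up for illustration):
    $ sudomplayer "/media/My Disk/TV Shows/episode 1.avi"
    # the loop effectively executes:
    mv "/media/My Disk" "/media/My_Disk"
    mv "/media/My_Disk/TV Shows" "/media/My_Disk/TV_Shows"
    mv "/media/My_Disk/TV_Shows/episode 1.avi" "/media/My_Disk/TV_Shows/episode_1.avi"
    mplayer "/media/My_Disk/TV_Shows/episode_1.avi"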

Anyways, another (probably the best?) way is - just Google around! They will tell you some ways to get around this problem of mplayer not playing files with space characters. But then you will miss all the scripting_fun! :D :P

Saturday, May 29, 2010

Rhythmbox Music player - password window popping issue!

I have been using the Rhythmbox music player for almost 3 years now. Suddenly one day, it started popping up a Give_ur_password type of window at every track. I used to close the box, and it would pop up again two or three more times. It looked like the following.


I tried Google-ing, of course, but could not find anything. There were some posts regarding why / how some plug-ins for Rhythmbox keep popping up windows, but I did not find anything related to the password window specifically. Still, I opened Edit->Plug-ins and observed that a few plug-ins were enabled. For no apparent reason, I disabled the plug-in for Cover art (as shown below) and never got those disturbing pop-up windows again!



What I don't understand, though, is that the plug-in was always enabled afaik, because I never explicitly enabled / disabled it in the past; then why, suddenly after almost 2.5 years, did it start popping up windows?! Another thing is, as shown in the first image, there is something called "xml.amazon-proxy caching server" - I have no clue how it could be related to the cover art of an album!

Anyways, I will definitely post the explanation if I can find one; for the time being, let me enjoy the pop-up_window_free Rhythmbox!

Thursday, April 15, 2010

Approximate if you can do it in polytime!

The main idea behind approximation algorithms is as follows. When one has to solve a problem which is proved to be NP-Complete, one has two choices:
1. Use the usual algorithm, get the answer in non-polynomial time (most of the time exponential) and be ready to wait until eternity, OR
2. Compromise on the quality of the answer but get things done faster.
As suggested by choice 2 above, approximation algorithms have exactly that aim. Instead of finding the correct, optimal, best possible solution to the given problem, they try to find an approximate one; but they guarantee to find it in polynomial time, so they run faster. Now the question is - even if they are faster, how much does the quality of the solution degrade? This is measured by the approximation factor or approximation ratio, which basically captures how far we are from doing the best, i.e. from the optimal.

Consider optimization in terms of minimization. Suppose a minimization problem is NP-Complete and the optimal solution has value 5; then if a proposed approximation algorithm (that is, an algorithm that runs in polynomial time) gives a solution whose value is 7.5, we say that we have got 1.5 times the optimal value, i.e. we are doing 1.5 times worse. So here the approximation factor = approximation ratio = 7.5/5 = 1.5. Note that in the case of a minimization problem, if we call OPT the value of the optimal solution and V the value of the approximate solution, then V >= OPT always; i.e. V can be equal to or worse (here larger, this being a minimization problem) than OPT, and thus the approximation ratio for a minimization problem is always greater than or equal to 1. Similarly, for a maximization problem, V <= OPT and thus the approximation ratio is always less than or equal to 1.
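
In symbols (just a restatement of the definitions above, with OPT and V as defined there):
\[
\text{approximation ratio} = \frac{V}{\mathrm{OPT}},
\qquad
\frac{V}{\mathrm{OPT}} \ge 1 \;\text{(minimization)},
\qquad
\frac{V}{\mathrm{OPT}} \le 1 \;\text{(maximization)},
\qquad
\text{e.g. } \frac{7.5}{5} = 1.5 .
\]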

Thus, one comes up with an (approximation) algorithm A such that
• It solves the problem in polynomial time
• One can give a proof of correctness for A
• It does not degrade the quality of the solution too much; i.e. one can come up with a reasonable approximation ratio for A and can prove that it holds in the general case
There exist many ways to come up with approximation algorithms and to prove that they have a reasonable approximation ratio. One such recipe is as follows.
• Suppose it is a maximization problem, so that V <= OPT, where V is the value of the solution returned by the approximation algorithm. We have to come up with a ratio k s.t. V is at most k times worse than OPT.
• In order to prove that V is at most k times worse than OPT, we would have to know OPT, which is not possible because computing it itself takes non-polynomial time in the general case.
• Suppose instead we know some number M such that the optimal solution has a maximum cost of M; then OPT <= M in all possible cases.
• Thus, if we can prove that V is at most k times worse than M, it is guaranteed that V is at most k times worse than OPT as well. Because, given that V >= k*M and M >= OPT, we get V >= k*M >= k*OPT.
• For example, if OPT is 100 and we know that M is 120, then suppose V is 80 (the best we can achieve with the approximation algorithm); then k = V / M = 80 / 120 = 2/3, meaning we can achieve 2/3rd of the maximum possible solution value. We note that V / OPT is 80 / 100 = 4/5, which is always more than 2/3. Thus, considering the upper bound M is enough. (This chain of inequalities is summarized in symbols right after this list.)
• For a minimization problem, as we shall see below, we have to consider a lower bound instead.
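
In symbols, the maximization case of the recipe is just the following chain (with V, M, OPT and k as in the bullets above):
\[
V \ge k \cdot M \ge k \cdot \mathrm{OPT}
\quad\text{since}\quad \mathrm{OPT} \le M,
\qquad\text{e.g.}\quad 80 \ge \tfrac{2}{3} \cdot 120 \ge \tfrac{2}{3} \cdot 100 .
\]
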
I came across a nice lecture by Prof. Abhiram Ranade (IIT Bombay). In this lecture, in addition to introducing the basics of approximation algorithms and defining the above-mentioned notions, Sir also illustrates the approximation algorithms designed for two of the NP-Complete problems. One of them (my favorite, and the easier one :D) is briefly explained below.
(Note: I have tried to explain the problem and its approximation algorithm the way I understood it, so it is recommended to go through the original lecture by Ranade Sir.)

Metric TSP (Traveling Salesperson Problem)
Input: A graph G with N cities and a matrix D giving the distance between each pair of cities, s.t. D is a metric.
A distance measure D is a metric if
1. D[i,i] = 0 for all i
2. D[i,j] = D[j,i] for all i,j (D is symmetric)
3. D[i,j] <= D[i,k] + D[k,j] for all i, j and k (D follows the triangle inequality)
Output: A cycle in G that goes through every city exactly once and has the shortest length (in terms of the distance values in D)

Recipe:
This follows the recipe mentioned above. This being a minimization problem, we will come up with a lower bound. For the given graph G, suppose we know the optimal tour and we remove an edge from it. As the required output is a cycle containing all vertices in G, the edge_removed_tour is a spanning path in G (spanning = containing all vertices). Thus it is a spanning tree of G, because it contains all vertices and no cycle. Now consider the minimum spanning tree T of G, which has cost M. Any other spanning tree in G is at least as expensive as T, so the spanning tree we came up with by removing an edge from the optimal tour can have length no less than that of the MST, which we called M. So, length of edge_removed_tour >= M.
As length of (optimal) tour >= length of edge_removed_tour (obvious, since edge lengths are non-negative), we get
length of (optimal) tour >= M
Here, the LHS is nothing but OPT, and thus M is a lower bound on OPT.
So, we come up with an algorithm whose solution value V is no more than twice M, i.e. V <= 2 * M. (The whole chain is written out in symbols below.)
The algorithm and its correctness are described in the lecture. The main idea used is the triangle inequality property of D being a metric.
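
In symbols, the lower-bound argument and the resulting guarantee read (with OPT, V and M as above):
\[
M = \mathrm{cost}(\mathrm{MST}) \le \text{length of edge\_removed\_tour} \le \mathrm{OPT},
\qquad
V \le 2M \le 2\,\mathrm{OPT}.
\]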

Monday, January 11, 2010

Precision-recall trade off ~ misclassification cost

While I was reading Pattern Recognition and Machine Learning by Bishop, I realized a beautiful relationship between misclassification cost and precision-recall. This might be well known, of course, but it's important for me because I realized it myself. :)

We know that the precision-recall concept comes from an Information Retrieval kind of setting, while misclassification cost comes from decision-theory problems. In IR, given a set of true results T and a set of retrieved results R, precision is simply the fraction of true results in the set R, i.e. precision = #intersection(R,T) / #R. For example, suppose we know *all* the web pages that exist on the Web; then the set of pages which should be retrieved by a search engine is, say, T of size |T|. The search engine may retrieve a set R of size |R|. Then, if only 10 of these |R| pages are in T, R's precision is 10 / |R|. Intuitively, R is more precise / exact if it has fewer not_required results. One may think of |R| >> |T|, where R contains all pages from T and many more extra results. Here, one says that R could "recall" everything in T, but at the same time it also made lots of mistakes. It is like taking many trials on a puzzle and solving the puzzle correctly 100 times while num_trials were 1000+. In general, if it could retrieve 10 pages which are in T, we say that its recall is 10 / |T|, i.e. out of |T| correct pages, it could retrieve 10. This also shows that there is a trade-off between precision and recall, because as num_trials increase, recall improves at the cost of precision. (Both formulas are written out right below.)
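
In symbols (with R and T as above):
\[
\text{precision} = \frac{|R \cap T|}{|R|},
\qquad
\text{recall} = \frac{|R \cap T|}{|T|},
\qquad
\text{e.g. precision} = \frac{10}{|R|}, \;\; \text{recall} = \frac{10}{|T|}.
\]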

A little off-topic but relevant: it's said that "practice makes a man perfect". If we consider practice to be making many trials, then as num_trials increase, one gets better results due to practice.

While I was reading the book, on page 41, I came across the misclassification cost involved in decision theory. It says: we account for the misclassification cost because we don't want to miss important predictions. As in the example given for two-class classification about whether a person is healthy or has cancer, it's important that we make no mistakes when the true class is "cancer-patient", as the consequences of those mistakes are dangerous. It also says that the misclassification costs are set such that we don't mind if a healthy person is predicted to have cancer, in order not to miss any patient who has cancer in reality. This is basically a trade-off between decisions. By this, we are looking for more recall even though we lose precision.
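
For reference, the quantity Bishop minimizes in that section is the expected loss (this is the standard formula from the book, restated here; R_j denotes the decision region for class j):
\[
\mathbb{E}[L] = \sum_{k}\sum_{j} \int_{\mathcal{R}_j} L_{kj}\, p(\mathbf{x}, \mathcal{C}_k)\, \mathrm{d}\mathbf{x}
\]
where L_{kj} is the loss incurred for deciding class j when the true class is k. Making the cancer-misclassified-as-healthy entry of L much larger than the healthy-misclassified-as-cancer entry shifts the decision regions so that fewer true cancer cases are missed - i.e. recall on the cancer class goes up while precision on it goes down.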