Tuesday, January 8, 2008

Role of Algorithms in Computing

Knuth in his book says that the word algorithms may have been borrowed from the name of 9th century Persian Mathematician, Muhammad Ibn Musa Al-Khwarizmi.

Algorithms have been used to great extent in the human genome project to find the 1,00,000 pairs that make up the human DNA. This includes algorithms like String Matching to detect patterns etc. Algorithms are used on the Internet to find a good path for the data to travel on. Everyday on the Internet, we send a lot of information such as Bank Account data, Credit card numbers, passwords. Well, you need to protect this data from theft, otherwise someone might steal this data and misuse it, impersonate you. Thus, the science of Cryptography and digital signatures uses many algorithms. Genetic algorithms often imitate the biological processes such as mutation, reproduction etc.

To save network bandwidth and storage space, data is compressed. The science of Data Compression uses Compression algorithms. In the field of Digital Image Processing, algorithms are used to enhance images, to extract characteristic features of images etc.

So, what's this subject about ?
Here we study, different ways to solve a problem. It is the theoretical study of computer programs, their performance and resource usage.

What things are more important than Performance of a program?
  • User friendliness
  • Reliability
  • Maintainability
  • Security
  • Functionalities and features
  • Cost

Why study the performance of Algorithms ?
From what I have understood is the following -
  • If computers were infinitely fast and memory was free any correct algorithm would do.
  • But, computer speeds are fast, but not infinite. Similarly, computer memory is a bounded resource. Hence, we need to the performance algorithms.
  • Basically, we are concerned with how fast things run!
  • As rightly pointed out in the video lectures, performance is the demarcating line between the feasible and the infeasible. Well, no matter how good your program is, if it doesn't run, it just doesn't work out for you.
  • In the world of computer programs, the Barter system is still being used. You can use performance to pay for all the other things like User-friendliness, Security etc. So, performance is like currency or money. If a user wants good, user friendly interface you write a program in Java. Although Java is much slower than C, Java has this rich set of Object Oriented features, so that the user-interface is easy to use. Thus, you spend by a factor of 2 or 3 on performance, to get an increase in user-friendliness.
Insertion Sort Algorithm

The insertion sort algorithm is given as follows :
void main(){
int A[] = {5,2,4,6,1,3};
int length = 6;
int i,j,key;

/*Insertion Sort*/
for(j=1;j=0 && A[i]>key){
A[i+1] = A[i];
i = i - 1;
}
A[i+1] = key;
}

for(i=0;i

Sorting problem is one of the oldest problems in Mathematics. The above algorithm works as follows. At any point j in the algorithm, all the elements A[1...j-1] are always sorted. This is the loop invariant. Well, so each time, in every loop, we add one element A[j] to the stuff that's already sorted A[1...j-1]. And how do we do that ? Find an appropriate place for A[j]. We simply keep copying the elements, till we find the position for A[j].

Running time of an Algorithm
Running time of an algorithm depends upon the following two things :
  • Depends upon the Input - (Sorted vs. reverse-sorted)
  • Depends upon the Input Size n - Sorting 6 elements vs. a Billion elements. The more stuff you have, the more time it takes.
Running time in general depends upon the input size. So, we express the running time T(n) as a function of the input size n...

As far as Running time is concerned, we generally want Upper Bounds on the running time. We would like to know, what's the maximum running time of an algorithm, the maximum it could possibly take... The reason is simple, it presents a guarantee to the user. So, we are usually interested in an upper bound on the Running Time, because everybody likes a Guarantee.

Well, we do different kinds of Analysis on computer programs :
  • Worst Case (What we do usually) -
    T(n) = Maximum time to run the algorithm on any input of size n.
  • Average Case(We do it sometimes...)-
    Average running time T(n) =
    Average (Running time of an input)x(Probability of that Input)

    So, it is the expected running time over all inputs of size n. But, how do we know the probability of an input. We don't !! So, we make an assumption... And that assumption is called a statistical distribution!
  • Worst Case(Bogus) - You can cheat. Take some input for which the algorithm runs fast...
What's Insertion Sort's worst case time ?
  • Obviously, it depends upon the computer. Are you running insertion sort on a big supercomputer or on your mobile phone or PDA? They have different computational capabilities...
How do we compare two algorithms ?? Two options..
  • Relative Speed - We can run two algorithms on the same machine, and talk about their relative speed.
  • Absolute Speed - It doesn't matter what computer the algorithms are run, we must be able to compare them in terms of absolute speed!
So, we pull out a BIG IDEA from our Math background, and that BIG IDEA is called Asymptotic Notation.

Asymptotic Notation -
Simple and easy to use!
Let's see the theta Notation...












































Worst case of Insertion Sort








No comments: