Study of the effectiveness of developed algorithms. The main principles underlying the creation of effective algorithms


Several different algorithms can be developed to solve the same problem. Therefore, the task of choosing the most effective algorithms arises. Note that accurately assessing the effectiveness of algorithms is a very difficult task and in each specific case requires special research.

The part of the theory of algorithms that deals with estimating the characteristics of algorithms is called the metric theory of algorithms. The theory of algorithms as a whole can be divided into a descriptive (qualitative) part and a metric (quantitative) part. The first examines algorithms from the point of view of the correspondence they establish between input data and results. The second examines algorithms from the point of view of the complexity of both the algorithms themselves and the "computations" they specify, i.e., the processes of sequential transformation of structural objects. It is important to emphasize that the complexity of algorithms and computations can be defined in various ways, and it may turn out that by one measure algorithm A is more complex than algorithm B, while by another measure it is the other way around.

Most often, algorithms are evaluated by the required memory, the number of operations performed, the solution time, or the computational error. These characteristics often depend on the parameters (dimensions) of the problem and are nonlinear. Therefore, the theory of algorithms includes a direction that assesses the effectiveness of algorithms using asymptotic estimates of functions such as required memory, computation time, and so on. The most significant parameter of such a function is identified, and the behavior of the function is studied as that parameter grows. The goal is to determine the nature of the dependence of the algorithm's characteristics on the parameter: it can be linear (i.e., proportional to the parameter n), logarithmic (i.e., proportional to log n), quadratic (i.e., proportional to n²), and so on. By comparing the asymptotic estimates of algorithms that solve the same problem, one can choose the more efficient algorithm. We say that a quantity T(n) is of order n^x if there exist positive constants k and n₀ such that for all n ≥ n₀ the inequality T(n) ≤ k·n^x holds. For example, T(n) = 3n² + 5n is of order n², since T(n) ≤ 4n² for all n ≥ 5.

Suppose that n is the amount of numerical data received at the input of several different algorithms (A1, A2, A3, A4, A5) that perform calculations at the same speed of 1000 operations per second but have different asymptotic estimates. Table 1.8 shows the values of n that these algorithms can process in 1 second, 1 minute, and 1 hour (values are rounded down to the nearest whole number). The data in Table 1.8 clearly show that the performance of an algorithm (i.e., the amount of data processed per unit of time) depends significantly on the form of its asymptotic estimate.
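To make the idea concrete, here is a small illustrative sketch in Python (the rate of 1000 operations per second is taken from the text; the particular set of growth functions is an assumption) that computes, for each growth function f(n), the largest input size n whose total operation count still fits into a given time budget:

```python
import math

# Illustrative sketch: largest n whose operation count f(n) fits into a budget,
# assuming the algorithm performs 1000 elementary operations per second.
SPEED = 1000  # operations per second (figure taken from the text)

complexities = {
    "n":        lambda n: n,
    "n log2 n": lambda n: n * math.log2(n) if n > 1 else 1,
    "n^2":      lambda n: n ** 2,
    "2^n":      lambda n: 2.0 ** n if n < 1024 else float("inf"),
}

def max_n(f, seconds):
    """Largest n such that f(n) operations can be performed within `seconds`."""
    budget = SPEED * seconds
    n = 1
    while f(n + 1) <= budget:  # brute-force scan: slow but simple
        n += 1
    return n

for name, f in complexities.items():
    sizes = [max_n(f, s) for s in (1, 60, 3600)]
    print(f"{name:9s}  1 s: {sizes[0]:>9}   1 min: {sizes[1]:>9}   1 hour: {sizes[2]:>9}")
```

The linear row grows in direct proportion to the time budget, while the quadratic and especially the exponential rows barely move, which is exactly the effect the table is meant to show.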

Testing of developed algorithms is usually carried out at small values of the parameter n. Such testing gives confidence that the algorithm works, but does not at all guarantee that the task will be completed for large values of n: we may simply not have enough computer memory or time to solve a real problem. Asymptotic estimates are important because they allow one to judge whether computer resources are sufficient for practical calculations within known limits of variation of the parameter n.

Algorithm efficiency is a property of an algorithm that is associated with the computational resources used by the algorithm. The algorithm must be analyzed to determine the resources required by the algorithm. Algorithm efficiency can be thought of as analogous to the manufacturing productivity of repetitive or continuous processes.

To achieve maximum efficiency, we want to reduce the use of resources. However, different resources (such as time and memory) cannot be directly compared, so which of two algorithms is considered more efficient often depends on which factor is more important, such as the requirement for high speed, minimal memory usage, or another measure of efficiency.

Note that this article is NOT about algorithm optimization, which is discussed in articles on program optimization, optimizing compilers, loop optimization, object code optimization, and so on. The term "optimization" itself is misleading, since everything that can actually be done falls under the umbrella of "improvement."

Background

The importance of efficiency with an emphasis on execution time was emphasized by Ada Lovelace in 1843 regarding Charles Babbage's mechanical analytical engine:

"In almost all computing, a large choice of configurations is possible for the successful completion of the process, and various considerations should influence the choice made for the purpose of performing the calculation. The essential thing is to choose a configuration that will minimize the time required to perform the calculation."

Early electronic computers were very limited in both speed and memory. In some cases it was realized that there is a time-memory trade-off: a task either had to use a large amount of memory to achieve high speed, or had to use a slower algorithm that gets by with a small amount of working memory. In such cases, the fastest algorithm for which the available memory was sufficient was used.

Modern computers are much faster than those early computers and have much more memory (gigabytes instead of kilobytes). However, Donald Knuth emphasizes that efficiency remains an important factor:

"In established engineering disciplines, a 12% improvement, easily obtained, is never considered marginal, and I believe the same viewpoint should prevail in programming."

Overview

An algorithm is considered efficient if its resource consumption (or resource cost) is at or below some acceptable level. Roughly speaking, "acceptable" here means "the algorithm will run for a reasonable amount of time on an available computer." Because the processing power and available memory of computers have grown enormously since the 1950s, today's "acceptable level" would not have been acceptable even 10 years ago.

Computer manufacturers periodically release new, usually more powerful, models. The cost of software can be quite high, so in some cases it may be easier and cheaper to obtain better performance by buying a faster computer that is compatible with the existing one.

There are many ways to measure the resources used by an algorithm. The two most commonly used measures are speed and the amount of memory used. Other measures may include transmission speed, temporary disk usage, long-term disk usage, power consumption, total cost of ownership, response time to external signals, and so on. Many of these measures depend on the size of the algorithm's input data (that is, the amount of data to be processed). They may also depend on how the data is presented (for example, some sorting algorithms perform poorly on data that is already sorted or sorted in reverse order).
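As a small hypothetical illustration of the last point (the algorithm, data size, and input orderings below are chosen arbitrarily), the following Python sketch counts how many element comparisons the same insertion sort makes on already sorted, reverse-sorted, and randomly ordered input:

```python
import random

def insertion_sort_comparisons(data):
    """Sort a copy of `data` by insertion sort and return the comparison count."""
    a = list(data)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] > key:
                a[j + 1] = a[j]   # shift the larger element to the right
                j -= 1
            else:
                break
        a[j + 1] = key
    return comparisons

n = 2000
base = list(range(n))
for name, data in [("already sorted", base),
                   ("reverse sorted", base[::-1]),
                   ("random order", random.sample(base, n))]:
    print(f"{name:15s} {insertion_sort_comparisons(data):9d} comparisons")
```

On sorted input the count is roughly n, on reverse-sorted input roughly n²/2, so the very same algorithm can look fast or slow depending on how the data happens to be presented.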

In practice, there are other factors that influence the effectiveness of the algorithm, such as the required accuracy and/or reliability. As explained below, the way an algorithm is implemented can also have a significant effect on actual performance, although many aspects of the implementation are optimization issues.

Theoretical analysis

In the theoretical analysis of algorithms, it is common practice to estimate the complexity of an algorithm by its asymptotic behavior, that is, to express the complexity as a function of the input size n using big-O notation. This estimate is generally quite accurate for large n, but may lead to incorrect conclusions for small n (for example, bubble sort, which is considered slow, may be faster than quicksort if only a few elements need to be sorted).

Notation, name, and typical examples:

  • O(1), constant: determining whether a number is even or odd; using a constant-size lookup table; using a suitable hash function to select an element.
  • O(log n), logarithmic: finding an element in a sorted array using binary search or a balanced search tree; operations on a binomial heap.
  • O(n), linear: finding an element in an unsorted list or an unbalanced tree (worst case); adding two n-bit numbers using ripple carry.
  • O(n log n), quasilinear (log-linear): computing the fast Fourier transform; heapsort; quicksort (best and average case); merge sort.
  • O(n²), quadratic: multiplying two n-digit numbers using the schoolbook algorithm; bubble sort (worst case); Shell sort; quicksort (worst case); selection sort; insertion sort.
  • O(cⁿ), c > 1, exponential: finding an (exact) solution to the traveling salesman problem using dynamic programming; determining whether two logical statements are equivalent using exhaustive search.
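As a small, self-contained illustration of two rows of this list (the data set and the search target are arbitrary assumptions), the sketch below counts the steps taken by a linear search over an unsorted list and by a binary search over a sorted array:

```python
def linear_search_steps(data, target):
    """Scan the list left to right; return the number of elements examined."""
    for steps, value in enumerate(data, start=1):
        if value == target:
            return steps
    return len(data)

def binary_search_steps(sorted_data, target):
    """Binary search over a sorted list; return the number of halving steps."""
    lo, hi, steps = 0, len(sorted_data), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_data[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps

n = 1_000_000
data = list(range(n))
target = n - 1  # worst case for the linear scan
print("linear search:", linear_search_steps(data, target), "steps")  # about n
print("binary search:", binary_search_steps(data, target), "steps")  # about log2(n), i.e. ~20
```

Doubling n adds roughly one step to the binary search but doubles the linear one, which is the practical meaning of the O(log n) and O(n) rows.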

Benchmarking: Measuring Performance

For new versions of software or to provide comparison with rival systems, benchmarks are sometimes used to compare the relative performance of algorithms. If, for example, a new sorting algorithm is released, it can be compared with its predecessors to ensure that the algorithm is at least as efficient on known data as the others. Performance tests can be used by users to compare products from different manufacturers to evaluate which product will best suit their requirements in terms of functionality and performance.

Some benchmark suites provide comparative analysis across various compiled and interpreted languages, such as Roy Longbottom's PC Benchmark Collection, while The Computer Language Benchmarks Game compares the performance of implementations of typical tasks in several programming languages.

Implementation issues

Implementation issues may also affect actual performance. This includes the choice of programming language and the way in which the algorithm is actually coded, the choice of translator for the chosen language or compiler options used, and even the type of operating system. In some cases, a language implemented as an interpreter may be significantly slower than a language implemented as a compiler.

There are other factors that can affect timing or memory usage and that are beyond the programmer's control. These include data alignment, data granularity, garbage collection, instruction-level parallelism, and subroutine calls.

Some processors have the ability to perform vector operations, which allows one operation to process multiple operands. It may or may not be easy to use such features at the programming or compilation level. Algorithms designed for sequential computing may require complete redesign to accommodate parallel computing.

Another issue may arise with processor compatibility, where an instruction may be implemented differently on different models, so that an instruction that is relatively fast on some models may be relatively slow on others. This can be a problem for an optimizing compiler.

Measuring Resource Usage

Measurements are usually expressed as a function of the input size n.

The two most important measurements are:

  • Time: How long the algorithm takes on the CPU.
  • Memory: How much working memory (usually RAM) is needed for the algorithm. There are two aspects to this: the amount of memory for the code and the amount of memory for the data that the code operates on.

For battery-powered computers (such as laptops) or for very long/large calculations (such as supercomputers), a different kind of measurement is of interest:

  • Direct energy consumption: Energy required to run a computer.
  • Indirect energy consumption: Energy required for cooling, lighting, etc.

In some cases, other, less common measurements are needed:

  • Transmission size: Bandwidth may be the limiting factor. Compression can be used to reduce the amount of data transmitted. Displaying a graphic or image (such as the Google logo) can result in tens of thousands of bytes being transferred (48K in this case); compare this with transmitting the six bytes of the word "Google".
  • External memory: Memory required on a disk or other external storage device. This memory can be used for temporary storage or for future use.
  • Response time: This setting is especially important for real-time applications where the computer must respond quickly to external events.
  • Total cost of ownership: This parameter is important when a computer is dedicated to executing a single algorithm.

Time

Theory

This type of test also depends significantly on the choice of programming language, compiler, and compiler options, so the algorithms being compared must be implemented under the same conditions.
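A minimal benchmarking sketch along these lines (the two sorting routines and the data size are illustrative assumptions, not part of the original text) runs both implementations on the same data, in the same interpreter, and times them with the standard timeit module, so they are measured under identical conditions:

```python
import random
import timeit

def builtin_sort(data):
    return sorted(data)               # Timsort from the standard library

def insertion_sort(data):
    a = list(data)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

data = [random.random() for _ in range(1000)]   # identical input for both candidates

for func in (builtin_sort, insertion_sort):
    elapsed = timeit.timeit(lambda: func(data), number=10)
    print(f"{func.__name__:15s} {elapsed:.4f} s for 10 runs")
```

Only relative figures obtained on the same machine, interpreter, and input are meaningful; absolute timings vary between systems.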

Memory

This section deals with the main memory (often RAM) needed by the algorithm. As with the timing analysis above, the analysis of an algorithm typically uses its space complexity to estimate the required working memory as a function of the input size. The result is usually expressed in big-O notation.

There are four aspects of memory usage:

  • The amount of memory required to store the algorithm code.
  • The amount of memory required for the input data.
  • The amount of memory required for any output (some algorithms, such as sorts, frequently rearrange the input and do not require additional memory for the output).
  • The amount of memory required by the computational process during computation (this includes named variables and any stack space required for subroutine calls, which can be significant when using recursion).
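As a small illustrative sketch of the last two points (the reversal functions are assumptions used only for the demonstration), the standard tracemalloc module can compare the extra working memory of an in-place operation with that of one that builds a full copy of its input:

```python
import tracemalloc

def reverse_copy(a):
    return a[::-1]        # builds a second list: O(n) additional memory

def reverse_in_place(a):
    a.reverse()           # rearranges the input: O(1) additional memory
    return a

data = list(range(1_000_000))

for func in (reverse_copy, reverse_in_place):
    work = list(data)                 # fresh, identical input for each run
    tracemalloc.start()
    func(work)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"{func.__name__:18s} peak additional memory: {peak / 1_000_000:.1f} MB")
```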

Early electronic computers and home computers had relatively small working memory capacities. Thus, in 1949 the EDSAC had a maximum working memory of 1024 17-bit words, and in 1980 the Sinclair ZX80 was released with 1024 bytes of working memory.

Modern computers can have relatively large amounts of memory (possibly gigabytes), so squeezing an algorithm into a limited amount of memory is much less of a requirement than it used to be. However, the existence of three different categories of memory remains significant:

  • Cache (often static RAM) - runs at speeds comparable to the CPU
  • Main physical memory (often dynamic RAM) - runs slightly slower than the CPU
  • Virtual memory (often on disk) - gives the illusion of huge memory, but works thousands of times slower than RAM.

An algorithm whose required memory fits into the computer's cache runs much faster than one that fits only into main memory, which in turn is much faster than an algorithm forced to resort to virtual memory. Complicating matters, some systems have up to three levels of cache. Different systems have different amounts of these types of memory, so the effect of memory on an algorithm's running time can vary significantly from one system to another.

In the early days of electronic computing, if an algorithm and its data did not fit into main memory, the algorithm could not be used. These days, virtual memory provides massive amounts of memory, but at the cost of performance. If an algorithm and its data fit in the cache, very high speed can be achieved, so minimizing the required memory also helps minimize time. An algorithm that does not fit entirely into the cache but exhibits locality of reference can still run relatively quickly.
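A rough sketch of the effect (assuming NumPy is available; the array size is arbitrary, and the exact ratio depends heavily on the machine) sums a large row-major array first along contiguous rows and then along strided columns; the strided traversal typically runs noticeably slower because it makes poorer use of cache lines:

```python
import time
import numpy as np

a = np.random.rand(5000, 5000)   # C-ordered (row-major) array, roughly 200 MB

t0 = time.perf_counter()
row_total = sum(a[i, :].sum() for i in range(a.shape[0]))   # contiguous access
t1 = time.perf_counter()
col_total = sum(a[:, j].sum() for j in range(a.shape[1]))   # strided access
t2 = time.perf_counter()

assert abs(row_total - col_total) < 1.0  # same result, different traversal order
print(f"row-wise sum:    {t1 - t0:.3f} s")
print(f"column-wise sum: {t2 - t1:.3f} s (usually slower: poor locality of reference)")
```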

Examples of effective algorithms

Criticism of the current state of programming

Programs are becoming slower more rapidly than computers are becoming faster.

David May states:

In widespread systems, halving the number of instructions executed can double battery life, and big data sets provide an opportunity for better algorithms: reducing the number of operations from N × N to N × log(N) has a dramatic effect for large N... For N = 30 billion, this change is equivalent to 50 years of technological improvement.

Competition for the best algorithm

The following competitions invite participation in the development of the best algorithms, the quality criteria of which are determined by the judges:

See also

  • Arithmetic coding is a type of entropy coding with variable code length for efficient data compression
  • An associative array is a data structure that can be made more efficient by using PATRICIA trees or Judy arrays
  • Performance test - a method of measuring comparative execution time in certain cases
  • Best, worst and average case - conventions for estimating execution time for three scenarios
  • Binary search is a simple and effective technique for searching a sorted list
  • Branch table


Main principles for creating effective algorithms

Anyone who develops algorithms must master some basic techniques and concepts. Anyone who has ever faced a difficult task was faced with the question: “Where to start?” One possible way is to look through your stock of common algorithmic methods to see if one of them can be used to formulate a solution to a new problem. Well, if there is no such reserve, then how can you still develop a good algorithm? Where to start? We've all had the frustrating experience of looking at a task and not knowing what to do. Let's look at three general problem-solving techniques that are useful for developing algorithms.

The first method is associated with reducing a difficult problem to a sequence of simpler ones. The hope, of course, is that the simpler problems are easier to handle than the original one, and that a solution of the original problem can be assembled from the solutions of these simpler problems. This procedure is called the method of particular goals (subgoals). The method looks very reasonable, but, like most general methods of problem solving or algorithm design, it is not always easy to apply to a specific problem. Making an intelligent choice of simpler problems is more a matter of art or intuition than of science. There is no general set of rules defining the class of problems that can be solved with this approach. Thinking about any specific problem begins with asking questions. Particular goals can be established once the following questions have been answered:

  • 1. Is it possible to solve part of the problem? Is it possible to solve the rest of the problem by ignoring some conditions?
  • 2. Is it possible to solve the problem for special cases? Is it possible to develop an algorithm that produces a solution that satisfies all the conditions of the problem, but whose input data is limited to some subset of all input data?
  • 3. Is there anything related to the problem that is not well understood? If we try to delve deeper into some of the features of the problem, will we be able to learn something that will help us approach a solution?
  • 4. Is there a known solution to a similar problem? Is it possible to modify its solution to solve the problem under consideration? Is it possible that this problem is equivalent to a known unsolved problem?

The second method of algorithm development is known as the ascent (hill-climbing) method. An ascent algorithm begins by making an initial guess or computing an initial solution to the problem. Then it moves "upward" from the initial solution toward better solutions as quickly as possible. When the algorithm reaches a point from which it is no longer possible to move upward, it stops. Unfortunately, it is not always possible to guarantee that the final solution obtained by the ascent method is optimal, which often limits the use of the method.

In general, ascent methods are classified as "crude." They keep some goal in mind and try to do whatever they can, wherever they can, to get closer to it. This makes them somewhat "short-sighted." The short-sightedness of the ascent method is well illustrated by the following example. Suppose we need to find the maximum of a function y = f(x) given by the graph in Fig. 2.15. If the initial value of the argument is x = a, the ascent method will strive toward the nearest goal, i.e., toward the value of the function at the point x = b, whereas the true maximum of the function is at x = c.

Fig. 2.15. An illustration of the ascent method

In this case the ascent method finds a local maximum, but not the global one. This is the "crudeness" of the ascent method.
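The following sketch (the two-peak objective function and the step size are assumptions chosen purely for illustration) shows exactly this behavior: starting from x = 0 the ascent stops at the local maximum near x = 1, while starting from x = 3 it reaches the global maximum near x = 4:

```python
import math

def f(x):
    # Two peaks: a local maximum near x = 1 (f is about 1) and the global one near x = 4 (f is about 2).
    return math.exp(-(x - 1) ** 2) + 2 * math.exp(-(x - 4) ** 2)

def hill_climb(x, step=0.1):
    """Move to the best neighbouring point until no neighbour improves f."""
    while True:
        best = max((x - step, x, x + step), key=f)
        if best == x:        # no improvement: a peak, possibly only a local one
            return x
        x = best

for start in (0.0, 3.0):
    top = hill_climb(start)
    print(f"start = {start:.1f}  ->  stops at x = {top:.1f}, f = {f(top):.3f}")
```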

The third method is known as working backward: the algorithm starts from the goal or the solution of the problem and moves toward its initial formulation. Then, if the actions involved are reversible, the movement is carried out forward again, from the problem statement to the solution.

Let us look at all three methods in the jeep problem. Suppose you need to cross a 1000-kilometer desert in a jeep using a minimum of fuel. The jeep's fuel tank holds 500 liters, and fuel is consumed uniformly, 1 liter per kilometer. At the starting point there is an unlimited supply of fuel. Since there are no fuel depots in the desert, you need to set up your own storage points and fill them with fuel from the car's tank. So the idea of the problem is clear: you drive away from the starting point with a full tank, set up the first storage point some distance out, leave there some of the fuel from the tank (keeping just enough to get back), return, refuel completely, and try to push the second storage point farther into the desert. But where should these storage points be set up, and how much fuel should be left in each of them?

Let us approach this problem with the working-backward method. At what distance from the end of the desert can we start so as to cross it with exactly k full tanks of fuel? We consider this question for k = 1, 2, 3, ... until we find an integer n such that n full tanks allow us to cross the entire 1000-kilometer desert. For k = 1 the answer is 500 km = 500 liters (point B), as shown in Fig. 2.16.

Fig. 2.16.

You can refuel the car at point B and cross the remaining 500 km of desert. A particular goal was set because the original problem cannot be solved right away.

Now suppose k = 2, i.e., there are two full tanks (1000 l). This situation is illustrated in Fig. 2.16. What is the maximum value of x1 such that, starting with 1000 l of fuel from the point (500 - x1), it is possible to deliver enough fuel to point B to complete the trip as in the case k = 1? One way to determine an acceptable value of x1 is as follows. We fill up at the point (500 - x1), drive x1 kilometers to point B, and pour all the fuel into the storage except the part needed to get back to the point (500 - x1), where the tank becomes empty. Now we fill the second tank, drive x1 kilometers to B, pick up the fuel left there, and drive from B to C with a full tank. The total distance traveled consists of three segments of x1 kilometers each and the segment BC of 500 km. Therefore, from the equation 3·x1 + 500 = 1000 we find x1 = 500/3. Thus two tanks (1000 l) allow us to travel D2 = 500 + x1 = 500(1 + 1/3) km.

Now consider k = 3. From what point can we start with 1500 liters of fuel so that the jeep can deliver 1000 liters to the point (500 - x1)? Let us find the largest value of x2 such that, leaving with 1500 liters of fuel from the point (500 - x1 - x2), we can deliver 1000 liters to the point (500 - x1). We leave the point (500 - x1 - x2), drive to (500 - x1), transfer all the fuel except the x2 liters needed for the return, and come back to the point (500 - x1 - x2) with an empty tank. Repeating this procedure, we spend 4·x2 liters on travel and leave (1000 - 4·x2) liters at the point (500 - x1). Now exactly 500 liters remain at the point (500 - x1 - x2). We fill up with these last 500 liters and drive to the point (500 - x1), spending x2 liters on the way.

Arriving at the point (500 - x1), we have spent a total of 5·x2 liters of fuel on travel, so (1500 - 5·x2) liters remain here. This amount must equal 1000 l, i.e., x2 = 500/5. From this we conclude that 1500 liters allow us to travel D3 = 500 + x1 + x2 = 500(1 + 1/3 + 1/5) km.

Continuing the process of working backward inductively, we find that n tanks of fuel allow us to travel Dn kilometers, where Dn = 500·(1 + 1/3 + 1/5 + ... + 1/(2n - 1)).
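A short verification sketch of this formula (illustrative only) computes Dn and finds the smallest number of tanks whose range exceeds 1000 km:

```python
def D(n):
    """Distance in km reachable with n full tanks: 500*(1 + 1/3 + ... + 1/(2n - 1))."""
    return 500 * sum(1 / (2 * k - 1) for k in range(1, n + 1))

print("D(7) =", round(D(7), 1), "km")   # just under 1000 km

n = 1
while D(n) <= 1000:
    n += 1
print("smallest n with D(n) > 1000:", n)  # a full n-th tank is therefore not entirely needed
```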

We need to find the smallest value of n for which Dn > 1000. Simple calculations show that for n = 7 we have D7 ≈ 977.5 km, i.e., seven tanks, or 3500 liters of fuel, allow us to travel only 977.5 km. A full eighth tank would be more than is needed to transport 3500 liters from point A to a point located 22.5 km (1000 - 977.5) from A. The reader is invited to verify independently that 337.5 liters are sufficient to deliver 3500 liters of fuel to the 22.5 km mark. Thus, in order to cross the desert from A to C by car, 3837.5 liters of fuel are needed.

Now the fuel transportation algorithm can be stated as follows. We start from A with 3837.5 liters. This is just enough fuel to gradually transport 3500 liters to the 22.5 km mark, where the jeep eventually ends up with an empty tank and fuel for 7 full refills. That fuel is enough to transport 3000 liters to a point 22.5 + 500/13 km from A, where the car's tank will again be empty. The next stage of transportation brings the jeep to a point located 22.5 + 500/13 + 500/11 km from A, with the car's tank empty and 2500 l in the storage.

Continuing in this way, we move forward thanks to the analysis carried out by working backward. Eventually the jeep reaches the 500(1 - 1/3) km mark with 1000 liters of fuel. Then we transport 500 liters of fuel to point B, pour them into the car's tank, and drive without stopping to point C (Fig. 2.17).


Fig. 2.17.

For those familiar with infinite series, note that Dn is 500 times the n-th partial sum of the odd harmonic series. Since this series diverges, the algorithm makes it possible to cross a desert of any size. Try modifying this algorithm so that enough fuel is left at various points in the desert to return to point A.

The question arises whether it is possible to travel the 1000 km using less than 3837.5 liters of fuel. It turns out that it is not, although the proof of this statement is quite complicated. However, the following fairly plausible argument can be given. Obviously we act in the best possible way for k = 1. For k = 2 we use the plan for k = 1, and the second tank of fuel is used so as to start as far as possible from point B. The starting premise for k tanks is that we already know how to act best in the case of (k - 1) tanks, and with the help of the k-th tank we move the starting point back as far as possible.

So, in the problem considered, the working-backward method shows up in the fact that the problem is solved, as it were, from the end; the method of particular goals in the fact that the whole problem is not solved at once but in parts; and, finally, the ascent method in the fact that the solution is not found immediately but sequentially, as if by gradually approaching it.

TEST QUESTIONS

  • 1. Give a definition of an object, class, system, model.
  • 2. Name the main types of models.
  • 3. What is simulation modeling?
  • 4. What classifications of models exist?
  • 5. Indicate the main stages of modeling.
  • 6. What is an algorithm?
  • 7. List the properties of the algorithm.
  • 8. What stages are performed in the complete construction of the algorithm?
  • 9. What is an algorithm flowchart?
  • 10. Define a function block.
  • 11. Which algorithm is called structural?
  • 12. Name the main principles underlying the creation of effective algorithms.