The original version of this story appeared in Quanta Magazine.
Computer scientists often deal with abstract problems that are hard to comprehend, but an exciting new algorithm matters to anyone who owns books and at least one shelf. The algorithm addresses something called the library sorting problem (more formally, the ‘list labeling’ problem). The challenge is to devise a strategy for organizing books in some kind of sorted order – alphabetically, for example – that minimizes how long it takes to place a new book on the shelf.
Imagine, for example, keeping your books clumped together and leaving empty space on the far right of the shelf. If you then add a book by Isabel Allende to your collection, you might have to move every book on the shelf to make room for it – a time-consuming operation. And if you later get a book by Douglas Adams, you would have to do it all over again. A better arrangement would spread the empty spaces throughout the shelf – but how, exactly, should they be spread out?
This problem was introduced in a 1981 paper, and it goes beyond simply giving librarians organizational guidance. That’s because the problem also applies to the arrangement of files on hard drives and in databases, where the items to be arranged can number in the billions. An inefficient system means significant wait times and major computational expense. Researchers have invented some efficient methods for storing items, but they have long wanted to determine the best possible way.
Last year, in a study presented at the Foundations of Computer Science conference in Chicago, a team of seven researchers described a way to organize items that comes close to the theoretical ideal. The new approach combines a little knowledge of the bookshelf’s contents with the surprising power of randomness.
“This is a very important problem,” says Seth Pettie, a computer scientist at the University of Michigan, because many of the data structures we rely on today store information sequentially. He calls the new work “extremely inspired [and] easily one of my top three favorite papers of the year.”
Narrowing the bounds
So how does one measure a well-sorted bookshelf? A common way is to see how long it takes to insert an individual item. Naturally, that depends on how many items there are in the first place, a value typically denoted by n. In the Isabel Allende example, when every book has to move to accommodate a new one, the time it takes is proportional to n. The bigger the n, the longer it takes. That makes this an ‘upper bound’ on the problem: it will never take longer than a time proportional to n to add one book to the shelf.
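To make that cost concrete, here is a minimal Python sketch (an illustration of the shelf arithmetic, not code from the paper). It inserts a book into a fully packed, sorted shelf and counts how many books have to shift – in the worst case, all n of them:

```python
# Minimal sketch (not from the paper): a fully packed, sorted shelf.
# Making room for one new book can mean shifting every book already there.

def insert_packed(shelf, book):
    """Insert `book` into the sorted list `shelf`; return how many books shifted."""
    pos = 0
    while pos < len(shelf) and shelf[pos] < book:   # find the sorted position
        pos += 1
    shelf.insert(pos, book)                         # everything after pos slides right
    return len(shelf) - 1 - pos                     # up to n shifts in the worst case

shelf = ["Borges", "Calvino", "Dickens", "Eco", "Faulkner"]
print(insert_packed(shelf, "Allende"))  # 5 -- every existing book had to move
```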
The authors of the 1981 paper that introduced this problem wanted to know whether an algorithm could beat an insertion time proportional to n. And indeed, they proved that one can do better: they created an algorithm guaranteed to achieve an average insertion time proportional to (log n)². This algorithm had two properties: it was ‘deterministic’, meaning its decisions did not depend on any randomness, and it was also ‘smooth’, meaning the books must be kept evenly spread within subsections of the shelf where insertions (or deletions) are made. The authors left open the question of whether the upper bound could be improved even further. For more than four decades, no one managed to do so.
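The 1981 algorithm is more intricate than this, but a small Python sketch can convey the flavor of the ‘smooth’ idea: scatter empty slots across the shelf so a typical insertion only shifts a few books, and respread the gaps when a region fills up. The shelf layout and the rebalancing rule below are simplifying assumptions for illustration, not the paper’s actual procedure:

```python
# Toy sketch of the gap idea (an illustration, not the 1981 algorithm):
# None marks an empty slot on the shelf.

def rebalance(shelf):
    """Respread the books evenly, leaving one empty slot after each book."""
    books = [b for b in shelf if b is not None]
    shelf[:] = [slot for b in books for slot in (b, None)]

def insert_gapped(shelf, book):
    """Insert `book` in sorted position; return how many books had to shift."""
    if not shelf:
        shelf.append(book)
        return 0
    # The book belongs just after the last occupied slot holding a smaller title.
    pos = 0
    for i, b in enumerate(shelf):
        if b is not None and b < book:
            pos = i + 1
    # Slide books to the right only as far as the nearest gap.
    gap = next((i for i in range(pos, len(shelf)) if shelf[i] is None), None)
    if gap is None:
        rebalance(shelf)                   # the region filled up: respread the gaps
        return insert_gapped(shelf, book)
    shelf[pos + 1 : gap + 1] = shelf[pos:gap]
    shelf[pos] = book
    return gap - pos

shelf = ["Adams", None, "Borges", None, "Calvino", None]
print(insert_gapped(shelf, "Allende"))  # 0 -- the new book drops into a nearby gap
print(shelf)  # ['Adams', 'Allende', 'Borges', None, 'Calvino', None]
```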
The intervening years, however, saw improvements to the lower bound. While the upper bound specifies the maximum possible time needed to insert a book, the lower bound gives the fastest possible insertion time. To settle a problem definitively, researchers strive to narrow the gap between the upper and lower bounds, ideally until they coincide. When that happens, the algorithm is deemed optimal – firmly bounded from above and below, leaving no room for further refinement.