Qbasicnews.com
April 12, 2021, 07:45:22 PM
 Pages: [1] 2 3 4
 Author Topic: Sorting Arrays  (Read 18124 times)
Dean
Member

Posts: 67

 « on: April 02, 2005, 09:01:24 PM »

I'm trying to come up with a fast sorting routine to sort
string arrays.  I tried a quicksort and bubble sort (which
I admittedly borrowed from some posted source I found),
but I can't come close to the speed of the ARRAY SORT
function built into PB.  I'm trying to stay away from PB,
but "just when I thought I was out, they pull me back in..."

I could use a hand if anyone can offer some FB code that will
perform as fast as PB's built in function, or at least come close.
In this PB example, on my PIII 1GHZ laptop, it takes
15.5 seconds to sort a 3 million element 7 byte string array.

#COMPILE EXE
#DIM ALL

FUNCTION PBMAIN () AS LONG

DIM X AS LONG
DIM Y AS LONG
DIM Z AS LONG
DIM S AS STRING
DIM TI1 AS DOUBLE
DIM TI2 AS DOUBLE

X = 3000000

DIM MyArray(X) AS STRING * 7

FOR Z = X TO 1 STEP - 1
S = LTRIM\$(STR\$(Z))
IF LEN(S) < 7 THEN S = S + STRING\$(7 - LEN(S), " ")
Y = Y + 1
MyArray(Y) = S
NEXT Z

PRINT "Started to sort array in ascending order..."
TI1 = TIMER
ARRAY SORT MyArray()    : 'PB built in sorting function
TI2 = TIMER

PRINT "Time to sort array = ";TI2 - TI1; " seconds."

END FUNCTION

Dean
 Logged
Moneo
Na_th_an

Posts: 1971

 « Reply #1 on: April 03, 2005, 11:07:45 PM »

Dean,

You are really making things difficult for the subject sort.

1) 3 million strings of 7 bytes each. That's 21 million bytes. I don't know the limits of FreeBasic for handling that capacity in memory, but even with memory paging or swapping, it's going to run slow.

2) Next, you generate your input in descending order. For most sort algorithms, that's the worst case for speed.

3) No sort algorithm that I have ever seen in 40 years is going to solve your problem. The volume of strings (or records) that you have will slow down any sort.

4) Your fundamental problem is sorting these 7 million strings (or records) in memory. I suggest the following:-

* Do the sorting "off-line" from your program. That is, put or generate the strings onto a flat, text file.

* Do a test using the old MSDOS SORT command from the command line. Run SORT /? for help. In your case, the command would be:
SORT INPFILE /O OUTFILE

* Then process the sorted OUTFILE as required by your program.

* If SORT does not satisfy you need for speed, then go to:
WWW.OPTTECH.COM and buy the best sort program I have ever seen. The DOS version costs \$149. You won't find anything better. The OptTech people have been exclusively providing sorts for over 20 years.

Good luck. Let us know how it went.
*****
 Logged
Wandering Guru

Posts: 389

 « Reply #2 on: April 04, 2005, 12:12:00 AM »

3 million damn at that size i can't see normal array sorting being able to do it that just way to many recoreds to sort.

a LinkList soluation might be better off for what your trying to do since you can easy delete , add , append data togather the only down side is you have to run through the whole link list to find to search the list.

also FB should be able to handle 3 million elements at 8byte per element it would only be 22 to 23 megs i think.
 Logged
Antoni Gual
Na_th_an

Posts: 1434

 « Reply #3 on: April 04, 2005, 11:15:29 AM »

A simple recursive Quicksort compiled in FB 0.13 took 3,3 seconds in my PC (P4 2.4Mhz) to order the 3M elements.

Code:

Sub Quicksort(min,max)
dim p1,p2,midd as string*7
IF min < max THEN
p1 = min
p2 = max
midd = Myarray((min + max) \ 2)
DO UNTIL p1 > p2
DO WHILE Myarray(p1) < midd
p1 = p1 + 1
LOOP
DO WHILE midd < Myarray(p2)
p2 = p2 - 1
LOOP
IF p1 <= p2 THEN
SWAP Myarray(p1), Myarray(p2)
p1 = p1 + 1
p2 = p2 - 1
END IF
LOOP
IF min < p2 THEN Quicksort min, p2
IF p1 < max THEN Quicksort p1, max
END IF
end sub

It's said recursion is slow..I have somewhere a nonrecursive quicksort that should be faster.
I have no PBCC to compare.
 Logged

Antoni
Dean
Member

Posts: 67

 « Reply #4 on: April 04, 2005, 12:21:43 PM »

The example I posted was intended to test a worst case scenario
just to get some benchmarks.

As for memory, since FB can take in 2gb, that's not an issue.  And
it did work fine -- just that it took twice as long as the built in
PB function.  Actually, using the same quicksort in PB without
using their built in function, FB was over 50% faster than PB.
But their built in funciton was twice as fast as FB doing a quicksort.
And I did try a nonrecursive quicksort, which was really slow, so
I think recursion is helping speed.

I was trying to see if someone had an algorithm that performed
as well as the PB function just so I wasn't led into PB temptation.
I want FB to deliver me from evil!  As Antoni pointed out, it's not
too slow using a quicksort in FB, it's just not as fast as that ARRAY
SORT function in PB.  I'll try Antoni's code to see if its faster than
the quicksort I was using and let you know the results.

I can live with using a quicksort in FB -- just hoped that maybe
there's an algo that I don't know about that is faster.

Thanks for the feedback!

Dean
 Logged
Antoni Gual
Na_th_an

Posts: 1434

 « Reply #5 on: April 04, 2005, 12:49:01 PM »

Quicksort is actually one of the fastest algorithms around. What is ARRAY SORT using?
 Logged

Antoni
na_th_an
*/-\*

Posts: 8244

 « Reply #6 on: April 04, 2005, 01:19:07 PM »

Chances are that PBCC SORT routine is coded in assembly. Look for an iterative quicksort routine written in assembly.
 Logged

SCUMM (the band) on Myspace!
ComputerEmuzone Games Studio
underBASIC, homegrown musicians
[img]http://www.ojodepez-fanzine.net/almacen/yoghourtslover.png[/i
Dean
Member

Posts: 67

 « Reply #7 on: April 04, 2005, 01:30:37 PM »

Don't know what they're doing -- it's a built-in function.
Probably done in assembly.  I heard of algo's that
are supposedly faster than quicksort - LINEAR TIME.
I'm having a conultant potentially write me a .dll that
I can use which should be 3x faster than a quicksort.

Dean
 Logged
Antoni Gual
Na_th_an

Posts: 1434

 « Reply #8 on: April 04, 2005, 01:53:38 PM »

Quote

There are sorting algorithms that run faster than O(n lg n) time but they require special assumptions about the input sequence to be sort. Examples of sorting algorithms that run in linear time are counting sort, radix sort and bucket sort. Counting sort and radix sort assume that the input consists of integers in a small range. Whereas, bucket sort assumes that the input is generated by a random process that distributes elements uniformly over the interval.

Since we already know that the best comparison-based sorting can to is Ω(n lg n). It is not difficult to figure out that linear-time sorting algorithms use operations other than comparisons to determine the sorted order.

Despite of linear time usually these algorithms are not very desirable from practical point of view. Firstly, the efficiency of linear-time algorithms depend on the keys randomly ordered. If this condition is not satisfied, the result is the degrading in performance. Secondly, these algorithms require extra space proportional to the size of the array being sorted, so if we are dealing with large file, extra array becomes a real liability. Thirdly, the "inner-loop" of these algorithms contain quite a few instructions, so even though they are linear, they would not be as faster than quick sort (say).
 Logged

Antoni
Dean
Member

Posts: 67

 « Reply #9 on: April 04, 2005, 02:04:07 PM »

There's a sort utility I use to work on files called PSORT, which
is incredibly fast, and it is based on a linear time algorithm. He calls
it a "postman's sort".  Some analogy to sorting mail.

I suppose I could also just compile a PB .dll which calls their
built in sort function, then link FB to that .dll.  Hmm...
 Logged
Dr_Davenstein
Na_th_an

Posts: 2052

 « Reply #10 on: April 04, 2005, 06:18:43 PM »

DirectQB came with a built in sorting routine... Maybe Lilo could offer some advice on porting it to 32-bit asm.
 Logged
Dean
Member

Posts: 67

 « Reply #11 on: April 04, 2005, 06:42:35 PM »

What's DirectQB?  Is it open source?
 Logged
Dr_Davenstein
Na_th_an

Posts: 2052

 « Reply #12 on: April 04, 2005, 06:51:15 PM »

It was one of the most popular QB libraries. The asm files were included in the package... Pretty much anything that was for QB was open source.
 Logged
lillo
Guru

Posts: 269

 « Reply #13 on: April 04, 2005, 08:02:14 PM »

Naah, IIRC the DQB array sorting routine used bubble sort, which is much slower than quick sort.
Better to recode a quick sort routine from scratch using the inline asm  FB provides, if the array sorting is the real bottleneck of your app.
A hint to speed things up: don't sort the strings. Sort pointers to the string. I mean, have a complementary array of pointers to each string of your original array, then do the sort on this. This way when two elements are to be swapped, you just swap 4 bytes with a single MOV, instead of copying the whole string.
 Logged

ngelo Mottola - EC++
v3cz0r
I hold this place together

Posts: 924

 « Reply #14 on: April 04, 2005, 10:12:05 PM »

Using a pointers table as Angelo suggested and zstrings the result was 8.2 secs on my 1.8 athlon, now using the GDSL library + strcmp(), i got 5.2 secs (GDSL is a static lib, so the exe was 17kb, no DLL's are needed), here's the test:

Code:
#include once "gdsl/gdsl.bi"
#include once "crt.bi"

declare sub quicksort(byval min,byval max)
declare sub fillarray(byval min,byval max)

const elements = 3000000

dim shared myarray(0 to elements-1) as zstring * 8
dim shared ptrarray(0 to elements-1) as zstring ptr

'' using quicksort in FB
fillarray(0, elements-1)
t# = timer
quicksort(0, elements-1)
print timer - t#

'' using gdsl + strcmp
fillarray(0, elements-1)
t# = timer
gdsl_sort( @ptrarray(0), elements, @strcmp )
print timer - t#

'':::::
sub fillarray(byval min,byval max)
dim i as integer
dim s as string

for i = max to min step -1
s = str\$(i)
if len(s) < 7 then
s += string\$(7 - len(s), " ")
end if
myarray(max-i) = s
ptrarray(i) = @myarray(i)
next z
end sub

'':::::
sub quicksort(byval min,byval max)
dim p1,p2,midd as zstring*8

if min < max then
p1 = min
p2 = max
midd = *ptrarray((min + max) \ 2)
do until p1 > p2
do while *ptrarray(p1) < midd
p1 += 1
loop
do while midd < *ptrarray(p2)
p2 -= 1
loop
if p1 <= p2 then
swap ptrarray(p1), ptrarray(p2)
p1 += 1
p2 -= 1
end if
loop
if min < p2 then quicksort min, p2
if p1 < max then quicksort p1, max
end if

end sub
 Logged

 Pages: [1] 2 3 4