Hey thaddeus, the main issue I run into with something like this is that sessions are very resource-intensive on the server side. As it stands I get to reclaim session resources right when people leave, and still Try Arc gets overloaded and requires a manual restart about once every two weeks.
Ah, but the solution seems so straightforward now: just put sessions to sleep. Every session can be stored efficiently as a list of expressions, and then re-awakened by running the expressions through a new REPL before showing it to the user.
If the REPL session takes any input, including by measuring timings or generating random numbers, it won't be deterministically recreated by a list of commands. And what if I want a fresh REPL?
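To make the replay idea and the objection to it concrete, here is a minimal sketch. The names transcript* and replay are made up for illustration, and it evaluates into the global namespace rather than a per-user one, so it is not a description of how Try Arc actually works:

```arc
; Hypothetical sketch: a "sleeping" session is just the list of
; expressions the user entered, oldest first.
(= transcript* '((= x 10)
                 (= y (* x 2))
                 (= r (rand 100))))   ; nondeterministic input

; Re-awaken a session by evaluating each saved expression in order.
(def replay (exprs)
  (each e exprs
    (eval e)))

(replay transcript*)
; x and y come back as 10 and 20 on every replay, but r is re-rolled,
; so the restored session can differ from the one that was saved.
```

Anything that depends on timings, randomness or other outside input has the same problem as r here.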
When I go to Try Arc, I imagine there's a single REPL per window/tab, which goes away as soon as I leave the page. So I think it would be cool to be able to copy out transcripts (without spurious hyphens showing up :-p ) and load/paste Arc libraries, but that's all the power I hope for.
If you're looking for more of a server-side challenge (and more of a productivity site rather than a tutorial :-p ), it would be awesome to have a user account system that lets me paste a bunch of Arc utilities and load them automatically every time I start a REPL.
I'm still a bit suspicious about abstraction leaks in the concept of "save this session." I'd prefer to think in terms of a library pastebin, where I may choose to turn my REPL session into a library but I'm not misled into thinking it'll preserve more about my session than the raw code I entered.
For what it's worth, managing user logins and passwords is a very easy business with WordPress! The challenge would be figuring out how to interface those WordPress user accounts with REPL functionality.
If implemented like HN x-id's you could just refresh or reload the page in order to get a new session. And, like HN, one could just expire a stored session after x minutes without being used.
Note though: Originally when I posted the session-id idea I was thinking tryarc was only one session with many threads, where each session-id would really just point to its own namespace. Had this been the case it would just be the functions and variables being stored in memory. And then threads/session-ids/namespaces (whatever) could be spun up or down and managed in a myriad of ways.
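A rough sketch of the expire-after-x-minutes idea, assuming each session id simply maps to the time it was last used. sessions*, touch-session and expire-sessions are hypothetical names, not anything taken from Try Arc or the HN source:

```arc
; Hypothetical bookkeeping: session id -> time last used, in seconds.
(= sessions* (table))

; Record activity for a session (call this on every request).
(def touch-session (id)
  (= (sessions* id) (seconds)))

; Drop every session that has been idle longer than max-idle-mins.
(def expire-sessions (max-idle-mins)
  (let cutoff (- (seconds) (* 60 max-idle-mins))
    (each id (keys sessions*)          ; keys returns a fresh list, so
      (when (< (sessions* id) cutoff)  ; removing entries here is safe
        (wipe (sessions* id))))))      ; storing nil removes the entry

(touch-session "abc123")
(expire-sessions 30)   ; forget sessions unused for 30+ minutes
```

In a real server the table would also hold (or point at) whatever per-session state needs to be released.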
You: "If implemented like HN x-id's you could just refresh or reload the page in order to get a new session."
That's the behavior I expect, but then I also expect to get a new session if I navigate somewhere and click the back button. If the back button case does not give me a new REPL, I don't see why refreshing the page would.
I was thinking it would be interesting to investigate the different approaches and see if being idiomatic would also be better performing (not that I am sure what defines idiomatic with arc).
For testing I am only using the basic case (i.e., no accounting for applying functions to items as they are sorted). The following examples are only a preliminary run, intended to start looking at comparable approaches. One thing I'm noticing right off the bat is that Methods 1 & 2 have code that already accounts for applying functions, and could therefore potentially be sped up..? Are there other things I should think about here? If the goal were to create the fastest version, are there other ideas that anybody else is willing to pitch in? Or if the goal were to create the most succinct code, are there other ideas here too? And, in general, is there anything I am not giving consideration to?
Thanks.
; random data - ugly code / quickly hacked
; --------------------------------------------------------------------
(def jumble (seq)
  (let tb (table)
    (each i seq
      (withs (k (rand 50) v (tb k))
        (fill-table tb (list k (cons i v)))))
    (flat (vals tb))))
(= data
   (jumble
     (let xs nil
       (repeat 100
         (push (with (n (rand 100) x (rand 100))
                 (n-of n x)) xs))
       (flat xs))))
; Method #1 -> succinct, performs very well + stable sort
; --------------------------------------------------------------------
(def freq (l (o f idfn))
  (ret ans (table)
    (each x l
      (++ (ans f.x 0)))))
(def sort-by-commonest1 (l (o f idfn))
  (let h (freq l f)
    (sort (compare > h:f) l)))
arc> (time10 (do (sort-by-commonest1 data) nil))
time: 2399 msec.
; Method #2 -> very succinct, performs well + stable sort
; --------------------------------------------------------------------
(def sort-by-commonest2 (l (o f idfn))
  (let h (counts:map f l)
    (sort (compare > h:f) l)))
arc> (time10 (do (sort-by-commonest2 data) nil))
time: 2436 msec.
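On the question of whether Methods 1 and 2 pay for the optional function argument: a hypothetical stripped-down variant of Method #2 would drop f and let the counts table act as the scorer directly. The name sort-by-commonest2b is made up, and this variant has not been timed against the others here:

```arc
; Hypothetical variant of Method #2 for the basic case only:
; no key function, so no (map f l) pass and no h:f composition.
(def sort-by-commonest2b (l)
  (let h (counts l)            ; table: item -> number of occurrences
    (sort (compare > h) l)))   ; a table can be called like a function

(sort-by-commonest2b '(a b b c c c))   ; => (c c c b b a)
```

Whether this is measurably faster would need the same time10 run as the other methods.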
; Method #3 -> not so succinct, terrible performance + unstable sort
; --------------------------------------------------------------------
(def sort-by-commonest3 (xs)
  ((afn (rx rs)
     (if (empty rx) rs
         (let (x n) (commonest rx)
           (self (rem x rx) (join rs (n-of n x))))))
   xs nil))
arc> (time10 (do (sort-by-commonest3 data) nil))
time: 8052 msec.
; Method #4 -> not so succinct, very fast + stable sort
; --------------------------------------------------------------------
(def sort-by-commonest4 (seq)
  (flat
    (sort (fn (x y) (> (len x) (len y)))
          ((afn (xs ys)
             (if (empty xs) ys
                 (withs (x  (car xs)
                         f  (testify [isnt _ x])
                         r  (reclist [if (f:car _) _] xs)
                         c1 (len xs)
                         c2 (len r))
                   (self r (cons (firstn (- c1 c2) xs) ys)))))
           (sort > seq) nil))))
arc> (time10 (do (sort-by-commonest4 data) nil))
time: 1257 msec.
I love the dot syntax, but Clojure doesn't have any of the intra-symbol syntaxes that arc has and I really struggle getting back into using them :)... even [ _ ] is foreign to me now, but just for you ^_^:
(def sort-by-commonest5 (seq)
  (mappend [do _]
    (sort (fn (x y) (> len.x len.y))
          ((afn (xs cxs ys)
             (if empty.xs ys
                 (withs (x  car.xs
                         f  (testify [isnt _ x])
                         r  (reclist [if (f:car _) _] xs)
                         cr len.r)
                   (self r cr (cons (firstn (- cxs cr) xs) ys)))))
           (sort > seq) len.seq nil))))
arc> (time10 (do (sort-by-commonest5 data) nil))
time: 1073 msec.
For sort-by-commonest0, freshly loaded, here are the first 10 runs:
(3081 2102 1803 1585 1564 1529 1596 1813 1571 1580)
As you can see with the JVM it warms up and running the same code gets faster...
For example the second run (and my 340 must have been down the road?):
(581 553 539 540 543 550 541 542 544 536)
Then I tried a comparable succinct approach in Clojure
(no counts, so I used frequencies):
(defn sort-by-commonest01 [xs]
  (let [h (frequencies xs)]
    (sort (comparitor > #(h %)) xs)))
first 10: (4568 2750 2495 2125 2399 2131 2123 2133 2056 1087)
second 10: (1064 1074 1070 1069 1070 1101 1058 1062 1090 1066)
It's really hard to make decent comparisons (pun intended)... I mean, should I only compare the first attempts, freshly loaded, to make valid comparisons?
Not sure what I gain from this... I think it's fair to say the succinct version is slower than the longer version. Also I haven't even considered what the idiomatic approach would be in Clojure and I'm guessing Clojure has a built-in function that's written by some code wizard :)
c) Not that it'll really save anything when it comes to the overall computational complexity, but this version of list length comparison will only traverse the lists up to the end of the shorter one:
(def longer-list (a b)
  (if a
      (if b
          (longer-list cdr.a cdr.b)
          t)
      nil))
d) Where you say "(if empty.xs ys <else>)", you can probably squeeze out more performance with "(if xs <else> ys)". You can actually use "(iflet (x) xs <else> ys)" here too, but I don't know what that'll do to the performance.
e) There's no need to call 'testify.
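As a small illustration of points (d) and (e) only, and not the revised sort itself: the helper below splits the leading run of equal items off a sorted list using iflet instead of an empty test, and without 'testify (which simply hands back a function it is given). The name leading-run is hypothetical:

```arc
; Illustrative helper: returns (run rest), where run is the leading
; block of items equal to the first item and rest is the remaining tail.
(def leading-run (xs)
  (iflet (x) xs                                   ; an empty list is nil, so it takes the else branch
    (let r (reclist [if (isnt (car _) x) _] xs)   ; first tail whose car differs from x
      (list (firstn (- (len xs) (len r)) xs) r))
    (list nil nil)))

(leading-run '(3 3 3 2 2 1))   ; => ((3 3 3) (2 2 1))
```

Whether iflet helps or hurts performance here is untested, as point (d) already notes.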
After incorporating a bunch of this feedback--I didn't pit '< against '> --the code might look like this:
I don't follow which of those results is for clojure vs arc. sort-by-commonest0 ranges from 3081 to 536ms, while the clojure sort-by-commonest01 ranges from 4568 to 1066ms? I'm surprised that clojure is so consistently slower.
Can you show the first twenty runs for the arc sort-by-commonest2 vs the clojure sort-by-commonest01?
1. I made a big mistake on my time calculator for Clojure... it was using nanoTime with rounding errors. I had to re-write it. This is now correct with no rounding:
Yes... But Clojure being faster was not really the part I cared about. What I feel this shows is that the straightforward idiomatic approach is, while concise, not always the optimal solution. The investigation was an attempt to consider other approaches, gain a better understanding of the languages and find some optimal paths. Comparing Arc to Clojure shows both languages have a similar ratio in performance metrics for the two cases, which allows me to normalize my approaches and not assume that an approach for one is equally applicable to the other.
I'm not surprised that the readable way is less performant; I'm reassured that the price for readability is quite low. I was totally unsurprised that both require similar optimizations given that both are fairly mature toolchains. Compilers by and large get their performance the same way.
The one thing that can cause the optimal optimizations to shift is different machines with different capacities and hierarchies of caches and RAM. But this is a compute-bound program so even that's not a concern.
Update: I should add that I enjoyed this summary of yours. The previous comment just gave data and left your conclusions unspoken.
I agree, and my apologies, I did give only data - at the time of writing I was trying not to draw conclusions and to leave the door open for other options to come forward, but I should have followed up.
I'm not so sure I can agree with the price for readability being quite low. One could call it premature optimization, but a ~50% gain is pretty significant in my mind. Had it been 20% or less I would probably go forward and spend less of my time attempting alternate approaches, but at ~50% I think playing around and learning the boundaries and their benefits can yield positive results for me.
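For reference, the ~50% figure can be checked against the Arc timings posted earlier in the thread. This is just arithmetic on those numbers; pct-saved is a hypothetical helper, not something from arc.arc:

```arc
; Hypothetical helper: percentage of run time saved by the faster version.
(def pct-saved (slow fast)
  (* 100.0 (/ (- slow fast) slow)))

(pct-saved 2399 1257)   ; Method #1 vs Method #4: about 47.6
(pct-saved 2399 1073)   ; Method #1 vs Method #5: about 55.3
```

Since time10 times ten repetitions of the expression, the absolute gap between Method #1 and Method #4 works out to roughly 114 msec per call on this data set.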
It's 50% if you do just that. In a larger app it's a difference of 1ms.
Now you could argue that everything in an arc program will be slower so I should consider it to be 50%. Lately I've been mulling the tradeoff of creating optimized versions vs just switching languages. When I built keyword extraction for readwarp.com I was shocked how easy it was to add a C library using the FFI. Why bother with a less readable, hard-to-maintain arc version for 2x or 5x if you can get 50-100x using an idiomatic C version?
The whole point of a HLL is to trade off performance for readability; I'd rather back off that tradeoff in a key function or two.
---
Shameless plug section
Wart currently has an utterly simple FFI: create a new C file, write functions that take in Cells and return Cells, run:
$ wart
and they get automatically included and made available.
But I want to do more than just an FFI. I have this hazy idea of being able to specify optimization hints for my code without actually needing to change it or affect its correctness. They might go into a separate file like the tests currently do.
I dream of being able to go from simple interpreter to optimized JIT using just HLL lisp code (and a minimum of LLVM bindings). So far the prospect of attempting this has been so terrifying that I've done nothing for weeks...
I think that's generalizing too much. Using that one function with our nominal data set may only cost some ms, but what if you're dealing with hundreds of millions of records? It may then, even though it represents only .001% of your code base, account for 90% of your operating time - which is when someone normally kicks in with the "premature optimization" argument, which I can't really argue against, other than to say optimizing code is a skill that is generally done well by those who take it into account to begin with.
"Now you could argue that everything in an arc program will be slower so I should consider it to be 50%"
I wouldn't think this to be the case. I'm sure there's a tonne of juice squeezing one can do, but as a general statement, having played around with a lot of arc's code, I would guess most optimizing would yield less than 10%. These 50+% gains, while few and far between, are still worth the effort (to me).
The key, in all this, is to understand these languages well enough to make good judgment calls on where best to invest one's time.
I can't say much about wart and the rest (you're much deeper into language design than I am). :)
"what if you're dealing with hundreds of millions of records?"
Then just write it in C :)
Let me try to rephrase my argument. Sometimes you care about every last cycle, most of the time you care about making it a little bit faster and then you're done. Sometimes your program needs large scale changes to global data structures that change the work done, sometimes it needs core inner loops to just go faster. Sometimes your program is still evolving, and sometimes you know what you want and don't expect changes.
just a little faster + work smarter => rearchitect within arc.
just a little faster + faster inner loops => rewrite them in C
every last cycle + rigid requirements => gradually rewrite the whole thing in C and then do micro-optimizations on the C sources. You could do arc optimizations here, but this is my bias.
every last cycle + still evolving => ask lkml :)
If you're doing optimizations for fun, more power to you. It's pretty clear I'm more enamored with the readability problem than the performance problem.
I don't want to sound more certain than I am about all this. Lately I spend all my time in the 'just a little faster' quadrant, and I like it that way.
"The key, in all this, is to understand these languages well enough to make good judgment calls on where best to invest one's time."
I don't disagree with your thoughts; however, I don't think they account for the TCQ aspect of everyone's situation.
Let's put it another way. In my situation, if I have to look at using C then it's already game over[1]. However I can, with my limited amount of time (lunches, evenings and weekends), become proficient enough in Clojure (with a continual thanks to Arc).
What I am suggesting is that knowing the language well enough to easily identify these 50%+ hitters is a matter of finding low hanging fruit and at the same time becoming a better Arc/Clojure programmer. It does not mean I want to change my direction in the heavily-optimized/low-level to high-level/low-maintenance/readable code continuum.
[1] I'm not a programmer[2]; I am a business analyst who works with software that simulates oil & gas production from reservoirs that sit several hundred kilometers underground. Simulation scenarios are run for 20-year production periods across hundreds of interdependent wells. The current software tools run the simulations across many cores, take 8 days to run to completion, and cost about half a million dollars per seat (+18% per year in maintenance fees). This cost can be attributed to the R&D that occurred 8-10 years ago (i.e., they required a team(s?) of P.Engs writing software in C to maximize performance). Eight years ago you couldn't do (pmap[3] (sortby-commonest or whatever.... )) so easily. Nowadays I have the opportunity to create a 70% solution all by my lonesome, costing only a portion of my time. Hence understanding the language well enough to find the low-hanging fruit, and not having to use C, is probably a bigger deal to me.
[2] Well, maybe I shouldn't say this... rather I should say it's not my day job. I have no formal education in the field. I'm pretty much self taught.
[3] pmap is the same as map only it distributes the load across multiple processors in parallel.
I suspect you're entering that JavaScript code at the Arc REPL. ^^; I just pasted that JavaScript here as an excerpt of the full(er) instructions on the REPL page:
"Actually, if you paste from Java Rainbow's arc.arc source yourself, you'll probably find that Rainbow.js encounters a stack overflow error. At this point it's necessary to enter it in more bite-size pieces. Right now you can accomplish this by pasting all of arc.arc into the input box, then entering the following code into a JavaScript console (not the Arc console!) to have it entered one line at a time:
[the JS code I pasted above]"
In Chrome, the JavaScript console is available under "(wrench icon) > Tools > JavaScript console." On Firefox, I tested it using Firebug's console tab, but I think the Error Console might work too.
Also, pasting the JavaScript snippet all at once should be fine. The point is to enter the Arc code in bite-size pieces, and the JavaScript snippet just automates that process.
I'm considering making this workaround a feature of the REPL interface, as a checkbox or something, but what I'd rather do is fix the issue altogether.
Done! Now the reader only uses a number of JavaScript stack frames proportional to the amount of nesting of the program, rather than the number of characters. Feel free to just paste all of arc.arc into the REPL and hit enter. ^_^
I've also tested the REPL on more browsers. The only one that really had a problem was IE 8, and that was because of the code of the REPL interface itself rather than the Rainbow.js runtime. It should work in IE 8 now... except that the performance is terrible there. XD
Any ideas about where this project should go next?
Rainbow.js is compiling with the Closure Compiler's advanced mode now! The REPL page--which still doesn't load arc.arc--is now down to 146 KB (HTML and JavaScript), rather than over 500.
Furthermore, on most recent browsers, the loading speed seems competitive with, or even better than, the loading speed of Java Rainbow. It might not be a fair comparison, since the contents being loaded are coming through a different kind of input, but I'm still pretty excited. ^_^
Generally speaking, I avoid iterative updates to tables as I often find it ends up being less efficient than consing a collection together.
Also... this is not a stable sort like yours is, but maybe I can get some points for creativity?
What does 'XD' mean? I've noticed you use it in many of your comments, but I have yet to figure it out... lol.
* Sorry if the question seems out of place.. I, originally, intended to reply to your comment here: http://arclanguage.org/item?id=14942, however I can no longer reply to the thread.
In practice, in both my Arc and JS code, I use eval only when I absolutely have to... which is almost never. It's usually when I'm doing something incredibly hacky, like trying to eval macros at runtime...
Anyways, the traditional wisdom in JS is to avoid eval as much as possible. 99.99% of the time, eval is bad. It slows down your code, makes things like static analysis harder (or impossible), and it's usually completely unnecessary.
So, I don't see any practical problems with removing eval, but I do see some philosophical problems. eval has been a part of Lisp for so long, that it almost feels like a Lisp without eval isn't really a Lisp...
In any case, from a practical perspective, a lack of eval isn't a big deal at all. But I can see why some people might want eval (for non-practical reasons).
I noticed within your feedback forum there were suggestions for REPL window size changes (for both bigger and smaller) and I got to thinking that when I wrote PetroEnergyNews (http://petroenergynews.ca/map ... which has a similar bounding box look to it) I did a bunch of experimenting with different systems, different monitors and different screen resolutions before coming up with the following optimal combinations:
where the defaults were also the best suited to the iPad.
If you would like you can create an account on PetroEnergyNews then go to the user preferences to select the various combinations to get a feel for these sizes.
The "Widest" by "Tallest" setting is perfect for my 27 inch iMac screen :)
Anyways hopefully this info is helpful/useful.
[edit:
1. oops, that's only the map inset window, so it doesn't include the top and bottom bars, but still they should be easy to guesstimate at about 24px each.
2. I guess the width really doesn't matter when you can just set to 100%, but it may help with the height settings?]