Arc Forumnew | comments | leaders | submit | binx's commentslogin
2 points by binx 6256 days ago | link | parent | on: arc-to-c : soon on the git

GC is not quite a problem, just use boehm's GC and replace all malloc's with GCmalloc's, and delete all free's then it just works.

The main problem is about arc's macros. They're quite different to cl's because they're first-class, and you have to invoke the compiler frequently when running the code. So the target C code should obtain the source information for macro expansion.

-----

4 points by sacado 6255 days ago | link

Hey, by the way, I have a working GC now. It's terribly slow, but it's my first one, so I like it anyway. Yes, I know I should have used Boehm (no time lost to implement a slow, buggy, informally-specified implementation of a GC) but, after all, I don't play implementing compilers every day, so... It will be changed when it will need to (when Google SoC will be over, we'll have a new GC !)

-----

1 point by almkglor 6254 days ago | link

LOL.

Code or it didn't happen ^^. Would like to see it for hacking through too.

-----

1 point by almkglor 6256 days ago | link

Arc's macros are not perfectly first-class: they expand only if they're defined in the global namespace at the time the code is compiled. Try this:

  (mac my-mac () t)
  (def foo ()
     (my-mac))
  (mac my-mac () nil)
  (foo)
or even this:

  (def foo ()
    (my-mac))
  (mac my-mac ()
       '(prn "foo!"))
  (foo) ;this last will also depend on whether you're on Anarki or ArcN
So basically, you need to determine if a global would be assigned with a macro somehow, save the macro-code, and at macro-expansion time detect the macros and execute the macro-code in, say, an Arc interpreter.

-----

1 point by binx 6256 days ago | link

In principle, whether a global variable will be assigned with a macro is impossible to detemine. Arc doesn't distinguish the run-time global environment and the expand-time global environment, so you still have to carry the source information to keep the semantics of Arc-n...

-----

3 points by almkglor 6256 days ago | link

Yes you can - remember that macros must be assigned to global variables before they are used.

So what we do is, we use our own 'eval (this should be easy, this is a lisp after all). However, this 'eval's 'set must determine if it's assigning to a global or a local (again, this should be easy, since it has to know what to write where). Then it checks if a global assign is that of a macro.

For each top-level form, we 'eval the form and check if any global assign is a macro assign. Now we know if a macro has been assigned to a global. Then we compile the form.

The alternative is simply to check if a form has 'mac anywhere in it. If it has (mac ...) or (annotate 'mac ...) anywhere, we run this in 'eval instead of compiling that form.

Really, though, we don't need to keep the semantics of Arc completely. We can add a few rules for macro writers: Always use 'mac. Don't use anything other than the built-in functions, or if you really need a non-builtin function for your macro, then put it in a 'let form together with your macro, and put two copies of the 'let form.

Alternatively, we could just make a REPL for Arc, in Arc, write it in a form suitable for compilation, and then write the compiler in a form compatible with the REPL and arcN.

      ___
     v   \
  REPL    compiler
     \___^
Besides: having some macros is better than having no macros.

-----

1 point by binx 6256 days ago | link

You can't eval each top-level form in compile-time. They may have side-effects, may expect user inputs, and may terminate in weeks(for example, a program doing ray-tracing or nuclear-explosion simulation).

Explicitely distinguish expand-time and run-time is necessary for real static compilation. If you don't want to distinguish them, you are inevitably forced to do compilation in run-time.

-----

1 point by almkglor 6256 days ago | link

> You can't eval each top-level form in compile-time. They may have side-effects, may expect user inputs, and may terminate in weeks(for example, a program doing ray-tracing or nuclear-explosion simulation).

And you're compiling (not executing) a top-level form that does those things?

All right then - make the limitation that macros must be defined in their own top-level forms, and if a top-level form ever contains 'mac in a function position, or (annotate 'mac ...), then it's executed and any macros it assigns to globals are treated as such: macros. If it terminates in weeks, too bad.

Alternatively just write an interpreted REPL and include the ability to compile code, but not include macros in compilation - macros must be loaded into the REPL, then the actual runnable code is compiled (and the compiled code's macros are ignored).

Anything just to get macros.

-----

1 point by sacado 6256 days ago | link

Actually, GC is more complicated than I thought : references to objects are not always pointers (i.e., addresses to something allocated on the heap). Fixnums, for example, but also t and nil, are a special kind of references (for fixnums, the last bit is 0, indicating "the actual value of the x reference is x >> 1". And these values have to be collected from time to time. The fib example runs out of memory quite fast, without a malloc since it's only using fixnums.

I started writing a naïve one yesterday (you know, the kind of things that were state of the art 50 years ago) and it's a little easier than I expected. At least, it doesn't core dump anymore. I'm in infinite loops now :)

-----

2 points by binx 6256 days ago | link

Boehm's GC is conservative, which means it treats all the values on stack as pointers. So there may be some space leak, but it's acceptable. At least many compilers-to-C use boehm's GC, and many of them are in practical use now.

-----

2 points by sacado 6256 days ago | link

Yes, but the problem is that even the references that are actual pointers do not really hold a pointer address, as this address should not end with a 0. You would do something like :

  ref = (malloc (sizeof (cons-cell)) << 1) + 1;
That way, ref knows it is pointing something (as ref & 1 == 1), it knows the address to this thing (it is ref >> 1), but this is totally invisible to Boehm's GC, I suppose. I guess it would think "hmm, this address 01001000 does not seem to be pointed by anyone. The only pointer I can see around is 10010001. Not the same thing obviously. Let's free 01001000 !"

But maybe I'm missing something, I never used Boehm except in toy apps.

-----

3 points by binx 6256 days ago | link

Oh, in order to use boehm's GC, you should store references as normal pointers. And you have to tag the lowest two bits of fixnums and immediate consts with 01/10/11.

Another solution is to write your own object memory, allocation functions and GC yourself.

-----

3 points by almkglor 6256 days ago | link

This is true; we'll have to change tags so that real objects are actual pointers. The alternative is to make everything a pointer, i.e. even "numbers" are really pointers to numbers.

I think it's easier to swap the meaning of numbers with objects rather than build our own GC.

As for infinite loops in GC - do you make sure not to scan a memory area you've already scanned?

-----

2 points by binx 6256 days ago | link

Yes, and if he wants to write a GC, I advise him to start with a semi-space copying GC, which is simpler and has no recursions. By this you don't have to preserve an extra bit for mark-and-sweep, and you need no GC stacks for deep-first traversing.

-----

1 point by sacado 6256 days ago | link

I don't really like making everything a pointer. I don't know any Lisp implementation working this way (even the slowest ones). It really implies a performance hit every time you use a small integer value (with -2G < small < 2G). Anyway, binx's idea seems to be good...

Well, about infinite loops, I don't know, I stopped yesterday when my head was starting to burn... And gcc always compaining about pointers being of the wrong type (the way it is implemented, with tagged references, I have to cheat a lot). Oh, come on, gcc, who do you think you are, an Ada compiler ?

-----

4 points by binx 6256 days ago | link

You don't need to make everything a pointer, but you can make everything a structure of 12 or 16 bytes. Lua takes this approach and it performs rather well.

Here's the Lua way:

struct obj{ int type; union data{ double num; void *ptr; }; }

By this means, you can achieve architecture independence.

-----

3 points by almkglor 6256 days ago | link

This approach is the best. In fact this has to be done, because not all computers have sizeof(int) == sizeof(void* ); certainly my AMD64 running on a 64-bit kernel doesn't.

Note that we can actually abstract away the representation of objects from the actual binary bits that represent it. For instance I would recommend that objects use the following representation (as noted by binx):

  typedef struct s_obj{
    enum e_type {
      other = 0,
      number = 1,
      symbol = 2,
      character = 3
    }
    type;
    union u_data{
      void* other;
      int number;
      void* symbol;
      Uchar character; //unicode character
    } data;
  } obj;
  #define OBJ_TYPE(o) ((o).type)
  #define OBJ2FIXNUM(o) ((o).data.number)
  #define FIXNUM_MIN INT_MIN //requires glibc
  #define FIXNUM_MAX INT_MAX
  //others defined similarly
However, for machines with the following characteristics:

1. 32-bit pointers and 32-bit int's

2. malloc() always returns an address at a 32-bit boundary (i.e. every four bytes, or with the lowest two bits zero)

We can then use:

  typedef int obj;
  #define OBJ_TYPE(o)  ((o) & 0x03)
  #define OBJ2FIXNUM(o)  ((o) >> 2)
  #define FIXNUM_MIN (-(1 << 30))
  #define FIXNUM_MAX ((1 << 30) - 1)
Thus we are able to have portability while still gaining optimizations for some architectures.

-----

1 point by sacado 6255 days ago | link

In fact, maybe it's not a problem even on machines where ints are not as big as pointers. You can treat each value as an int pointer, and, when you need a fixnum, do

  (int) my_int_ptr << 1
If ints are bigger than pointers, that's okay : you loose possible bits, but at least it's working (for example, if ints are on 6 bytes and addresses on 4 bytes, only the 4 least significant bytes are useful, the 2 other ones are lost). If pointers are bigger than ints, it's still working, but this time lost space is in their pointer representation.

The only thing is that fixnums don't have an architecture-independant size, but that's not a problem in practice anyway. The only constraint is that the biggest fixnum is

  2**(max(sizeof(int), sizeof(int*)) - 2)
The last-bit-is-1-for-fixnum trick works on every architecture except machines where addresses are on 8 bits. Well, maybe we can ignore these ?

-----

3 points by almkglor 6254 days ago | link

Actually it is, arc2c output crashes on my AMD64. I'm trying to hack out a replacement, but it's not working yet.

The problem is that apparently the bits of pointers that happen to be beyond the bits of ints are not zero, so assigning a pointer to an int chops off the significant bits.

-----

1 point by sacado 6254 days ago | link

Oh... Maybe a simple fix would be to change int to long. I know this is not standard behavior, but in practice I think long and pointers are always the same size. mzscheme obviously works that way : every object is a pointer that you can cast to a long if it is a fixnum.

-----

2 points by almkglor 6254 days ago | link

This seems to be correct on my AMD64:

  #include<stdio.h>
  
  int main(void){
        printf("sizeof(int) = %d\n", sizeof(int));
        printf("sizeof(long) = %d\n", sizeof(long));
        printf("sizeof(void*) = %d\n", sizeof(void*));
        return 0;
  }

  sizeof(int) = 4
  sizeof(long) = 8
  sizeof(void*) = 8
However I think there may be processors/archs where sizeof(long) != sizeof(void* ). My attempt at fixing was to use a union, but something's wrong with the way closures are handled in my fix attempt - closures don't seem to have a type associated with them, so I'm not exactly sure how they're supposed to be done.

Edit: I'll probably need to search through C language specs, though - I'm not sure, but it might be standardized that sizeof(long) >= sizeof(void* )

-----

2 points by absz 6254 days ago | link

What you want are the two typedefs intptr_t and uintptr_t . Each one is defined to be an integer big enough to hold a pointer (the former being signed, and the latter being unsigned); they aren't mandatory, though, so you might have

  #if defined(__COMPILER1__) || defined(__COMPILER2__)
    typedef intptr_t fixnum;
  #else
    typedef long fixnum;
  #endif
This would guarantee you correct behaviour if intptr_t were defined, and probably give you the correct behaviour even if it weren't.

-----

1 point by almkglor 6254 days ago | link

From this?

  #include<stdint.h>
Hmm. Interesting. How well supported is it?

-----

1 point by sacado 6254 days ago | link

Thanks for the information ! I'll try it.

-----

1 point by stefano 6256 days ago | link

Making a fixnum a pointer to a allocated memory containing the value is terribly slow.

The infinite loop could be caused by two objects that references each other, e.g. a references b and b references a, so the GC keeps going from a to b and from b to a. To solve the problem, if this is the problem, if an object has been already marked, don't follow it.

Note: I've not read the source yet.

-----

1 point by sacado 6256 days ago | link

Hmm... That would work because addresses are on 4 bytes, so they always end with 00, right ? That's a nice idea :) But it wouldn't work on an architecture with addresses on less than 4 bytes, no ? There are not many nowadays, I guess, but I suppose that should be added as an assertion is the code...

-----

0 points by stefano 6256 days ago | link

You have to be sure that every allocated object is at 8 byte boundaries for the pointer to end with 00.

-----

1 point by ambi 6246 days ago | link

Do you know ALL_INTERIOR_POINTERS? In gc/doc/README:

"Any objects not intended to be collected must be pointed to either from other such accessible objects, or from the registers, stack, data, or statically allocated bss segments. Pointers from the stack or registers may point to anywhere inside an object. The same is true for heap pointers if the collector is compiled with ALL_INTERIOR_POINTERS defined, or GC_all_interior_pointers is otherwise set, as is now the default."

-----

1 point by binx 6294 days ago | link | parent | on: premature optimizations in Arc

  (fn (bigtree)
    (let o-self self
    (let trav
         (fn (node)
             (o-self node!l)
             (do-something node!v)
             (o-self node!r)))
    (map trav bigtree)))
I admit that it's more verbose, but I insist that separating fn and afn is unnecessary.

-----

6 points by almkglor 6294 days ago | link

I insist that having every function implicitly have a self-reference is bad. Sometimes I want a function's sense of self as being some other object - say a table representing a set of functions. Sorry, but I want control over the names I'm using - and the separation of 'afn, 'fn, and 'rfn gets me that control.

The above just gets worse when I need several nested afn's each with a few dozen fn's that need to recurse on the parent. Being able to select or not select the implicit shadowing of 'self is important to me.

-----

2 points by almkglor 6294 days ago | link

"All processes are impermanent ... All processes are afflicted ... All phenomena are not ‘Self’; when this is seen with knowledge, one is freed from the illusion of affliction. This is the pathway to purity." ^^ ^^ >.<

-----

1 point by binx 6294 days ago | link

I finally agree. But maybe afn should be the default choice instead of fn.

'Self' in buddism is different to 'self' in common sense. Of course it's not the topic of this forum. :)

-----

5 points by pg 6294 days ago | link

In this example it's not the end of the world, but it would cause problems in code that's generated by other code, e.g. by macros.

I experimented for a while with having if just be aif, and it was not good. There are times you want to back off from maximally using anaphoric variants, just as there are times you don't want to use [] or ./! notation even though you could. E.g. when it's nested. And if you don't have the non-anaphoric variants, you never can.

-----

1 point by binx 6294 days ago | link | parent | on: premature optimizations in Arc

In OO languages, methods implicitly bind self and the object to which the message is sent. Should we modify them to let self be explicitly bound?

-----

5 points by raymyers 6294 days ago | link

Most OO languages don't just "implicitly bind" self; self (or this) is usually a reserved word. A conflict with another variable is made impossible because you can't say "int this = 4" in Java or C++. As pointed out, Python already makes the binding explicit.

Further, let us imagine for a moment that lambda closures and class instances are ... different things.

-----

4 points by cooldude127 6294 days ago | link

not all of them. python, for example, does it explicitly. self is a convention, but you can call it anything you want.

and arc shouldn't do something just because other languages do it. it should do things based on whether they are the best way to do it. and i personally think it's a bad idea to introduce those kind of automatic symbol bindings into constructs which are so fundamental to the language, like fn and if.

-----

2 points by vrk 6294 days ago | link

Pardon the tangent, but in Perl 5 what you call the instance of the class is up to you. Conventionally it's $self, and the constructor is often new(), but these can be whatever you want. Unless you use a special CPAN module that does the work for you, the common idiom in declaring methods is

  sub method {
    my $self = shift;
    ...
  }
You could name it $myself, $this, or even $the_object if you wanted.

-----

1 point by binx 6299 days ago | link | parent | on: A proposal for a module system...

What if a "def" form is the result of macro expansion?

-----

1 point by almkglor 6299 days ago | link

What about it? Redefine "def", right? The def that gets referenced is still the same "def" you specially defined.

-----

1 point by binx 6303 days ago | link | parent | on: Parser combinators (first try)

Does it optimize LL(1) parsing as parsec do? You might need some techniques of laziness to achieve this. Without this optimization, there would be space-leak and time-wasting.

-----

1 point by binx 6303 days ago | link | parent | on: First-class lexical&global environments?

A proper module system which works well with non-hygienic macros seems to be difficult to make. I don't know whether it would be better if macros are made first-class.

-----

4 points by almkglor 6303 days ago | link

I disagree. Recall that macros in Arc have precedence in expressions over non-macros; that is, if obj in (obj x) is globally a macro, even if it is used as a variable name, all (obj x) will be replaced by the macroexpansion. Note that this applies for all x.

Suppose we have nested macro expressions. The inner macro will not see the expression until the outer macro processes that value; the outer macro might even decide to completely bollix the inner macroexpression.

   (macro ....
     (sub-macro ...))
Now suppose we declare a "using" macro, which replaces all symbols inside its expressions for other symbols, i.e.

  arc> (using
      (
         foo   bar
         nitz  koo)
      '(foo bar nitz koo))
  (bar bar koo koo)
Obviously, if we postulate another macro 'foo, the replacement done by using will prevent the macro 'foo from transforming any part of the (using ...) form.

The benefit of this using macro is as a basis for module systems. Suppose we have a module system such that:

  arc> (macex
    '(module foo
        (def foo () 0)
        (mac bar () nil)))
  (do
     (= (*members* 'foo) (list 'foo 'bar))
     (def foo--foo () 0)
     (mac bar--foo () nil))
  arc> (module foo
        (def foo () 0)
        (mac bar () nil)))
  <#procedure:foo--bar 3 mac>
  arc> (macex '(w/module foo (bar)))
  (do (foo--bar))
(Damn. I've been reading too many EWD's, my language is starting to sound academic.)

-----

1 point by binx 6303 days ago | link

Rewriting all symbols is what the CL package system does. Of course, the CL package system does not really rewrite symbols after parsing, it chooses different symbols for the same string in different packages when parsing.

A true module system just rewrites all identifiers that are globally referenced, without touching local identifiers and quoted symbols.

-----

1 point by almkglor 6303 days ago | link

Why not local identifiers? It won't hurt anyway, since:

  (fn (x) x) == (fn (y) y)
Quoted symbols are more problematic. The problem is that you can't determine how macros will bash quoted symbols. For example, (tag (x) ...) implicitly creates 'x. And prior to replacement, you can't be sure if the 'tag here is pg:arc1's tag or some other module's 'tag macro.

An alternative is to make the (pr ...) versions of similarly-named symbols in different modules the same; this probably involves hacking (coerce ... 'string) to cut out the module name for symbols.

-----

1 point by binx 6303 days ago | link

If a module doesn't export any macros, the problem would go away. Just macroexpand all function definitions, then all quoted symbols can be detected.

Hygienic macros can also be dealt properly, mzscheme is an example. But I have no idea how Arc macros can work well with modules.

Maybe the CL package system is the only way.

-----

1 point by almkglor 6302 days ago | link

Hmm, probably means we need a separate macro namespace. Heck, stuff like (let obj (table) (obj 1)) doesn't do as you expect, because of the mixing of macros in the same namespace as functions and values.

-----

3 points by bramsundar 6303 days ago | link

Aren't macros already first-class?

-----

1 point by binx 6303 days ago | link | parent | on: Two part noob question

(mac check args `(do ,@(each arg args `(report-result ,arg arg))))

might work, I didn't test.

-----

1 point by absz 6303 days ago | link

That's not going to work, as I observed above, because (each ...) only runs for the side-effects, and will always return nil.

-----

1 point by kennytilton 6303 days ago | link

Right, but I think the different approach to the splicing was sound and simpler for noobs:

  (def report-result (result form)
    (prn)
    (prn form)
    (prs '-> result)(prn)
    (prn))

  (mac check args
    `(do ,@(do (map (fn (test)
                    `(report-result ,test ',test)) args))))
I think I transliterated to the _ form accurately, but this commented out bit gives me an error from CAR.

;;; so why doesn't this work? ;;;(mac check args ;;; `(do ,@(do (map [`(report-result ,_ ',_)] args))))

Anyway, this then produces the expected results:

  (check
   (is (* 6 9) 42)
   (is (* 6 9) (coerce "42" 'int 13)))
The above should fail the first, pass the second, tho Adams denied knowing that was what he was going on about.

-----

3 points by almkglor 6303 days ago | link

  [`(report-result ,_) ',_]
becomes:

  (fn (_) (`(report-result ,_) ',_))
If you notice, whenever a macro in arc.arc wants to use `(), it avoids using the [... _ ...] syntax for it.

-----

2 points by kennytilton 6303 days ago | link

Oops, that extra "do" wrapping just the map was left over from debugging the macro and can be chopped. Note to noobs: this is also a macro-writing tip to remember, one can put debugging print statements that run as a macro gets expanded to help debug and even learn how to write macros.

-----

3 points by binx 6304 days ago | link | parent | on: x.y vs y.x

Then what about "$"? Haskell uses "$" for infix function application operator. Composing objects and methods with "." or "->" is almost a de-facto standard that too many people have got used to...

-----

3 points by vrk 6304 days ago | link

The dot notation is used for at least two different purposes: accessing field values (C structures, for example) and calling methods on/sending messages to an object (Java, for example). Similarly the arrow notation is used for at least two different purposes: accessing field values through a pointer (pointers to C structures, for example) and calling methods on/sending messages to an object (Perl, for example).

Standard? I don't think so. They are used for very different purposes, and just because the convention of separating a record or an object and its attribute with a dot or an arrow is common is not reason enough to blindly adopt the same convention in a new programming language.

Does the dot notation as a record and field separator make sense? Maybe, if that's all you can do in your programming language. Take a look at Common Lisp defstruct for an example of another way to define structures and access fields.

-----

1 point by binx 6304 days ago | link | parent | on: "Axioms" that might need to be added

The same statement can be used for multiple inheritance. Some people may call it a trouble maker, while some people call it a feature...

-----

3 points by shiro 6304 days ago | link

I don't quite get the parallel. I'm talking about introduction of a new feature X breaking a model implied by an existing feature Y, even if there's nothing wrong with either X or Y individually. Can you elaborate your remark?

-----

1 point by binx 6305 days ago | link | parent | on: Shorter programs, fewer axioms

1. Should (apply f '(1 2)) be written as (f . '(1 2)), that is, (f quote 1 2)?

2. It's like a kind of ad-hoc non-complete pattern matching. It would be better if we implement a complete pattern matching operator as a macro, in my opinion.

3. I have no idea whether it's right, but I would be glad if most special characters are left for the user who wants to write reader macros.

4. I have no idea.

-----

More