Kho tháng 7/2019
Thứ tư, 31 Tháng 7 năm 2019 18:38:04 ICT
Writer monad in Scheme
Just playing. The Writer monad could be written probably like this:
(import (scheme base)
(scheme write)
(srfi 8)
(srfi 99)
(only (util list)
intersperse))
(define-record-type <writer> make-writer #t
(value writer-value) (written writer-written))
(define (return x)
(make-writer x (string-append "returning " (number->string x))))
(define (>>= a f)
(define (str->list x)
(if (string? x) (list x) x))
(let ([b (f (writer-value a))])
(make-writer (writer-value b)
(append (str->list (writer-written a))
(str->list (writer-written b))))))
(define (run-writer x)
(values (writer-value x)
(apply string-append
(intersperse "\n" (writer-written x)))))
(define (add x)
(make-writer (+ x 1) "adding one"))
(define (mul x)
(make-writer (* x 2) "multiply by two"))
(receive (value log)
(run-writer (>>= (>>= (return 2) add) mul))
(display value)
(newline)
(display log)
(newline))
We add 1 to 2, then multiply by 2. The result is
$ gosh ~/t/writer.scm
6
returning 2
adding one
multiply by two
The monad effect is collecting the strings "adding one" and "multiply by two".
The >>=
part is far from easy to read. So with some magic to
recreate "do" syntax (stolen from
here)
(define-syntax monad-do
(syntax-rules (let1)
[(_ e) e]
[(_ (let1 var e1) e2 ...)
(>>= e1 (lambda (var) (monad-do e2 ...)))]
[(_ e1 e2 ...)
(>>= e1 (lambda (_) (monad-do e2 ...)))]))
we can rewrite the main code to
(run-writer (monad-do (let1 foo (return 2))
(let1 bar (add foo))
(mul bar)))
which is much easier to read (and very imperative looking). If we
reverse the arguments of >>=
though, to
(define (>>= f a)
the main code does not look that bad to require "do"
(run-writer (>>= mul (>>= add (return 2))))
But yeah monad is basically it. It takes a monad value and a function. It calls the function with the real value inside the monadic value, then it combines the context from the first monad value and the one produced by the function.
Thứ hai, 29 Tháng 7 năm 2019 19:44:08 ICT
I know monad!
Or I don't. Actually I'm not sure I do.
The blame goes to SRFI-159, SRFI-165 and SRFI-166. But monad seems interesting. Basically it's a way to manage some "side context" while applying a sequence of functions in a neat way.
One example, in a pure functional program, how do you manage state? The program state must change, but the in functional programming state is immutable.
The easiest way is passing a state variable in all functions, and return a pair of whatever-value-that-function-produces and the new state. It works, but it's a but cumbersome.
With monad and the bind operator >>=
(at least in haskell), the
state management part is taken out and the function can just focus on
the true logic, not worrying too much about states.
Of course the function still needs access to the state, or to produce
a new state. That's when the ask
and local
function from
Reader monad
comes in. This reader monad is represented as "environmental monad" in
SRFI-165. Different name, same beast (though Scheme code is still
easier to understand even if not as concise)
There's also Writer monad to "keep a log" on the side. The real State monad looks a bit more complicated. But well that's something for another monday.
Thứ ba, 16 Tháng 7 năm 2019 19:43:57 ICT
Think recursive and you understand...
... too bad thinking recursively is not that easy
(define-syntax let-xcb
(syntax-rules (font gc pixmap)
[(_ ((name (font args ...))) body ...)
(let ([name (make-xid)])
(dynamic-wind
(lambda () (xcb-open-font :fid name args ...))
(lambda () body ...)
(lambda () (xcb-close-font :font name))))]
[(_ ((name (gc args ...))) body ...)
(let ([name (make-xid)])
(dynamic-wind
(lambda () (xcb-create-gc :cid name args ...))
(lambda () body ...)
(lambda () (xcb-free-gc :gc name))))]
[(_ ((name (pixmap args ...))) body ...)
(let ([name (make-xid)])
(dynamic-wind
(lambda () (xcb-create-pixmap :pid name args ...))
(lambda () body ...)
(lambda () (xcb-free-pixmap :pixmap name))))]
[(_ ((name (key args ...))) body ...)
(syntax-error "invalid keyword" key)]
[(_ ((name value) rest ...) body ...)
(let-xcb ((name value))
(let-xcb (rest ...)
body ...))]
[(_ () body ...)
(begin body ...)]))
Pretty nice "RAII" in Scheme actually. Though of course if xcb-open-font could associate itself with the corresponding "destructor" procedure, it would be even nicer. But that feels a bit heavy.
Of course it's also nicer to reduce duplication in this macro. It's probably possible, but this is easier to read.
Thứ sáu, 05 Tháng 7 năm 2019 19:15:15 ICT
Examining std::function<>
C is simple. C++ is complicated. If you have a function pointer
variable "fn" in C, gdb's "print" would tell you exactly what it
points to. And list *fn
would even show you the source code.
Try that on std::function<> which is the C++ way of carrying function pointer around. The printout is confusing at best.
At least if std::function<> contains a simple C function (or static class function), things aren't quite as bad.
(gdb) p f
$1 = {
<std::_Maybe_unary_or_binary_function<int, int, int>> = {
<std::binary_function<int, int, int>> = {<No data fields>}, <No data fields>},
<std::_Function_base> = {
static _M_max_size = 16,
static _M_max_align = 8,
_M_functor = {
_M_unused = {
_M_object = 0x400947 <real(int, int)>,
_M_const_object = 0x400947 <real(int, int)>,
_M_function_pointer = 0x400947 <real(int, int)>,
_M_member_pointer = &virtual table offset 4196678, this adjustment 72704
},
_M_pod_data = "G\t@\000\000\000\000\000\000\034\001\000\000\000\000"
},
_M_manager = 0x400fd4 <std::_Function_base::_Base_manager<int (*)(int, int)>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)>
},
members of std::function<int(int, int)>:
_M_invoker = 0x400f81 <std::_Function_handler<int (int, int), int (*)(int, int)>::_M_invoke(std::_Any_data const&, int&&, int&&)>
}
It's still quite confusing, but _M_function_pointer
kinda hints what
it is already. Which is correct in this case. But let's talk a bit
about some other fields here before moving on to more complicated
functions.
std::function
is a quite simple class actually. It has 32-bytes,
which is 16 bytes of opaque data and two function pointers:
_M_manager
and _M_invoker
.
I'm not sure about the manager thing (probably for opaque data management), but the invoker is the equivalent of virtual 'operator()'. Except in this case, std::function avoids virtual methods and everything, probably to stay small and cheap.
When you invoke an std::function, its invoker is called with the
16-byte opaque data above. When a regular function is stored in a
std::function, the invoker version used will take consider the first 8
bytes on the opaque data as the function address, which is what you
see above (_M_unused
is a union by the way).
For the record, the function is this
(gdb) l *f._M_functor._M_unused._M_function_pointer
0x400947 is in real(int, int) (func1.cpp:4).
1 #include <functional>
2
3 int real(int v1, int v2)
4 {
5 return v1 + v2;
6 }
7
8 int main(int ac, const char **av)
9 {
10 std::function<int(int, int)> f = real;
OK what if you store something more sophisticated, like a std::bind result? You may see something like this
(gdb) p f2
$1 = {
<std::_Maybe_unary_or_binary_function<int, int>> = {
<std::unary_function<int, int>> = {<No data fields>}, <No data fields>},
<std::_Function_base> = {
static _M_max_size = 16,
static _M_max_align = 8,
_M_functor = {
_M_unused = {
_M_object = 0x615e70,
_M_const_object = 0x615e70,
_M_function_pointer = 0x615e70,
_M_member_pointer = (void (std::_Undefined_class::*)(std::_Undefined_class * const)) 0x615e70, this adjustment 4199941
},
_M_pod_data = "p^a\000\000\000\000\000\005\026@\000\000\000\000"
},
_M_manager = 0x400e7d <std::_Function_base::_Base_manager<std::_Bind<int (*(std::_Placeholder<1>, int))(int, int)> >::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)>
},
members of std::function<int(int)>:
_M_invoker = 0x400e3f <std::_Function_handler<int (int), std::_Bind<int (*(std::_Placeholder<1>, int))(int, int)> >::_M_invoke(std::_Any_data const&, int&&)>
}
The _M_unused
part does not make sense anymore. But the trick is
this, the entire std::bind result (which is called a "functor") is
actually allocated dynamically and its pointer is _M_object
(or any
other fields in that union, of course). If you know its type (remember
std::bind is a template) then you can cast it back and maybe see
something.
And the type is actually here, in the _M_invoker
line, second
template argument. So what if we cast it back and print it out?
(gdb) p *(std::_Bind<int (*(std::_Placeholder<1>, int))(int, int)>*)f2._M_functor._M_unused._M_object
$2 = {
<std::_Weak_result_type<int (*)(int, int)>> = {
<std::_Weak_result_type_impl<int (*)(int, int)>> = {<No data fields>}, <No data fields>},
members of std::_Bind<int (*(std::_Placeholder<1>, int))(int, int)>:
_M_f = 0x400887 <real(int, int)>,
_M_bound_args = std::tuple containing = {
[1] = {
<std::_Placeholder<1>> = {<No data fields>}, <No data fields>},
[2] = 2
}
}
This looks a lot better. The _M_f
line actually tells us about the
function (and we can of course locate it with l *$2._M_f
. But the
more interesting thing here is the bound arguments. The original
std::bind is this:
std::bind(real, std::placeholders::_1, 2);
And we can see from _M_bound_args
that the first argument is
_Placeholder<1>
, and the second one the value 2. The function
prototype at _M_f
line already hints about the placeholder
positions.
(Note, if you really examine raw data, std::tuple<> actually stores data in backward order)
This should work for class member functions as well, at least non-virtual ones. Doing the same trick we see
(gdb) p *(std::_Bind<int (Foo::*(Foo*, std::_Placeholder<1>, int))(int, int)>*)0x615e70
$2 = {
<std::_Weak_result_type<int (Base::*)(int, int)>> = {
<std::_Weak_result_type_impl<int (Base::*)(int, int)>> = {<No data fields>}, <No data fields>},
members of std::_Bind<int (Base::*(Base*, std::_Placeholder<1>, int))(int, int)>:
_M_f = &virtual Base::real(int, int),
_M_bound_args = std::tuple containing = {
[1] = 0x7fffffffdb30,
[2] = {
<std::_Placeholder<1>> = {<No data fields>}, <No data fields>},
[3] = 2
}
}
Pretty much the same. The first argument is "this" pointer of
course. I deliberately bound to the base virtual function to hide the
function signature, so you don't really know what exact derived class
it is. But normally you should have the right class name at line
_M_f
.
Anyway to find out what exact function it is then? There is (and it's how correct virtual function is executed). You need to look at the virtual function table.
Actually you won't have to look very far into the virtual table. You already know the first argument is "this" pointer. This is something new: the first pointer in this object should always be the vtable (unless there are no virtual functions of course, which is not the case here), so:
(gdb) set print asm-demangle on
(gdb) x/a 0x7fffffffdb30
0x7fffffffdb30: 0x401db8 <vtable for Foo1+16>
There it is, it's "Foo1::real()". Knowing the class name, you use gdb to decode the vtable:
(gdb) info vtbl (Foo1*)0x7fffffffdb30
vtable for 'Foo1' @ 0x401db8 (subobject @ 0x7fffffffdb30):
[0]: 0x400e18 <Foo1::real(int, int)>
and locate the function using that function address:
(gdb) l *0x400e18
0x400e18 is in Foo1::real(int, int) (func4.cpp:13).
8 };
9
10 class Foo1 : public Base
11 {
12 public:
13 int real(int v1, int v2) override
14 {
15 return v1 * v2;
16 }
17 };
If you want to decode the vtable yourself, it's not that hard. "x/a" should give you the function addresses when you give it the vtable address 0x401db8.
That address is actually in the middle of the virtual table. There are two words before it to form class hierarchy and give type info. But that address is the start of the first virtual function (or more if you have more than one).
That's it. I still don't like fat pointer but what can you do about it.
What about lambda? It's not that bad from a quick glance
(gdb) p f6
$1 = {
<std::_Maybe_unary_or_binary_function<int, int>> = {
<std::unary_function<int, int>> = {<No data fields>}, <No data fields>},
<std::_Function_base> = {
static _M_max_size = 16,
static _M_max_align = 8,
_M_functor = {
_M_unused = {
_M_object = 0x1,
_M_const_object = 0x1,
_M_function_pointer = 0x1,
_M_member_pointer = &virtual table offset 0, this adjustment 4197477
},
_M_pod_data = "\001\000\000\000\000\000\000\000e\f@\000\000\000\000"
},
_M_manager = 0x4008e8 <std::_Function_base::_Base_manager<main(int, char const**)::<lambda(int)> >::_M_manager(std::_Any_data &, const std::_Any_data &, std::_Manager_operation)>
},
members of std::function<int(int)>:
_M_invoker = 0x4008a9 <std::_Function_handler<int(int), main(int, char const**)::<lambda(int)> >::_M_invoke(const std::_Any_data &, int &&)>
}
since you can figure out roughly where the lambda is from based on the
"namespace" part in _M_invoker
, which is main(int, char const **)
.
But if you want even more accuracy, it gets a bit ugly.
Not sure if it's always is the case, but in this case, lambda code is embedded in the invoker. The first step, even with "/m" to show source code, does not help much:
(gdb) disassemble f6._M_invoker
Dump of assembler code for function std::_Function_handler<int(int), main(int, char const**)::<lambda(int)> >::_M_invoke(const std::_Any_data &, int &&):
0x00000000004008a9 <+0>: push %rbp
0x00000000004008aa <+1>: mov %rsp,%rbp
0x00000000004008ad <+4>: push %rbx
0x00000000004008ae <+5>: sub $0x18,%rsp
0x00000000004008b2 <+9>: mov %rdi,-0x18(%rbp)
0x00000000004008b6 <+13>: mov %rsi,-0x20(%rbp)
0x00000000004008ba <+17>: mov -0x18(%rbp),%rax
0x00000000004008be <+21>: mov %rax,%rdi
0x00000000004008c1 <+24>: callq 0x4009fd <std::_Function_base::_Base_manager<main(int, char const**)::<lambda(int)> >::_M_get_pointer(const std::_Any_data &)>
0x00000000004008c6 <+29>: mov %rax,%rbx
0x00000000004008c9 <+32>: mov -0x20(%rbp),%rax
0x00000000004008cd <+36>: mov %rax,%rdi
0x00000000004008d0 <+39>: callq 0x400be9 <std::forward<int>(std::remove_reference<int>::type&)>
0x00000000004008d5 <+44>: mov (%rax),%eax
0x00000000004008d7 <+46>: mov %eax,%esi
0x00000000004008d9 <+48>: mov %rbx,%rdi
0x00000000004008dc <+51>: callq 0x4006fc <<lambda(int)>::operator()(int) const>
0x00000000004008e1 <+56>: add $0x18,%rsp
0x00000000004008e5 <+60>: pop %rbx
0x00000000004008e6 <+61>: pop %rbp
0x00000000004008e7 <+62>: retq
End of assembler dump.
The second step however does show something. Disassembling the last callq:
(gdb) disassemble/m 0x4006fc
Dump of assembler code for function <lambda(int)>::operator()(int) const:
11 std::function<int(int)> f6 = [v1](int v2) ->int { return v1 + v2; };
0x00000000004006fc <+0>: push %rbp
0x00000000004006fd <+1>: mov %rsp,%rbp
0x0000000000400700 <+4>: mov %rdi,-0x8(%rbp)
0x0000000000400704 <+8>: mov %esi,-0xc(%rbp)
0x0000000000400707 <+11>: mov -0x8(%rbp),%rax
0x000000000040070b <+15>: mov (%rax),%edx
0x000000000040070d <+17>: mov -0xc(%rbp),%eax
0x0000000000400710 <+20>: add %edx,%eax
0x0000000000400712 <+22>: pop %rbp
0x0000000000400713 <+23>: retq
End of assembler dump.
Line 11 is where the lambda is from:
(gdb) l 11
6 }
7
8 int main(int ac, const char **av)
9 {
10 int v1 = 1;
11 std::function<int(int)> f6 = [v1](int v2) ->int { return v1 + v2; };
12 return f6(3);
13 }
Not exactly sure how "v1" is captured. Well... not in the mood to
stare at these disassembly listings for long. At higher optimization
level, that secondary call may be cut out and _M_invoker
contains
the lambda body too:
(gdb) disassemble/m f7._M_invoker
Dump of assembler code for function std::_Function_handler<int(int, int), rrr()::<lambda(int, int)> >::_M_invoke(const std::_Any_data &, int &&, int &&):
4 return [](int v1, int v2) ->int { return v1 + v2; };
0x00000000004007b0 <+0>: mov (%rsi),%eax
0x00000000004007b2 <+2>: add (%rdx),%eax
End of assembler dump.
(gdb) l *0x00000000004007b0
0x4007b0 is in std::_Function_handler<int(int, int), rrr()::<lambda(int, int)> >::_M_invoke(const std::_Any_data &, int &&, int &&) (zz.cpp:4).
1 #include <functional>
2 std::function<int(int, int)> rrr()
3 {
4 return [](int v1, int v2) ->int { return v1 + v2; };
5 }
6