2017-05-19

WTFP #2: OCaml ThunkLists

After a hiatus of roughly a year and a smidge, I am back with the promised WTFP #2: OCaml infinite lists.

They aren't quite the same thing as the infinite lists in Haskell, since OCaml isn't as lazy as Haskell is, but they follow a principle somewhat similar to the FLists I mentioned back in the previous WTFP. This is the definition of a ThunkList, or thunked:

type 'a thunked = Thunk of 'a * (unit -> 'a thunked)

What is this saying? Well, since it has been more than a year since I touched OCaml, I will do my best to explain.

Translating into Haskell:

data ThunkList a = Thunk a (() -> ThunkList a)


You see, since the only thing OCaml is really lazy about is functions, we define a list consisting of a head element, and a tail element that can't be bothered to get off its ass and show up right away. If we pay it a dollar (in this case, a unit or ()), it will stick its head round the corner but keep its tail hidden. And so forth.
So what this does is implement artificial laziness in the second component of the list, which lets us handle infinity, since we never have to see all of it. From this point, we can define the basic list functions head and tail:

(* first element of a thunked list *)
let head : 'a thunked -> 'a =
  fun z -> match z with | Thunk (x,_) -> x

(* remainder of a thunked list *)
let tail : 'a thunked -> 'a thunked =
  fun z -> match z with | Thunk (_,th) -> th ()


and even our old friend (!!):

(* index accessor of a thunked list *)
let rec get : int -> ('a thunked -> 'a) =
  fun i -> match i with
    | 0 -> head
    | y when y > 0 -> fun xs -> get (i-1) (tail xs)
    | _ -> failwith "cannot get negative index"
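
To see this machinery run, here is a small sketch of my own (not from the original post) that builds the infinite list of natural numbers and peels a few elements off:

```ocaml
type 'a thunked = Thunk of 'a * (unit -> 'a thunked)

let head (Thunk (x, _)) = x
let tail (Thunk (_, th)) = th ()

(* the integers counting up from n, as an infinite thunked list *)
let rec from (n : int) : int thunked = Thunk (n, fun () -> from (n + 1))

let nats = from 0
let () = Printf.printf "%d %d\n" (head nats) (head (tail nats))
```

Since from never actually recurses until somebody pays the thunk its unit, the "infinite" list costs nothing until you start taking heads and tails.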


But where do we go from here? Well, exploiting the laziness mentioned earlier, we can even have bidirectionally infinite lossless lists, if we could be bothered.

(* dethunk: a doubly-lazy, doubly-infinite list *)
type 'a dethunk = (unit -> 'a thunked) * 'a * (unit -> 'a thunked)

Since I am too lazy, I will leave implementing the functions you would want out of a bidirectionally infinite lossless list as an exercise. I have the implementation in front of me, but I am just too lazy.
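
To give a flavor of what those functions might look like, here is one possible shift operation, a sketch of my own (the names focus and right are mine, not necessarily what any real implementation uses):

```ocaml
type 'a thunked = Thunk of 'a * (unit -> 'a thunked)

(* dethunk: a backward thunk, a focused element, and a forward thunk *)
type 'a dethunk = (unit -> 'a thunked) * 'a * (unit -> 'a thunked)

(* the element currently in focus *)
let focus ((_, x, _) : 'a dethunk) : 'a = x

(* shift the focus one step to the right: pull the head off the
   forward thunk, and push the old focus onto the backward thunk *)
let right ((back, x, fwd) : 'a dethunk) : 'a dethunk =
  let (Thunk (y, fwd')) = fwd () in
  ((fun () -> Thunk (x, back)), y, fwd')

(* the integers, focused at 0 *)
let rec up n = Thunk (n, fun () -> up (n + 1))
let rec down n = Thunk (n, fun () -> down (n - 1))
let ints : int dethunk = ((fun () -> down (-1)), 0, (fun () -> up 1))

let () = Printf.printf "%d %d\n" (focus ints) (focus (right ints))
```

A left shift would be the mirror image, and nothing is lost in either direction: the old focus is tucked into the opposite thunk, which is what makes the list lossless.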

2016-03-03

WTFP #1: Haskell FLists

If you are unfamiliar with how lists are defined/represented in Haskell, here is a quick breakdown. The List type has two data constructors:

data List a = Nil | Cons a (List a)

which is simplified with syntactic sugar to

[a] = [] | a : [a]

What does this mean for us? Well, since lists expose only their first element, retrieving the first element of a list is easier than retrieving the last, and adding to the beginning of a list is easier than adding to the end. Aside from defining an anti-list, which exposes its last element (and would run into the same problem on the other end), there is no obvious solution to this in terms of creating a concrete type. Syntactic sugar is our friend, however. When Haskell (or, rather, your friendly neighborhood Haskell compiler) sees [1,2,3], this really means 1:(2:(3:[])) once you remove the artificial sweeteners. If we wanted to add 4 to the end of this list, we would have to unwrap it until we got to the [], and replace it with (4:[]). For this small case, it's not a huge deal, but for much larger lists, this can be time-consuming, especially when we want to add more than one element to the end, and not all at the same time.

This is where FLists come in (the term is most likely not an actual one, but it may very well be one, I don't know). One key observation is that the [] part of each list is completely useless, aside from bookkeeping where the list ends and allowing said appending. The key idea behind FLists is that we represent a list of values by the function that prepends those values to a list.

newtype FList a = FList { f :: [a] -> [a] }

We are using a newtype instead of a type alias so that we can create instances of various typeclasses to make this more like a list, because functions are paragon (cannot be compared), illegible (cannot be read), and obscene (cannot be shown). With this representation, we can easily translate to and from ordinary lists, and join two lists in either order with little issue. Concatenating two FLists is as easy as composing the functions they hold:

append :: FList a -> FList a -> FList a
append (FList f) (FList f') = FList $ f . f'

toList :: FList a -> [a]
toList = ($ []) . f

fromList :: [a] -> FList a
fromList [] = FList id
fromList (x:xs) = FList (x:) `append` fromList xs

instance Show a => Show (FList a) where
    show = show . toList

We can even implement an instance of Eq for FList:

instance (Eq a) => Eq (FList a) where
    (==) (FList f) (FList f') = f [] == f' []
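
To illustrate why this representation earns its keep, here is a sketch of mine (the names cons and snoc are not from the post): adding an element at either end is a single extra composition, so the two ends are equally cheap.

```haskell
newtype FList a = FList { f :: [a] -> [a] }

append :: FList a -> FList a -> FList a
append (FList g) (FList h) = FList (g . h)

toList :: FList a -> [a]
toList = ($ []) . f

-- prepending and appending one element are each one composition
cons :: a -> FList a -> FList a
cons x = append (FList (x :))

snoc :: FList a -> a -> FList a
snoc xs x = append xs (FList (x :))

main :: IO ()
main = print (toList (snoc (cons 1 (FList (2 :))) (3 :: Int)))  -- prints [1,2,3]
```

Note that nothing is actually prepended until toList finally supplies the []; until then the list is just a pipeline of waiting (x :) sections.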

The applications of this are fairly limited, but where it is applicable, it can be very useful. The next WTFP will explore the possibilities for infinite lists in OCaml.

2016-02-11

WTFP: Announcement

After having absolutely nothing to write about on this blog for 2 years (give or take), I am pleased to announce that I still have nothing to write about, but this time I will not let that stop me from posting.


I am starting a series of posts called "What the Func(tional programming)", or WTFP for short. These posts will consist of observations, insights, and delirious ramblings (mostly the last) I have regarding Functional Programming, in particular Haskell (which I am self-teaching) and OCaml (which I am taking a class in). I will try to alternate between these languages (and any others I decide to introduce) as much as possible, but if I run out of one, I might have serial posts on the other. The next two posts will be about non-standard list representations in Haskell/OCaml (respectively) that have unique features that distinguish them from the standard list representations. Stay tuned to this station. If you were able to touch that dial, I would advise against it, probably because it is unclear what that dial would do if turned.

2014-09-05

Utterly Pointless Update

A quick update on my previous post, Utterly Pointless. I showed the same problem to my brother, who at first concluded the same thing that I did, but later raised the question as to why the first and third arguments have to be of the same type. His argument was that the polymorphic nature of the type parameters meant that they could take on different types in different contexts. If we deconstruct the definition of f and uncurry it, we arrive at the equality

f(x,y,z) = const(y,f(z))

where const is a function that returns its first argument and ignores its second argument. Since we can call f with a first argument of type elephant, or a first argument of type giraffe, why do x and z have to both be elephants or both be giraffes?

While I am in no way an expert or by any means well-acquainted with Haskell's type inference system, my guess is that when Haskell is trying to construct a type for f from scratch, it assumes that f must be of a fixed type, and lets the polymorphism enter after it decides what relationship should exist between the various types in f's type signature.

After my father lectured me for a brief while about category theory, and I was still unable to find a reason why the first and third arguments should have to be the same type, I decided to explicitly write out my brother's more general type for f, namely

f :: a -> b -> (c -> b)

Surprisingly, Haskell was completely okay with this type signature for the same definition, and allowed me to pass first and third arguments of different types, whereas it complained when I tried to do so without the custom signature. My theory is this: Haskell has no way of knowing that it might assume the more general type, so it starts out the same way that I did in my previous post. This more general signature is perfectly valid, and can be verified to be valid, but it cannot be constructed from scratch. If you give a mathematician an attempted proof of Goldbach's conjecture, they will probably be able to tell you whether it is a valid and complete proof, but no mathematician thus far has been able to construct a proof. I know that this analogy is getting the Curry-Howard isomorphism completely backwards, but I hope it at least gets my point across. 
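
For the record, here is the annotated version in full (the example values are mine), with a first argument and third argument of visibly different types:

```haskell
-- the more general signature: first and third argument types are unrelated
f :: a -> b -> (c -> b)
f _ = (. f) . const

main :: IO ()
main = print (f "elephant" (42 :: Int) True)  -- prints 42
```

Laziness is doing quiet work here: f calls itself in its own body, but const throws that recursive result away unevaluated, so the recursion never actually unwinds.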

If anyone has insight into what is happening here inside the type inference system and the type checking system, or about any aspect of this seeming paradox, please leave a comment. As usual, the comments section is also open to comments/criticisms/heckling.

EOF

2014-09-04

Utterly Pointless

While reading a Haskell book that made mention of Haskell code golf, I got a bit distracted and did a quick search for "haskell golf." Besides several results for the "Krueger-Haskell Golf Course" in Wisconsin, I found an exercise to write the shortest function of type a -> b -> (a -> b). For those of you not familiar with curried functions (or the variety used in Haskell in particular), this means that if a function f has that type (written as f :: a -> b -> (a -> b)), then applying f to a value of type a produces another function that takes a value of type b and produces a third function, which simply takes a value of type a and returns a value of type b. In other words, if we wanted type a to be a String and type b to be an Int, then we would have something like

f("Fish!") = g(x) 
g(2) = h(x) 
h("No fish...") = 42

if that doesn't make it even more confusing. Since Haskell is functional, lambdas are important; they are written as backslashes in Haskell because keyboards with lambdas are harder and harder to find, as the Greek keyboard-fabrication industry hasn't been doing very well as of late. I will try to explain everything as I go along, but do your own research if you would like to understand anything I am saying before, while, or after I say it.

Of the entries posted thus far, the shortest was this one:

f _ = (. f) . const

with the whitespace added in to make it more readable.

I had learned enough Haskell to understand every individual part of this (except for const, which wasn't hard to look up), and yet the whole was far from clear to me. (Feel free to treat this as an exercise if you wish, in which case you should stop reading until you work out why this function has the type specified. Otherwise, read on.)

We will start, naturally, at the end of the expression. const is a function that takes any two arguments, and always produces the first. In curried terms, it takes a value and produces a function that returns that value for any input it is given. In lambda expressions this would be expressed as

const = λ x → λ y → x

or, written in a way that is more consistent with Haskell,

const = \x -> \_ -> x

That underscore is a catch-all that avoids naming a variable that will never actually be used, similar to a '.' in regular expressions. Speaking of which, '.' is Haskell's syntax for function composition, meaning that (f . g) x == f (g x).

Now that we have understood that much, we can turn back to our Franken-function. By the definition of const, we know that it is of type a -> b -> a, since it has the same return value, and therefore type, as its first argument, and we don't actually care what its second argument's type is. Function composition (.) has an even more complicated type, namely (b -> c) -> (a -> b) -> (a -> c). This is merely saying that it takes a function whose input type is the output type of its second argument, and glues the two together to produce a function that first applies one, and then applies the other to the result of the first. Due to the way it is defined, both in Haskell and in mathematics, function composition applies the innermost, or rightmost, function first, and the outermost, or leftmost, function last. In other words, whatever is to the right of a . is applied first, and whatever is on the immediate left is applied second.
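
A quick sanity check of that ordering (a toy example of my own):

```haskell
main :: IO ()
main = do
  print (((* 2) . (+ 1)) (3 :: Int))  -- (+ 1) runs first: (3 + 1) * 2 = 8
  print (((+ 1) . (* 2)) (3 :: Int))  -- (* 2) runs first: 3 * 2 + 1 = 7
```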

Let's turn back to our problem. In order to make it possible to determine the type of our function f, we first need to name its type so we can perform some type inference. Since all functions ultimately take one value (which might be an integer, a string, a list, an elephant, or even another function) and produce another value (which might be an integer, a string, a list, a giraffe, or even another function), we can say that our function f takes a value of type x and returns a value of type y. We would write this as

f :: x -> y

Right now, we don't know or care what x or y actually represent; we are just using them in order to refer to the argument and return types of f. We could just as easily have called them a and b, or elephant and giraffe, but it is certainly more concise than referring to them as "the argument type of f" and "the return type of f".

Because parentheses have the highest precedence, we will examine the term inside the parentheses first. (. f) might look like someone forgot a function to put to the left of the composition operator (let's just call it 'dot' for brevity), but the infix expression foo . bar is really just shorthand for (.) foo bar. Because Haskell tends to curry everything, the partially applied expression (. f) is really just another function, which is asking for a function to put on its left side so it can be a complete expression. In other words, back to our lambda expressions, (. f) == \g -> g . f. If we look at the type of this expression, we can see that it must be (. f) :: (y -> z) -> (x -> z), since the composition is already half-done and f is of type x -> y.

Now we can finally look outside the parentheses and see what is going on. If we refer to the expression (. f) as g, we can save a lot of trouble. Our RHS now looks like g . const . Remembering the type expression for const , we can work out what the expression as a whole must be. From the type of g, we can eventually see that, in this case, the exact type of the composition operator ('dot' sounds a bit silly) must be

(.) :: ((y -> z) -> (x -> z)) -> (a -> b -> a) -> ?

Hmm, we seem to have reached an obstacle. Nothing to worry about, though, as long as we can match up the type expression for (.) with the weird expression we have here.

(.) :: ((y -> z) -> (x -> z)) -> (a -> (y -> z)) -> (a -> (x -> z))

If we look at these two expressions, it isn't too hard to figure out that y == b and z == a. So our overall RHS has a type of a -> (x -> a). Now, as to the input type of f: the underscore after f on the LHS tells us that we discard the first argument, so the body itself places no constraint on it. Recalling that we named f's input type x, we now know that

f :: x -> a -> (x -> a)

but the type variables were never really special anyway, so we can just rename them as we please, so long as we are consistent about the renaming. If we replace every 'x' with an 'a', and every 'a' with a 'b' (in parallel, not in sequence, of course), we will get the definitive type of our Franken-function:

f :: a -> b -> (a -> b)

 which was what we wanted in the first place.
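
As a final check (my own, not part of the original exercise), the golfed definition really does ignore its first and third arguments and hand back the second, just as the Fish! example at the top of the post suggested:

```haskell
f :: a -> b -> (a -> b)
f _ = (. f) . const

main :: IO ()
main = print (f "Fish!" (42 :: Int) "No fish...")  -- prints 42
```

(The recursive call to f inside its own body is never forced, since const discards it, so this terminates happily.)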

Thanks for letting me take you on a magical journey through the Krueger-Haskell Golf Course. Please return any borrowed clubs to the front desk.

(If you're wondering about the title, point-less or point-free is a style of writing Haskell that omits all named references to arguments, where such named references are known as 'points.')

EOF

Counting to 100

Today, while reading Chapter 6 of The Unix Programming Environment by Kernighan and Pike (Programming with Standard IO), I wanted to test a program that took a long-ish file as input. Naturally, I needed to have a long-ish file, and I don't generally keep them lying around, as I am near-obsessive about the organization of my files and folders and trash anything I won't need again. I opened up a new document in vim and decided that I wanted 100 lines of text to properly test the program. Okay then.

Like a true programmer, I wanted to make sure that no lines were being dropped, so I intended to number them. Also like a true programmer, I was too lazy to type out 100 line numbers by hand. While normally, when in such a scenario, I would try to learn [far more than] enough perl or awk or some other basic scripting language to perform this task, I had been reading about IO in Haskell only the day before.

Haskell, for those who are not familiar, is a functional programming language that can be described as "like lisp, but without the parentheses," or "like Haskell, but with a less recursive comparison." Being a functional programming language, Haskell is great for all sorts of things, with many mathematical applications. Granted, without optimization, Haskell can be much slower than other, less functional languages, but it can be useful sometimes.

As the task I faced involved counting to 100, the mathematical nature of Haskell lent itself to this problem. I wrote out the following line of code

main = sequence $ map print [1..100]

and entered

:.!runhaskell

into vim. This produced the numbers from 1 to 100, all on their own line, with some extra output at the end:

[(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),()]

but that was quickly dealt with. While I mostly view this application of programming knowledge as an achievement, I also feel somewhat like a false programmer, because my laziness actually expedited the result instead of impeding it by leading me on a wild goose chase of discovery and mosquito-nuking.
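
(A retrospective note of mine, not part of the original post: the trailing list of units is the collected results of sequence. mapM_ discards them as it goes, so it never shows up at all.)

```haskell
-- mapM_ throws away each print's () result, so only the numbers appear
main :: IO ()
main = mapM_ print [1 .. 100]
```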

This is my first post on my first real blog, so I welcome any comments/criticisms/heckling either related or wildly unrelated to this post or blog.

EOF