Surratt On Software

Saturday, October 19, 2019

Learning me a Haskell

I was doing more katas on Code Wars this afternoon and came across this one called Isograms. I quickly created a solution in Java which did the thing it needed to do.


    public static boolean isIsogram(final String str) {

        if (str == null) {
            throw new IllegalArgumentException(
               "str parameter may not be null."
            );
        }

        if ("".equals(str)) {
            return  true;
        }

        final String lowerCase = str.toLowerCase();
        final char[] chars = lowerCase.toCharArray();

        final List<Character> charactersFound = new ArrayList<>();

        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (charactersFound.contains(c)) {
                return false;
            } else  {
                charactersFound.add(c);
            }
        }

        return true;
    }

Not pretty, not concise, but it gets the job done. I knew that it could be solved far more concisely using streams, so I switched to Kotlin and wrote this.


    fun isIsogram(str: String): Boolean {

        if ("" == str) {
            return true
        }

        val distinctCharacters = str.toLowerCase()
                  .asSequence().distinct().count()

        return str.length == distinctCharacters

    }

I'm pretty sure something similar could be done in Java 8+, but I've been developing almost exclusively in Kotlin for nearly two years now, so it's my go-to language for the JVM.
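For comparison, here is a minimal sketch of what the Java 8+ version might look like. It assumes the same rules as the versions above (case-insensitive, empty string counts as an isogram); the class name `IsogramStreams` is made up for illustration.

```java
import java.util.Locale;

class IsogramStreams {

    // A string is an isogram when no character repeats, ignoring case.
    static boolean isIsogram(final String str) {
        if (str == null) {
            throw new IllegalArgumentException("str parameter may not be null.");
        }
        final String lowerCase = str.toLowerCase(Locale.ROOT);
        // chars() yields an IntStream of the string's UTF-16 code units.
        final long distinctCharacters = lowerCase.chars().distinct().count();
        return distinctCharacters == lowerCase.length();
    }

    public static void main(String[] args) {
        System.out.println(isIsogram("Dermatoglyphics")); // true
        System.out.println(isIsogram("moOse"));           // false
    }
}
```

The empty-string check is not needed here: zero distinct characters equals a length of zero, so the comparison already returns true.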

I've trained myself over the years to avoid function chaining, so that one line makes my brain itch a bit. However, I think this implementation is much more expressive and concise than what I wrote in Java.

One of the reasons I like Kotlin is the implicit null safety. In the Kotlin solution, I didn't have to protect the function from a null argument value. It makes the solution more concise and easier to parse mentally. If I wanted to allow null as a value, I'd have to add special syntax to support it. That extra step always makes me stop and ask whether there is a better way than allowing a null input, and I can typically find one.

Using the stream features in my Kotlin solution reminded me that I wanted to learn more about Functional Programming. Code Wars offered a Haskell version of the kata, so I decided to give that a shot.

Unfortunately, I don't know anything about Haskell. Not even how to spell it. I always want to have just one L.

Fortunately, I listened to a lot of the early episodes of The Bikeshed and had heard Sean and Derek mention Miran Lipovaca's Learn You A Haskell website many times.

I started going through the content and found it very accessible. It's a good balance of explanation and examples. Pretty soon I was taking the individual topics and combining them to solve new problems.

For example, during the section on list operations, I was trying to write a function that would prune one list from another. That is, given lists A and B, return A with everything in B removed. I couldn't find a solution using just the set operators, but once I got to the section on list comprehensions a light bulb went off. After some trial and error, I came up with this:


    filterOut s f  = [ x | x <- s, not (elem x f) ]

For example:


    ghci> filterOut ['A'..'Z'] ['I','P']
    "ABCDEFGHJKLMNOQRSTUVWXYZ"

And, yes. Deep down inside I still find 3rd grader humor potty funny.

I still don't know enough to solve the Isogram problem in Haskell, but I'm getting close.

Friday, October 18, 2019

The Dubstep Kata

The first kata I trained with on codewars.com was the dubstep kata. To summarize, a DJ creates the dubstep version of a popular song by inserting 1 or more WUBs between any two words in the song, and 0 or more before or after the lyric. And because dubstep is always up-tempo, the DJ also removes all the spaces between words and capitalizes all the characters. So, the opening line of Hotel California could be:

ONWUBAWUBWUBDARKWUBWUBDESERTWUBHIGHWAYWUBWUB

but not

WUBONAWUBDARKWUBDESERTWUBHIGHWAY

(there's no WUB between "ON" and "A")

The exercise was to extract the original lyric from the DJ's dubstep version.

My first stab at this was very mechanical: walk the string looking for "WUB" and throw each one away. It looked like this:


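    // The delimiter the DJ inserts; shared by every version below.
    private static final String WUB = "WUB";

    public static String SongDecoderAlpha (String song) {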

        final StringBuilder builder =  new StringBuilder();

        while (song.length() > 0) {
            if (song.startsWith(WUB)) {
                song = song.substring(3);
            } else {
                int i = song.indexOf(WUB);
                if (i >= 0) {
                    String temp = song.substring(0, i);
                    appendWord(builder, temp);
                    song = song.substring(i);
                } else {
                    appendWord(builder, song);
                    break;
                }
            }
        }

        return builder.toString();

    }

    private static void appendWord(final StringBuilder builder, 
                                   final String temp) {
        if (builder.length() > 0) {
            builder.append(" ");
        }
        builder.append(temp);
    }

I wasn't really excited about this. It did what it needed to do and passed all of the tests, but it is very complex: a while loop with nested conditionals. I think it wouldn't be too hard to figure out what it does, but it wouldn't be easy to understand all the permutations. Plus it has the additional appendWord() function, which reduces duplicated code and complexity but also adds another chunk of code to understand. So, I decided to come up with another solution.

    public static String SongDecoderBeta (String song) {

        song = song.replace(WUB, " ");
        song = song.trim();
        while (song.contains("  ")) {
            song = song.replace("  ", " ");
        }

        return song;

    }

This one was better. There are fewer lines of code and no if statements. There is literally almost nothing to it. It also more clearly describes what the solution is:

replace all the WUB tokens with spaces
clean up all the extra spaces

This was ultimately the solution I submitted, though I wasn't really happy with it. The while loop replacing the double spaces with a single space didn't quite sit right with me. After submitting my solution, I looked at what other folks had submitted and I saw that it was possible to use greedy regular expressions to do this more concisely. As a rule of thumb, I avoid using regular expressions because they can be hard to understand. In this case, the expression is so simple that the risk of confusion is minimal.

So, a revised implementation would look like this:

    public static String SongDecoder (String song)
    {

        song = song.replace(WUB, " ");
        song = song.trim();
        song = song.replaceAll(" +", " ");

        return song;

    }

Three lines of code and the only thing that is not direct is what the regex is doing, but that's easy enough to look up. I'm not sure which one I'd use in production code.
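The same greedy idea can be pushed one step further by matching runs of WUB directly instead of runs of spaces. This is only a sketch, not the version I submitted; the class name `SongDecoderRegex` and the method name `decode` are made up for illustration.

```java
class SongDecoderRegex {

    private static final String WUB = "WUB";

    // Collapse each run of one or more WUBs into a single space,
    // then trim the spaces left at either end.
    static String decode(final String song) {
        return song.replaceAll("(" + WUB + ")+", " ").trim();
    }

    public static void main(String[] args) {
        // prints "ON A DARK DESERT HIGHWAY"
        System.out.println(decode("ONWUBAWUBWUBDARKWUBWUBDESERTWUBHIGHWAYWUBWUB"));
    }
}
```

The `+` quantifier is greedy, so `(WUB)+` swallows a whole run of consecutive WUBs in one match. That means no double spaces are ever created, and the second clean-up pass goes away.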

Thinking about how this solution worked, I realized that the code was extracting the original lyrics by parsing out the WUBs. I decided to experiment with this approach to see if it was more understandable. The result was this:

    public static String SongDecoder(String song)
    {

        final String[] words = song.split(WUB);
        final StringBuilder stringBuilder = new StringBuilder();

        String word;
        for (int i = 0; i < words.length; i++) {
            word = words[i];
            if (word.length() == 0) {
                continue;
            }
            if (stringBuilder.length() > 0) {
                stringBuilder.append(" ");
            }
            stringBuilder.append(word);
        }

        return stringBuilder.toString();

    }

This is better than the original in that it is self-descriptive of the approach. The one thing I hadn't considered was the behavior of String.split() when there are consecutive delimiters. For example, the input RWUBWUBWUBLWUB resulted in the following tokens:

[R]
[]
[]
[L]

Not sure if this is the defined behavior of split() or not; I didn't dig into it. But that meant I had to add another if statement to handle those empty string tokens. I do like this approach because the code to add a space between words only existed in one branch. This eliminated the need to extract it into another method.
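For what it's worth, the Javadoc for String.split(String) does describe this: trailing empty strings are left out of the result, while the empty strings between consecutive delimiters are kept. The sketch below prints the tokens and shows how the same split approach might look with Java 8 streams, where the filter takes the place of the if statement. The class name `SplitDemo` and the method name `decode` are made up for illustration.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class SplitDemo {

    private static final String WUB = "WUB";

    // Split on WUB, drop the empty tokens, join the rest with single spaces.
    static String decode(final String song) {
        return Arrays.stream(song.split(WUB))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Interior empty tokens are kept; the trailing one is dropped.
        System.out.println(Arrays.toString("RWUBWUBWUBLWUB".split(WUB))); // [R, , , L]
        System.out.println(decode("RWUBWUBWUBLWUB"));                     // R L
    }
}
```

Collectors.joining(" ") only puts the separator between elements, so there is no need to check whether something has already been appended.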

Looking at the lines of code and number of branches, it's easy enough to see which solution is the simplest. But I wanted to quantify the differences. I found the website lizard.ws which offers online complexity analysis of 8 languages, including Java. I plugged in each version of my solution and got the following results:

Solution	NLoC	Complexity	Tokens
mechanical	25	6 (4+2)	160
replace	9	2	51
greedy replace	7	1	40
split	17	4	104

NLoC is non-commented lines of code: anything that isn't a blank line or a comment line.

I think that makes it clear: the greedy replace solution has the lowest possible complexity score, less than a third of the lines of code, and a quarter of the tokens of my original solution. With all that other code removed for your brain to parse, I'm comfortable you'd have the headroom to ponder the regex.

So I changed direction already

A co-worker told me recently that he had similarly decided to start doing practice exercises on codewars.com to expand his skills.

I hadn't heard of the website, so I checked it out briefly. I completed one kata in Java to see how it worked. I liked that the site supports solutions to katas in multiple languages, which would help me with the goal of expanding my fluency. I also liked that, at least for Java, unit tests are a part of the process, example tests are provided and that I could add my own tests to the suite.

So, I'm going to give this a shot and see what I can get out of it. I'm also going to do a write-up of each exercise here to explain my solution and my thought process.

Wednesday, October 16, 2019

Trying to start over

I decided to start refining my skills and branching out more. It feels like most of my time at work is spent sharing my knowledge and experience with others, making high-level architecture decisions, or wading into application process monitoring concerns. All of these are things I genuinely enjoy, but I feel like I spend so little time with my hands in the code that I'm losing my edge.

I feel sort of like... a... lumberjack! Leaping from tree to tree as they float down the mighty rivers of British Columbia. Meeting to meeting, conversation to conversation, setting strategy, keeping an eye out for log jams, but ultimately at the mercy of the direction the river wants to go. Or I'm tending a bonsai, helping to nurture and shape my teammates' abilities. I really like both of the analogies, but I can't think of something in the middle.

To correct this, I'm going back to the basics and starting to do katas and similar exercises. I've decided to start with Dave Thomas's codekata.com. Part of this will be a chance to learn more about Kotlin and maybe experiment with other programming languages such as Go and Haskell. I'll be checking in my code here in GitHub if you want to see what I'm working on.