Surratt On Software: The Dubstep Kata

The first kata I trained with on codewars.com was the dubstep kata. To summarize, a DJ creates the dubstep version of a popular song by inserting 1 or more WUBs between any word in the song, or 0 or more before or after the lyric. And because dubstep is always up-tempo, the DJ also removes all the spaces between words and capitalizes all the characters. So, the opening line of Hotel California could be:

ONWUBAWUBWUBDARKWUBWUBDESERTWUBHIGHWAYWUBWUB

but not

WUBONAWUBDARKWUBDESERTWUBHIGHWAY

(there's no WUB between "ON" and "A")
The exercise was to extract the original lyric from the DJ's dubstep version.
My first stab at this was very mechanical. Walk the string looking for "WUB" and then throwing them away. It looked like this:


    public static String SongDecoderAlpha (String song) {

        final StringBuilder builder =  new StringBuilder();

        while (song.length() > 0) {
            if (song.startsWith(WUB)) {
                song = song.substring(3);
            } else {
                int i = song.indexOf(WUB);
                if (i >= 0) {
                    String temp = song.substring(0, i);
                    appendWord(builder, temp);
                    song = song.substring(i);
                } else {
                    appendWord(builder, song);
                    break;
                }
            }
        }

        return builder.toString();

    }

    private static void appendWord(final StringBuilder builder, 
                                   final String temp) {
        if (builder.length() > 0) {
            builder.append(" ");
        }
        builder.append(temp);
    }

I wasn't really excited about this. It did what it needed to do, passed all of the tests, but it is very complex. A while loop with nested booleans. I think it wouldn't be too hard to figure out what it does, but it wouldn't be easy to understand all the permutations. Plus it has the additional appendWord() function, reducing duplicated code and complexity, but also adding another chunk of code to understand. So, I decided to come up with another solution.

    public static String SongDecoderBeta (String song) {

        song = song.replace(WUB, " ");
        song = song.trim();
        while (song.contains("  ")) {
            song = song.replace("  ", " ");
        }

        return song;

    }

This one was better. There are fewer lines of code and no if statements. There is literally almost nothing to it. It also more clearly describes what the solution is:

replace all the WUB tokens with spaces
clean up all the extra spaces

This was ultimately the solution I submitted, though I wasn't really happy with it. The while loop replacing the double spaces with a single space didn't quite sit right with me. After submitting my solution, I looked at what other folks had submitted and I saw that it was possible to use greedy REGEX expressions to do this more concisely. As a rule of thumb, I avoid using regular expressions because they can be hard to understand. In this case, the expression is so simple that the risk of confusion is minimal.
So, a revised implementation would look like this:

    public static String SongDecoder (String song)
    {

        song = song.replace(WUB, " ");
        song = song.trim();
        song = song.replaceAll(" +", " ");

        return song;

    }

Three lines of code and the only thing that is not direct is what the regex is doing, but that's easy enough to look up. I'm not sure which one I'd use in production code.

Thinking about how this solution worked, I realized that the code was extracting the original lyrics by parsing out the WUBs. I decided to experiment with this approach to see if it was more understandable. The result was this:

    public static String SongDecoder(String song)
    {

        final String[] words = song.split(WUB);
        final StringBuilder stringBuilder = new StringBuilder();

        String word;
        for (int i = 0; i < words.length; i++) {
            word = words[i];
            if (word.length() == 0) {
                continue;
            }
            if (stringBuilder.length() > 0) {
                stringBuilder.append(" ");
            }
            stringBuilder.append(word);
        }

        return stringBuilder.toString();

    }

This is better than the original in that it is self-descriptive of the approach. The one thing I hadn't considered was the behavior of String.split() when there consecutive delimiters. For example, the input RWUBWUBWUBLWUB resulted in the following tokens:

[R]
[]
[]
[L]

Not sure if this the defined behavior of split() or not; I didn't dig into it. But that meant I had to add another if statement to handle those empty string tokens. I do like this approach because the code to add a space between words only existed in one branch. This eliminated the need to extract it into another method.

Looking at the lines of code and number of branches, it's easy enough to see which solution is the simplest. But I wanted to quantify the differences. I found the website lizard.ws which offers online complexity analysis of 8 languages, include Java. I plugged in each version of my solution and got the following results:

Solution	NLoC	Complexity	Tokens
mechanical	25	6 (4+2)	160
replace	9	2	51
greedy replace	7	1	40
split	17	4	104

NLoC is Noncommented Lines of Code: anything that isn't a blank line or a comment line.
I think that makes it clear: The greedy replace solution has the lowest possible complexity score and one-third the lines of code and tokens of my original solution. By removing all that other code for your brain to parse, I'm comfortable you'd have the overhead to ponder the regex.

Surratt On Software

Friday, October 18, 2019

The Dubstep Kata

No comments:

Post a Comment