Thursday, February 13, 2014

Quick & Dirty Tips: Splitting words

The first thing to note is that StringTokenizer is discouraged. It is not formally deprecated, but its Javadoc advises against using it:
StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
So use String.split(String regex), which takes a regular expression. To use the method, we have to look at how a regular expression can describe the separators between words.

[^abc] Any character except a, b, or c (negation)
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
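
As a quick illustration of these character classes, here is a minimal sketch that passes two of them to String.split. The class name RegexClassesDemo, its helper methods, and the sample strings are made up for this example; only String.split and Arrays.toString come from the JDK.

```java
import java.util.Arrays;

public class RegexClassesDemo {
    // Splits on runs of whitespace: \s+ (the backslash is doubled in a Java string literal).
    static String[] splitOnWhitespace(String text) {
        return text.split("\\s+");
    }

    // Splits on runs of any character other than a, b, or c: [^abc]+
    static String[] splitOnNonAbc(String text) {
        return text.split("[^abc]+");
    }

    public static void main(String[] args) {
        // Two spaces and a tab are each treated as a single separator run.
        System.out.println(Arrays.toString(splitOnWhitespace("one  two\tthree")));
        // prints [one, two, three]

        // Every run of hyphens is "not a, b, or c", so it acts as a separator.
        System.out.println(Arrays.toString(splitOnNonAbc("ab-c--ba")));
        // prints [ab, c, ba]
    }
}
```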

So to split words in a String, I use the following:

String line = "Lorem ipsum dolor sit amet, consectetur adipisicing elit...";
String[] words = line.split("\\s*[^a-zA-Z]+\\s*");
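
To see what that call returns, here is a small runnable sketch built around the same pattern. The class name WordSplitDemo and the method splitWords are hypothetical wrappers added for this example. Two things worth knowing: the trailing "..." does not produce an empty last element, because split drops trailing empty strings, but a line that starts with a non-letter does produce an empty first element.

```java
import java.util.Arrays;

public class WordSplitDemo {
    // Same pattern as above: any run of non-letters (spaces, commas, dots) is a separator.
    static String[] splitWords(String line) {
        return line.split("\\s*[^a-zA-Z]+\\s*");
    }

    public static void main(String[] args) {
        String line = "Lorem ipsum dolor sit amet, consectetur adipisicing elit...";
        System.out.println(Arrays.toString(splitWords(line)));
        // prints [Lorem, ipsum, dolor, sit, amet, consectetur, adipisicing, elit]

        // A leading separator leaves an empty string at index 0.
        System.out.println(Arrays.toString(splitWords("  hello world")));
        // prints [, hello, world]
    }
}
```

Since whitespace is itself a non-letter, the \\s* on either side is already covered by [^a-zA-Z]+, so "[^a-zA-Z]+" alone should give the same result on this input.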

