We're into double figures! This challenge was /u/jordanreiter's idea; the challenge is to turn a sentence into a number of "tokens" that would be suitable for something like a search engine to use as keywords. For example, "don't tell Suzie Smith-Hopper that I broke Daniel's toy horse" would be turned into "don't,tell,Suzie,Smith-Hopper,that,I,broke,Daniel's,toy,horse" and "other "big name" items" would be turned into "other,big name,items".
The four criteria are as follows:
If words are in quotes, treat them as a single separate token: e.g. "This "huge test" is pointless" would be changed to "This,huge test,is,pointless". This applies to both single quotes and double quotes.
Hyphenated last names (such as "Smith-Hopper") should be a single token, but words with more hyphens, or hyphens at the beginning or end of the word, should have the hyphens stripped and be treated as separate tokens: "Suzie Smith-Hopper test--hyphens" should be changed to "Suzie,Smith-Hopper,test,hyphens".
Contractions should be treated as a single token; "I can't do it" would be changed to "I,can't,do,it".
Punctuation should be removed (but not hyphens and quotes as above); "Too long; didn't read" would turn into "Too,long,didn't,read".
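The four criteria can be illustrated with a short JavaScript sketch. This is not a solution to the challenge, which asks for a single regular expression; it is only one possible reading of the rules, written as an ordinary function. The names tokenize, WORD, CHUNK and SINGLE_HYPHEN are made up for this example, and it assumes that words consist of ASCII letters and digits and that quotes are properly paired.

```javascript
// A word: letters/digits, optionally with internal apostrophes (contractions).
const WORD = "[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*";

// One chunk is a double-quoted phrase, a single-quoted phrase, or a run of
// words joined by hyphens. The lookarounds stop an apostrophe inside a word
// from being read as an opening or closing quote.
const CHUNK = new RegExp(
  `"([^"]+)"|(?<![A-Za-z0-9])'([^']+)'(?![A-Za-z0-9])|${WORD}(?:-+${WORD})*`,
  "g"
);

// Exactly two words joined by exactly one hyphen, e.g. "Smith-Hopper".
const SINGLE_HYPHEN = new RegExp(`^${WORD}-${WORD}$`);

function tokenize(sentence) {
  const tokens = [];
  for (const match of sentence.matchAll(CHUNK)) {
    const quoted = match[1] ?? match[2];
    if (quoted !== undefined) {
      // Criterion 1: a quoted phrase is a single token, quotes removed.
      tokens.push(quoted);
    } else if (SINGLE_HYPHEN.test(match[0])) {
      // Criterion 2: a singly hyphenated word stays together.
      tokens.push(match[0]);
    } else {
      // Any other hyphenated run is split into separate words.
      tokens.push(...match[0].split(/-+/));
    }
  }
  // Criterion 4: punctuation is never matched, so it is dropped implicitly.
  return tokens.join(",");
}

console.log(tokenize("Suzie Smith-Hopper test--hyphens"));
```

With the examples given above, tokenize("Too long; didn't read") returns "Too,long,didn't,read" and tokenize("Suzie Smith-Hopper test--hyphens") returns "Suzie,Smith-Hopper,test,hyphens". The lookbehind in CHUNK needs a reasonably recent JavaScript engine.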
This challenge is challenging! It is certainly possible, though.
To test a regular expression on the test cases below, type it into the text input. Each test case will be marked as passed or failed - you are aiming to get as many test cases to pass as you can. Note that JavaScript must be enabled for this feature to work. The regex engine used is the JavaScript regex engine; it is similar to PCRE, but with a few differences.