Punctuation replacement using regular expressions
Replace every punctuation mark in a string with "space + punctuation + space" using the Pattern and Matcher regular expression classes in Java.
Input string:
“Article: The Journal of clinical endocrinology and metabolism Endometrial and pituitary responses to the steroidal antiprogestin RU 486 in postmenopausal women. The effects of the antiprogestin RU 486 on the human endometrium were investigated. Seventeen postmenopausal women were injected with estradiol (E2) benzoate (0.625 mg/day) for 15 days. Progesterone (P) (25 mg/day) and/or RU 486 (100 or 200 mg/day) were given to groups of 2-3 women during the last 6 days of E2 benzoate treatment. Serial blood samples were drawn for the measurement of plasma E2, P, and LH and FSH.”
From the above text we need to replace the following characters:
- ":" is replaced with " : "
- "(" is replaced with " ( "
- ")" is replaced with " ) "
- "/" is replaced with " / "
- "-" is replaced with " - "
- "," is replaced with " , "
Main method
public static void main(String[] args) {
    String text = "Article: The Journal of clinical endocrinology and metabolism Endometrial and " +
            "pituitary responses to the steroidal antiprogestin RU 486 in postmenopausal women. " +
            "The effects of the antiprogestin RU 486 on the human endometrium were investigated. " +
            "Seventeen postmenopausal women were injected with estradiol (E2) benzoate (0.625 mg/day) for 15 days. " +
            "Progesterone (P) (25 mg/day) and/or RU 486 (100 or 200 mg/day) were given to groups of 2-3 women during " +
            "the last 6 days of E2 benzoate treatment. Serial blood samples were drawn for the " +
            "measurement of plasma E2, P, and LH and FSH.";
    try {
        text = parseText(text);
        System.out.println(text);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Parse text method:
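We parse the text with the Pattern and Matcher classes and replace each "matcher group" with "space + matcher group + space", i.e. if the matcher group is "/" then we replace it with " / ". The full stop "." is the one exception and is left unchanged. The method handles the first match and then calls itself on the rest of the text.
// Requires imports of java.util.regex.Matcher and java.util.regex.Pattern.
public static String parseText(String text) {
    String temp = "";
    // Match any single Unicode punctuation (\p{P}) or symbol (\p{S}) character
    Pattern pattern = Pattern.compile("[\\p{P}\\p{S}]");
    try {
        if (text != null && !text.isEmpty()) {
            Matcher matcher = pattern.matcher(text);
            if (matcher.find()) {
                if (matcher.group().equals(".")) {
                    // Leave full stops as they are
                    temp = temp + text.substring(0, matcher.start()) +
                            matcher.group();
                } else {
                    // Surround any other match with spaces
                    temp = temp + text.substring(0, matcher.start()) + " " +
                            matcher.group() + " ";
                }
                // Recurse on the text that follows the first match
                temp = temp + parseText(text.substring(matcher.end()));
            } else {
                // No punctuation left, keep the text as it is
                temp = temp + text;
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    // Collapse the runs of whitespace created by the inserted spaces
    temp = temp.replaceAll("\\s+", " ");
    if (temp.isEmpty()) {
        return text;
    }
    return temp;
}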
Output string:
“Article : The Journal of clinical endocrinology and metabolism Endometrial and pituitary responses to the steroidal antiprogestin RU 486 in postmenopausal women. The effects of the antiprogestin RU 486 on the human endometrium were investigated. Seventeen postmenopausal women were injected with estradiol ( E2 ) benzoate ( 0.625 mg / day ) for 15 days. Progesterone ( P ) ( 25 mg / day ) and / or RU 486 ( 100 or 200 mg / day ) were given to groups of 2 - 3 women during the last 6 days of E2 benzoate treatment. Serial blood samples were drawn for the measurement of plasma E2 , P , and LH and FSH.”
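Because parseText calls itself once per punctuation mark, a very long input could in principle run into the stack limit. As a rough alternative, here is a minimal non-recursive sketch that relies on Matcher.replaceAll and the "$0" back-reference (the whole match). The class name PunctuationSpacer and the method name spacePunctuation are made up for this illustration, and unlike parseText the sketch also trims leading and trailing spaces.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post
class PunctuationSpacer {

    // Same character classes as parseText; the negative lookahead (?!\.)
    // excludes the full stop so that it is left untouched.
    private static final Pattern PUNCT = Pattern.compile("(?!\\.)[\\p{P}\\p{S}]");

    static String spacePunctuation(String text) {
        if (text == null || text.isEmpty()) {
            return text;
        }
        // "$0" refers to the matched character itself
        String spaced = PUNCT.matcher(text).replaceAll(" $0 ");
        // Collapse repeated whitespace; trimming is an addition of this sketch
        return spaced.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(spacePunctuation("Progesterone (P) (25 mg/day) and/or RU 486"));
    }
}
```

For the sample text above this should give the same result as parseText, since that text does not end in a spaced punctuation mark.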
Note: The matcher group can be replaced with any other string in the same way.
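To illustrate the note, the sketch below swaps selected punctuation marks for words instead of padding them with spaces. It assumes Java 9 or later (for Map.of and the Matcher.replaceAll overload that takes a function). The class PunctuationMapper, the method replacePunctuation and the replacement table are invented for this example only.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post
class PunctuationMapper {

    private static final Pattern PUNCT = Pattern.compile("[\\p{P}\\p{S}]");

    // Made-up replacement table; anything not listed is kept as it is
    private static final Map<String, String> REPLACEMENTS =
            Map.of("/", " per ", "-", " to ");

    static String replacePunctuation(String text) {
        if (text == null || text.isEmpty()) {
            return text;
        }
        // quoteReplacement keeps characters such as '$' or '\' literal
        String replaced = PUNCT.matcher(text).replaceAll(m ->
                Matcher.quoteReplacement(
                        REPLACEMENTS.getOrDefault(m.group(), m.group())));
        // Collapse repeated whitespace introduced by the replacements
        return replaced.replaceAll("\\s+", " ").trim();
    }
}
```

With this table, a call such as replacePunctuation("25 mg/day") turns the slash into the word "per", while characters that are not in the table pass through unchanged.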