Punctuation replacement using regular expression


By: Sagar Gole | July 10, 2015

Replace every punctuation mark in a string with “space + punctuation + space” using the Pattern and Matcher classes (regular expressions) in Java.

Input string:

“Article: The Journal of clinical endocrinology and metabolism Endometrial and pituitary responses to the steroidal antiprogestin RU 486 in postmenopausal women. The effects of the antiprogestin RU 486 on the human endometrium were investigated. Seventeen postmenopausal women were injected with estradiol (E2) benzoate (0.625 mg/day) for 15 days. Progesterone (P) (25 mg/day) and/or RU 486 (100 or 200 mg/day) were given to groups of 2-3 women during the last 6 days of E2 benzoate treatment. Serial blood samples were drawn for the measurement of plasma E2, P, and LH and FSH.”

From the above text we need to replace the following characters (the period “.” is deliberately left untouched):

  1. “:” is replaced with “ : ”
  2. “(” is replaced with “ ( ”
  3. “)” is replaced with “ ) ”
  4. “/” is replaced with “ / ”
  5. “-” is replaced with “ - ”
  6. “,” is replaced with “ , ”
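
All of the characters listed above belong to the Unicode punctuation category, which is what the \p{P} part of the regular expression used later in this post matches (\p{S} adds symbols such as “+” or “$”). A quick way to check this is sketched below; the class name PunctuationCheck and the helper isPunctOrSymbol are made up for illustration and are not part of the original code.

```java
import java.util.regex.Pattern;

public class PunctuationCheck {
    // Same character class as the one used by parseText in this post:
    // any Unicode punctuation (\p{P}) or symbol (\p{S}) character.
    static final Pattern PUNCT_OR_SYMBOL = Pattern.compile("[\\p{P}\\p{S}]");

    // Hypothetical helper: true if the whole string is one punctuation/symbol character.
    static boolean isPunctOrSymbol(String s) {
        return PUNCT_OR_SYMBOL.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {":", "(", ")", "/", "-", ",", ".", "A", "7", " "};
        for (String s : samples) {
            System.out.println("'" + s + "' -> " + isPunctOrSymbol(s));
        }
    }
}
```

Note that the period also matches the character class, which is why the parsing code has to skip it explicitly.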

Main method:

public static void main(String[] args) {
    String text = "Article: The Journal of clinical endocrinology and metabolism Endometrial and "
            + "pituitary responses to the steroidal antiprogestin RU 486 in postmenopausal women. "
            + "The effects of the antiprogestin RU 486 on the human endometrium were investigated. "
            + "Seventeen postmenopausal women were injected with estradiol (E2) benzoate (0.625 mg/day) for 15 days. "
            + "Progesterone (P) (25 mg/day) and/or RU 486 (100 or 200 mg/day) were given to groups of 2-3 women during "
            + "the last 6 days of E2 benzoate treatment. Serial blood samples were drawn for the "
            + "measurement of plasma E2, P, and LH and FSH.";
    try {
        text = parseText(text);
        System.out.println(text);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Parse text method:

We parse the text with the Pattern and Matcher classes and replace each matched group with “space + matched group + space”. For example, if the matched group is “/”, we replace it with “ / ”. The period “.” is skipped and left unchanged.

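// Imports needed at the top of the source file
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Matches any Unicode punctuation (\p{P}) or symbol (\p{S}) character.
// Compiled once instead of on every recursive call.
private static final Pattern PUNCTUATION = Pattern.compile("[\\p{P}\\p{S}]");

public static String parseText(String text) {
    // Nothing to do for null or empty input
    if (text == null || text.isEmpty()) {
        return text;
    }
    Matcher matcher = PUNCTUATION.matcher(text);
    if (!matcher.find()) {
        // No punctuation left: only collapse runs of whitespace
        return text.replaceAll("\\s+", " ");
    }
    String symbol = matcher.group();
    // Skip for dot: periods are kept as-is, everything else is padded with spaces
    String padded = symbol.equals(".") ? symbol : " " + symbol + " ";
    // Text before the match + padded symbol + recursively parsed remainder
    String result = text.substring(0, matcher.start()) + padded
            + parseText(text.substring(matcher.end()));
    // Padding can create double spaces next to existing ones, so collapse them
    return result.replaceAll("\\s+", " ");
}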
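
The recursive method above makes one call per punctuation mark, so very long inputs could in principle exhaust the call stack. The same padding can also be sketched without recursion using Matcher.replaceAll, where “$0” in the replacement string refers to the whole match. This is an alternative, not the method used in this post; the names PunctuationSpacer and padPunctuation are made up for illustration, and the negative lookahead (?!\.) is used here to skip the period.

```java
import java.util.regex.Pattern;

public class PunctuationSpacer {
    // Any punctuation or symbol character except the period:
    // the lookahead (?!\.) rejects a match that would start at a '.'.
    private static final Pattern PUNCT_EXCEPT_DOT =
            Pattern.compile("(?!\\.)[\\p{P}\\p{S}]");
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Hypothetical non-recursive counterpart of parseText.
    public static String padPunctuation(String text) {
        if (text == null || text.isEmpty()) {
            return text;
        }
        // "$0" is the matched character itself, so this inserts a space on both sides
        String padded = PUNCT_EXCEPT_DOT.matcher(text).replaceAll(" $0 ");
        // Collapse the double spaces created by the padding
        return WHITESPACE_RUN.matcher(padded).replaceAll(" ");
    }

    public static void main(String[] args) {
        String sample = "estradiol (E2) benzoate (0.625 mg/day), groups of 2-3 women.";
        // prints: estradiol ( E2 ) benzoate ( 0.625 mg / day ) , groups of 2 - 3 women.
        System.out.println(padPunctuation(sample));
    }
}
```

Both patterns are compiled once as constants, so repeated calls do not pay the compilation cost again.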

Output string:

“Article : The Journal of clinical endocrinology and metabolism Endometrial and pituitary responses to the steroidal antiprogestin RU 486 in postmenopausal women. The effects of the antiprogestin RU 486 on the human endometrium were investigated. Seventeen postmenopausal women were injected with estradiol ( E2 ) benzoate ( 0.625 mg / day ) for 15 days. Progesterone ( P ) ( 25 mg / day ) and / or RU 486 ( 100 or 200 mg / day ) were given to groups of 2 - 3 women during the last 6 days of E2 benzoate treatment. Serial blood samples were drawn for the measurement of plasma E2 , P , and LH and FSH.”

Note: The matched group can be replaced with any other string, not only with a space-padded copy of itself.
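
As a rough illustration of that note, the sketch below swaps selected punctuation marks for arbitrary strings using Matcher.appendReplacement and Matcher.appendTail. The class name PunctuationTokens, the method replaceWithTokens and the token mapping are all invented for this example; Matcher.quoteReplacement is used so that characters such as “$” in a replacement are taken literally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PunctuationTokens {
    private static final Pattern PUNCTUATION = Pattern.compile("[\\p{P}\\p{S}]");

    // Hypothetical mapping from a punctuation mark to its replacement string.
    private static final Map<String, String> TOKENS = new HashMap<>();
    static {
        TOKENS.put("(", " <LPAREN> ");
        TOKENS.put(")", " <RPAREN> ");
        TOKENS.put("/", " per ");
    }

    public static String replaceWithTokens(String text) {
        Matcher matcher = PUNCTUATION.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String symbol = matcher.group();
            // Marks without a mapping are copied through unchanged
            String replacement = TOKENS.getOrDefault(symbol, symbol);
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out); // copy the text after the last match
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // prints: <LPAREN> 25 mg per day <RPAREN>
        System.out.println(replaceWithTokens("(25 mg/day)"));
    }
}
```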
