Thursday, July 30, 2009

List of metacharacters in regular expressions (regexp)

Even the most basic regular expression is built from metacharacters, which are characters that have a special meaning inside a regular expression. In this post I will list the metacharacters and escape sequences found in most regular expression flavors and explain what each one means.

1)Dot .: The dot matches any single character except a newline. For example, a.b can match aab, axb, avb, a%b and so on.

But remember that a dot (.) inside square brackets is literal. [a.b] matches only "a" or "." or "b".
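As a quick check of the dot rules above, here is a minimal sketch using Python's standard re module. The choice of Python and the variable names are mine, and other regex flavors may differ in small details.

```python
import re

# "a.b": an "a", then any one character except a newline, then a "b".
dot_pattern = re.compile(r"a.b")
dot_matches = [s for s in ["aab", "axb", "a%b", "ab", "a\nb"]
               if dot_pattern.fullmatch(s)]
print(dot_matches)  # ['aab', 'axb', 'a%b']

# Inside square brackets the dot is literal: [a.b] is "a", "." or "b".
class_pattern = re.compile(r"[a.b]")
class_matches = [s for s in ["a", ".", "b", "x"]
                 if class_pattern.fullmatch(s)]
print(class_matches)  # ['a', '.', 'b']
```

Note that "ab" is rejected because the dot must consume exactly one character, and "a\nb" is rejected because the dot does not match a newline by default.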

2)Bracket[]: A bracket matches any single character from the set written inside it. For example, [abcde] matches "a" or "b" or "c" or "d" or "e". Within a bracket we can also specify a range of characters using a dash (-). For example, [a-z] matches any lowercase letter from "a" to "z". Single characters and ranges can be combined: [mp-s] matches "m" or "p" or "q" or "r" or "s", and the same set can also be written as [mpq-s].

But remember that a dash (-) becomes a literal when it is the last character inside the bracket or when it is preceded by a backslash. [mnq-] matches "m" or "n" or "q" or "-". [mp\-s] matches "m" or "p" or "-" or "s".
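The bracket examples above can be tried out with a short Python sketch. The helper name full_matches and the candidate list are only for illustration.

```python
import re

def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

letters = ["m", "n", "p", "q", "r", "s", "-"]

range_form = full_matches(r"[mp-s]", letters)
same_set = full_matches(r"[mpq-s]", letters)
trailing_dash = full_matches(r"[mnq-]", letters)
escaped_dash = full_matches(r"[mp\-s]", letters)

print(range_form)     # ['m', 'p', 'q', 'r', 's']
print(same_set)       # ['m', 'p', 'q', 'r', 's']
print(trailing_dash)  # ['m', 'n', 'q', '-']
print(escaped_dash)   # ['m', 'p', 's', '-']
```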

3)Caret within bracket [^]: A caret as the first character inside a bracket negates the set, so the bracket matches any single character that is not listed inside it. For example, [^bcd] matches any character other than "b", "c" or "d". [^m-p] matches any single character that is not a lowercase letter from "m" to "p", i.e. not m, n, o or p.
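A small Python sketch of the negated bracket, assuming the standard re module. The sample strings are made up for the example.

```python
import re

# [^bcd] matches one character that is not "b", "c" or "d".
not_bcd = re.findall(r"[^bcd]", "abcde")
print(not_bcd)  # ['a', 'e']

# [^m-p] matches one character outside the range "m" to "p".
not_m_to_p = re.findall(r"[^m-p]", "lmnopq")
print(not_m_to_p)  # ['l', 'q']
```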

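4)Caret ^: Outside a bracket, the caret (^) matches the position at the beginning of a line or the beginning of a string. It matches a position, not a character.

5)Dollar sign $: The dollar sign ($) matches the position at the end of a line or the end of a string. Like the caret, it matches a position, not a character.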
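Whether ^ and $ anchor to lines or to the whole string depends on the tool. As one illustration, in Python's re module they anchor to the whole string by default and to each line only when the re.MULTILINE flag is given. The sample text below is made up.

```python
import re

text = "cat dog\ndog cat"

# Without flags, ^ anchors only at the very start of the string.
string_start = re.findall(r"^dog", text)
# With re.MULTILINE, ^ also anchors right after each newline.
line_start = re.findall(r"^dog", text, flags=re.MULTILINE)

# Without flags, $ anchors at the end of the string.
string_end = re.findall(r"cat$", text)
# With re.MULTILINE, $ also anchors right before each newline.
line_end = re.findall(r"dog$", text, flags=re.MULTILINE)

print(string_start)  # []
print(line_start)    # ['dog']
print(string_end)    # ['cat']
print(line_end)      # ['dog']
```

Line-oriented tools such as grep and sed read their input one line at a time, so there the two meanings coincide.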

6)Plus sign +: The plus sign (+) matches the preceding pattern element one or more times. For example, xy+z matches xyz, xyyz, xyyyz and so on.

7)Question mark ?: The question mark (?) matches the preceding pattern element zero or one time.
For example, xy?z matches only xz and xyz.

8)Asterisk sign *: The asterisk (*) matches the preceding pattern element zero or more times.
For example, xy*z matches xz, xyz, xyyz and so on.
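The three quantifiers are easiest to compare side by side. Here is a minimal Python sketch that runs the xy+z, xy?z and xy*z examples against the same candidates. The helper name accepted is only for illustration.

```python
import re

candidates = ["xz", "xyz", "xyyz", "xyyyz"]

def accepted(pattern):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

one_or_more = accepted(r"xy+z")
zero_or_one = accepted(r"xy?z")
zero_or_more = accepted(r"xy*z")

print(one_or_more)   # ['xyz', 'xyyz', 'xyyyz']
print(zero_or_one)   # ['xz', 'xyz']
print(zero_or_more)  # ['xz', 'xyz', 'xyyz', 'xyyyz']
```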

9)Vertical bar or pipe symbol |: It matches either the expression before the vertical bar or the expression after it. For example, a|b matches either "a" or "b".
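A short Python sketch of alternation. The second and third patterns go a little beyond the a|b example to show that the bar applies to the whole expression on each side unless parentheses limit it. The sample words are made up.

```python
import re

# a|b matches either "a" or "b".
single_chars = re.findall(r"a|b", "abc")
print(single_chars)  # ['a', 'b']

# The bar splits the whole expression: cat|dog is "cat" or "dog".
words = re.findall(r"cat|dog", "cat dog cow")
print(words)  # ['cat', 'dog']

# Parentheses limit the alternation to one part of the pattern.
grey_ok = bool(re.fullmatch(r"gr(a|e)y", "grey"))
gray_ok = bool(re.fullmatch(r"gr(a|e)y", "gray"))
print(grey_ok, gray_ok)  # True True
```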

10){M,N}: It matches the preceding pattern element at least M times and at most N times. For example, a{M,N} matches "a" repeated a minimum of M and a maximum of N times, so a{2,4} matches aa, aaa and aaaa.
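To make the counts concrete, here is a minimal Python sketch with M = 2 and N = 4. These values are chosen only for the example.

```python
import re

candidates = ["a", "aa", "aaa", "aaaa", "aaaaa"]

# a{2,4}: "a" repeated at least 2 and at most 4 times.
between_2_and_4 = [c for c in candidates if re.fullmatch(r"a{2,4}", c)]
print(between_2_and_4)  # ['aa', 'aaa', 'aaaa']
```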

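11)\b: Matches a word boundary, that is, the position between a word character (\w) and a non-word character, or between a word character and the start or end of the string. It matches a position and does not consume any characters. For example, \btest\b matches "test" as a whole word but not the "test" inside "testing" or "contest".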
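Because \b matches a position rather than characters, it helps to look at where the matches start and end. The following Python sketch uses a made-up sample sentence and reports each match as a (start, end) pair.

```python
import re

text = "test testing contest"

# A boundary on both sides: only the standalone word "test".
whole_word = [m.span() for m in re.finditer(r"\btest\b", text)]
# A boundary on the left only: words that start with "test".
word_start = [m.span() for m in re.finditer(r"\btest", text)]
# A boundary on the right only: words that end with "test".
word_end = [m.span() for m in re.finditer(r"test\b", text)]

print(whole_word)  # [(0, 4)]
print(word_start)  # [(0, 4), (5, 9)]
print(word_end)    # [(0, 4), (16, 20)]
```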

12)\w: Matches a word character, that is, an alphanumeric character or "_". It is equivalent to [a-zA-Z0-9_].

13)\W: Matches any character that is not a word character, that is, neither alphanumeric nor "_". It is equivalent to [^a-zA-Z0-9_].

14)\s: Matches a whitespace character such as space, tab, newline or form feed.

15)\S: Matches any character that is not whitespace.

16)\d: Matches a digit. We can also denote it as [0-9].

17)\D: Matches a non-digit. We can also denote it as [^0-9].
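The shorthand classes from \w to \D can be compared with their bracket forms in a short Python sketch. One assumption to note: in Python 3 these classes also match non-ASCII letters and digits by default, so the sketch passes re.ASCII to get the ASCII-only behavior described above. The sample string is made up.

```python
import re

sample = "ab_1 \t9!"

def find(pattern):
    """Return every single-character match of pattern in the sample."""
    return re.findall(pattern, sample, flags=re.ASCII)

word_chars = find(r"\w")
non_word_chars = find(r"\W")
spaces = find(r"\s")
non_spaces = find(r"\S")
digits = find(r"\d")
non_digits = find(r"\D")

print(word_chars)      # ['a', 'b', '_', '1', '9']
print(non_word_chars)  # [' ', '\t', '!']
print(spaces)          # [' ', '\t']
print(non_spaces)      # ['a', 'b', '_', '1', '9', '!']
print(digits)          # ['1', '9']
print(non_digits)      # ['a', 'b', '_', ' ', '\t', '!']

# Each shorthand agrees with its bracket form on this sample.
print(word_chars == find(r"[a-zA-Z0-9_]"))  # True
print(digits == find(r"[0-9]"))             # True
```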
