Original poster | Posted on 2008-4-1 14:51:19
How to write regular expressions for L7:
L7-filter Pattern Writing HOWTO
It's fairly easy to add support for more protocols to l7-filter. All you need to do is add a new pattern file to /etc/l7-protocols. This directory and its subdirectories are searched (non-recursively) for pattern files. (Thus, it will find /etc/l7-protocols/http.pat and /etc/l7-protocols/protocols/http.pat, but not /etc/l7-protocols/foo/bar/http.pat.) Please consider submitting any patterns you write for inclusion in the official distribution.
File format
Basic format
The basic format is very simple:
- The name of the protocol on one line
- A regular expression defining the protocol on the next line (see regular expressions below)
The name of the file must match the name of the protocol. (If the protocol is "ftp", the file must be "ftp.pat".) Lines starting with '#' and blank lines are ignored. Both the kernel and userspace versions of l7-filter will use the given regular expression. For example, vnc.pat could be:
vnc
^rfb 00[1-9]\.00[0-9]\x0a$
Defining a separate userspace pattern
Sometimes it will be desirable to define a separate regular expression for the kernel and userspace versions, or to pass a custom set of flags to the userspace version's regcomp/regexec. (See regular expressions below for why.) In this case, add either or both of these lines after the two above:
userspace pattern=<userspace pattern>
userspace flags=<regexec and/or regcomp flags, whitespace delimited>
For example, smtp.pat could be:
smtp
^220[\x09-\x0d -~]* (e?smtp|simple mail)
userspace pattern=^220[\x09-\x0d -~]* (E?SMTP|[Ss]imple [Mm]ail)
userspace flags=REG_NOSUB REG_EXTENDED
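The file format described above can be sketched as a small parser. This is only an illustration of the rules as stated here, not l7-filter's own loader; the function name `parse_pattern_file` and the dictionary keys are made up for the example.

```python
def parse_pattern_file(text):
    """Parse the text of a .pat file into a dict (illustrative sketch only)."""
    result = {
        "name": None,
        "pattern": None,
        "userspace_pattern": None,
        "userspace_flags": None,
    }
    for line in text.splitlines():
        # Blank lines and lines starting with '#' are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        if result["name"] is None:
            # First non-comment line: the protocol name.
            result["name"] = line.strip()
        elif result["pattern"] is None:
            # Second non-comment line: the regular expression, kept verbatim.
            result["pattern"] = line
        elif line.startswith("userspace pattern="):
            result["userspace_pattern"] = line[len("userspace pattern="):]
        elif line.startswith("userspace flags="):
            # Flags are whitespace delimited.
            result["userspace_flags"] = line[len("userspace flags="):].split()
    return result


SMTP_PAT = r"""# smtp.pat as shown above
smtp
^220[\x09-\x0d -~]* (e?smtp|simple mail)
userspace pattern=^220[\x09-\x0d -~]* (E?SMTP|[Ss]imple [Mm]ail)
userspace flags=REG_NOSUB REG_EXTENDED
"""

parsed = parse_pattern_file(SMTP_PAT)
print(parsed["name"], parsed["userspace_flags"])
```

A kernel-style reader would stop after the second non-comment line; this sketch keeps the two optional userspace lines as well.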
Meta-data
Pattern files that are part of the official distribution need some metadata at the top for display on the webpage and for the use of frontends. The top four lines should look like this:
# <Protocol name and some concise detail about the protocol>
# Pattern attributes: [attribute word]*
# Protocol groups: [group name]*
# Wiki: [link]*
"Pattern attributes" give information about how good the pattern is on various scales. Attribute words can be any of undermatch, overmatch, superset, subset, great, good, ok, marginal, poor, veryfast, fast, notsofast, or slow. Any number of these may be used. They are defined on the protocols page.
"Protocol groups" are supposed to give frontends a way to group similar protocols. Group names can be whatever you like, but should match existing names if possible. Any number may be used. More relevant groups should be listed first for sorting purposes. Group names in use as of 2007-01-14 are:
- chat
- document_retrieval
- file
- game
- ietf_draft_standard
- ietf_internet_standard
- ietf_proposed_standard
- ietf_rfc_documented
- mail
- monitoring
- networking
- obsolete
- open_source
- p2p
- printer
- proprietary
- remote_access
- secure
- streaming_audio
- streaming_video
- time_synchronization
- version_control
- voip
- worm
- x_consortium_standard
"Wiki" gives zero or more links to pages documenting the pattern and other methods of identifying the protocol on protocolinfo.org.
Regular expressions
The kernel and userspace versions of l7-filter use different regular expression libraries. They use generally the same syntax, but have some differences.
General information
Because patterns frequently need to use non-printable characters, both versions of l7-filter add perl-style hex matching on top of their stock libraries. This uses \xHH notation, so to match a tab, use "\x09". Note that regexp control characters are still control characters even when written in hex:
\x24 == $ \x28 == (
\x29 == ) \x2a == *
\x2b == + \x2e == .
\x3f == ? \x5b == [
\x5c == \ \x5d == ]
\x5e == ^ \x7b == { (only a control character for the userspace version)
\x7c == | \x7d == } (only a control character for the userspace version)
Both versions of l7-filter strip out the nulls (\x00 bytes) from network data so that they can treat it as normal C strings. So (1) you can't match on nulls and (2) fields may appear shorter than expected. For example, if a protocol has a 4 byte field and any of those bytes can be null, it can appear to be any length from 0 to 4.
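The two behaviours just described (hex escapes that keep their regexp meaning, and null stripping) can be imitated in a few lines of Python. This is a rough sketch for experimenting with patterns, not l7-filter's code; `expand_hex` and `strip_nulls` are hypothetical helper names. Note that Python's own `re` module treats `\x2e` as a literal dot, which is the opposite of what is described here, so the escape has to be expanded before compiling.

```python
import re

_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def expand_hex(pattern):
    """Replace every \\xHH escape with the raw character it denotes.

    After expansion a character such as '.' or '*' is an ordinary regexp
    control character again, as the text above describes.
    """
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), pattern)


def strip_nulls(data):
    """Drop \\x00 bytes so the payload can be handled like a C string."""
    return data.replace(b"\x00", b"")


# Python alone treats \x2e as a literal dot, so "aXb" does not match...
print(re.search(r"a\x2eb", "aXb"))
# ...but after expansion the dot is a wildcard again and "aXb" matches.
print(re.search(expand_hex(r"a\x2eb"), "aXb") is not None)

# A 4-byte field containing two nulls appears to be 2 bytes long.
print(len(strip_nulls(b"\x00\x01\x00\x02")))
```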
Kernel version
The kernel version of l7-filter uses Henry Spencer's 1987 implementation of Version 8 regular expressions ("V8 regexps"), with a few modifications, noted here. V8 regexps are likely more limited than the regexps you are used to. Notably, you cannot use bounds ("foo{3}"), character classes ("[[:punct:]]") or backreferences.
Because this library does not have a flag for case-sensitivity, the kernel version of l7-filter is always case insensitive. Upper case in patterns is identical to lower case. (This is true even if you write an uppercase letter in hex!)
The kernel version completely ignores any lines in the pattern file after the second non-comment line.
Userspace version
The userspace version of l7-filter uses the GNU regular expression library, so its behaviour should be more familiar. This library is documented in man 3 regcomp and man 7 regex.
If only one regular expression is specified in the pattern file (see file format above), the userspace version compiles it with the flags REG_EXTENDED | REG_ICASE | REG_NOSUB and executes it with no flags.
If the userspace pattern and userspace flags lines are given, the userspace pattern will be used instead of the first one. It will be compiled and executed with the given flags. (l7-filter will sort out which flags go to regcomp and which to regexec.)
If only the userspace pattern line is given, the userspace pattern will be compiled with REG_EXTENDED | REG_ICASE | REG_NOSUB and executed with no flags. If only the userspace flags line is given, the single regular expression will be compiled and executed with the given flags.
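The four cases above amount to a small decision table, sketched here in Python. The function name `resolve_userspace` is invented for the example, and the split of flag names into compile-time and execute-time sets follows the POSIX regcomp/regexec manual pages rather than l7-filter's source, so treat it as an assumption.

```python
# POSIX flag names, split by the call that accepts them (per man 3 regcomp).
REGCOMP_FLAGS = {"REG_EXTENDED", "REG_ICASE", "REG_NOSUB", "REG_NEWLINE"}
REGEXEC_FLAGS = {"REG_NOTBOL", "REG_NOTEOL"}
DEFAULT_COMPILE_FLAGS = ["REG_EXTENDED", "REG_ICASE", "REG_NOSUB"]


def resolve_userspace(pattern, userspace_pattern=None, userspace_flags=None):
    """Return (pattern_to_use, regcomp_flags, regexec_flags)."""
    # A "userspace pattern=" line replaces the basic pattern.
    chosen = userspace_pattern if userspace_pattern is not None else pattern
    if userspace_flags is None:
        # No "userspace flags=" line: default compile flags, no exec flags.
        return chosen, list(DEFAULT_COMPILE_FLAGS), []
    # Otherwise only the given flags are used, sorted by destination.
    comp = [f for f in userspace_flags if f in REGCOMP_FLAGS]
    execute = [f for f in userspace_flags if f in REGEXEC_FLAGS]
    return chosen, comp, execute


print(resolve_userspace("^220.*smtp"))
print(resolve_userspace("^220.*smtp", "^220.*SMTP", ["REG_NOSUB", "REG_EXTENDED"]))
```

Note that giving a flags line without REG_ICASE makes the userspace match case sensitive, which is why the smtp.pat example spells out the upper-case alternatives.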
What l7-filter sees and does
If you have set up your iptables rules correctly (see the HOWTO), l7-filter sees the data going in both directions in the order that it passes through the computer. For instance, in FTP, the first thing it sees is "220 server ready", then "USER bob", then "331 send password", then "PASS frogbeard", and so on.
l7-filter can match across packets. For instance, with the above FTP example, the match is first attempted on "220 server ready", then on "220 server readyUSER bob", then "220 server readyUSER bob331 send password",[1] so you could match it with "220.*user.*331". At each match attempt, the regexp special character ^ will match the beginning of the stream and $ will match the end of the last packet seen so far. Because the Linux kernel's ip_conntrack module tracks connectionless UDP and ICMP sessions as "connections", this works with them as well as TCP.
Usually the identifying characteristics of a connection are found at the beginning of that connection. For this reason, and to save processing time, l7-filter only looks at the first 10 packets or 2kB of each connection, whichever is smaller. Any match made within this window is applied to the rest of the connection as well.
[1] Yes, there should be CRLFs in there. Picky, picky.
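The accumulate-and-retry behaviour described above can be approximated in Python for testing a pattern against a captured session. This is a sketch under several assumptions: the helper name `first_match` is made up, Python's `re` stands in for the real regexp libraries (with `re.DOTALL` so that "." also matches the CRLFs), and simply cutting the buffer at 2048 bytes is a guess at how the 2kB limit is applied.

```python
import re

MAX_PACKETS = 10   # "first 10 packets"
MAX_BYTES = 2048   # "or 2kB", whichever is reached first


def first_match(packets, pattern):
    """Return the 1-based index of the packet on which pattern first matches.

    packets is a list of payloads (bytes) in the order they pass through
    the machine, both directions interleaved. Returns None if nothing
    matches within the inspected window.
    """
    regex = re.compile(pattern.encode("latin-1"), re.IGNORECASE | re.DOTALL)
    stream = b""
    for index, payload in enumerate(packets[:MAX_PACKETS], start=1):
        # Nulls are stripped, then the payload joins everything seen so far.
        stream = (stream + payload.replace(b"\x00", b""))[:MAX_BYTES]
        # ^ anchors at the start of the stream, $ at the end of the data so far.
        if regex.search(stream):
            return index
        if len(stream) >= MAX_BYTES:
            break
    return None


ftp_session = [
    b"220 server ready\r\n",
    b"USER bob\r\n",
    b"331 send password\r\n",
    b"PASS frogbeard\r\n",
]
print(first_match(ftp_session, r"220.*user.*331"))  # 3
print(first_match(ftp_session, r"^220"))            # 1
```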
What makes a good pattern
There are three general guidelines:
1) A pattern must be neither too specific nor not specific enough.
Example 1: The pattern "bear" for Bearshare is not specific enough. This pattern could match a wide variety of non-Bearshare connections. For instance, an HTTP request for http://bear.com would be matched.
Example 2: "220 .*ftp.*(\[.*\]|\(.*\))" for FTP is too specific. Not all servers send ()s or []s after their 220. In fact, servers are not even required to send the string "ftp" at any time, but the vast majority do. Good judgement and testing are necessary for instances such as this.
2) It should use a minimum of processing power. If it's possible to reduce the number of instances of *, + and | in your pattern, you should do so. Use the performance testing program included in the patterns package.
3) It should complete its match on the earliest packet possible. The FTP pattern could be "^220[\x09-\x0d -~]*\x0d\x0aUSER[\x09-\x0d -~]*\x0d\x0a331", but that won't match until the third data packet. Instead, we use "^220[\x09-\x0d -~]*ftp", which matches on the first data packet.
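The examples in these guidelines can be tried out quickly with Python's `re` module as a stand-in for the real libraries. The sample data below (the HTTP request and the FTP banner) is invented for illustration, and the helper `matches` is not part of l7-filter.

```python
import re


def matches(pattern, data):
    """Case-insensitive search, with '.' also matching CR and LF."""
    return re.search(pattern, data, re.IGNORECASE | re.DOTALL) is not None


# Hypothetical first packets of two connections.
http_request = "GET / HTTP/1.1\r\nHost: bear.com\r\n\r\n"
ftp_banner = "220 Example FTP server ready.\r\n"

# Guideline 1, example 1: "bear" overmatches plain HTTP traffic.
print(matches(r"bear", http_request))                         # True

# Guideline 1, example 2: requiring ()s or []s misses this banner.
print(matches(r"220 .*ftp.*(\[.*\]|\(.*\))", ftp_banner))     # False

# Guideline 3: the short pattern already matches the first packet,
# while the long one needs data from two later packets.
short = r"^220[\x09-\x0d -~]*ftp"
long_ = r"^220[\x09-\x0d -~]*\x0d\x0aUSER[\x09-\x0d -~]*\x0d\x0a331"
print(matches(short, ftp_banner), matches(long_, ftp_banner))  # True False
```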
Miscellaneous tips
[\x09-\x0d -~] == printable characters, including whitespace
[\x09-\x0d ] == any whitespace
[!-~] == non-whitespace printable characters
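As a sanity check on the three classes above, the following sketch enumerates which ASCII characters each one accepts and compares the result with Python's `string` constants. Python's `re` is used only as a convenient stand-in; the helper `members` is a made-up name.

```python
import re
import string

PRINTABLE_WITH_WS = re.compile(r"[\x09-\x0d -~]")
ANY_WHITESPACE = re.compile(r"[\x09-\x0d ]")
NON_WS_PRINTABLE = re.compile(r"[!-~]")


def members(regex):
    """Return the set of 7-bit ASCII characters accepted by a one-character class."""
    return {chr(code) for code in range(128) if regex.fullmatch(chr(code))}


# Tab, LF, VT, FF, CR and space.
print(members(ANY_WHITESPACE) == set(string.whitespace))
# Letters, digits and punctuation.
print(members(NON_WS_PRINTABLE)
      == set(string.ascii_letters + string.digits + string.punctuation))
# The union of the two.
print(members(PRINTABLE_WITH_WS) == set(string.printable))
```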
Recommended procedure for writing patterns
- Find and read the spec for the protocol you wish to match. If it's an Internet standard, RFCs are a good place to start, although not all standards are RFCs. If it is a proprietary protocol, it is likely that someone has written a reverse-engineered spec for it. Do a general web search to find it. Skipping this step is a good way to write patterns that are overly specific!
- Use something like Wireshark (formerly known as Ethereal) to watch packets of this protocol go by in a typical session of its use. (If you failed to find a spec for your protocol, but Wireshark can parse it, reading the Wireshark source code may also be worth your time.)
- Write a pattern that will reliably match one of the first few packets that are sent in your protocol. Test it. Test its performance.
- Send your pattern to l7-filter-developers{/-\T}lists*sf*net for it to be incorporated into the official pattern definitions (you must subscribe first).
HOWTO send a packet dump to the mailing list
If you do not feel that you are able to do all of the above yourself, you may want to send some packets you have captured to the mailing list so that others can do the rest. In order for this to be useful, please follow these guidelines:
- If you have never done anything like this before, use Wireshark. It's easy to use and available for GNU/Linux, Mac and Windows (and FreeBSD, HP-UX, NetBSD, Solaris...). Use File→Save to save the captured packets.
- Make sure that you start capturing packets before the application that you are testing has started using the network. l7-filter looks at the opening packets of a connection. If these are not present in the packet dump, it is useless.
- If it makes sense for the protocol in question, send a recognizable text string so that the relevant connection can be found in the packet dump. For instance, if testing an instant messenger, send a message with "hello hello hello."
- Along with your capture, send us anything that could be helpful in picking out the relevant data. For example, this could include the server's IP address, what network operations you performed, the version numbers of all software used, and any strings you expect to appear in the packets (such as instant messenger text, e-mail addresses, gaming handles, etc.).
- Try not to capture an excessive number of packets. In particular:
- Avoid having other programs use the network during your capture. Assuming their traffic is recognizable, the excess packets can be filtered out, but it's annoying.
- Avoid sending captures that have many thousands of packets from the same connection. All but the first few are useless.
- However, if you are not sure when the application opens connections, or if it opens many simultaneous connections, it might be necessary to send a large number of packets. This is ok.
- Send the packets in libpcap format or something else that Wireshark can read. Do not:
- send only a text hexdump of the packets. This is unnecessarily hard to read.
- send only the data portion of the packets. The TCP headers in particular are essential for finding streams. You may anonymize addresses if necessary, but try to avoid it.
- compress the captured packets with anything other than gzip or bzip2. No compression is needed unless the file is very large.
If you aren't sure how to follow these guidelines, try your best and send the result to us. If it's wrong, we'll be happy to tell you how to fix it.
[ Last edited by wbyz20 on 2008-4-1 14:54 ]