[][src]Struct aho_corasick::AhoCorasick

pub struct AhoCorasick<S: StateID = usize> { /* fields omitted */ }

An automaton for searching multiple strings in linear time.

The AhoCorasick type supports a few basic ways of constructing an automaton, including AhoCorasick::new and AhoCorasick::new_auto_configured. However, there are a fair number of configurable options that can be set by using AhoCorasickBuilder instead. Such options include, but are not limited to, how matches are determined, simple case insensitivity, whether to use a DFA or not and various knobs for controlling the space-vs-time trade offs taken when building the automaton.

If you aren't sure where to start, try beginning with AhoCorasick::new_auto_configured.

Resource usage

Aho-Corasick automatons are always constructed in O(p) time, where p is the combined length of all patterns being searched. With that said, building an automaton can be fairly costly because of high constant factors, particularly when enabling the DFA option (which is disabled by default). For this reason, it's generally a good idea to build an automaton once and reuse it as much as possible.

Aho-Corasick automatons can also use a fair bit of memory. To get a concrete idea of how much memory is being used, try using the AhoCorasick::heap_bytes method.

Examples

This example shows how to search for occurrences of multiple patterns simultaneously in a case insensitive fashion. Each match includes the pattern that matched along with the byte offsets of the match.

use aho_corasick::AhoCorasickBuilder;

let patterns = &["apple", "maple", "snapple"];
let haystack = "Nobody likes maple in their apple flavored Snapple.";

let ac = AhoCorasickBuilder::new()
    .ascii_case_insensitive(true)
    .build(patterns);
let mut matches = vec![];
for mat in ac.find_iter(haystack) {
    matches.push((mat.pattern(), mat.start(), mat.end()));
}
assert_eq!(matches, vec![
    (1, 13, 18),
    (0, 28, 33),
    (2, 43, 50),
]);

This example shows how to replace matches with some other string:

use aho_corasick::AhoCorasick;

let patterns = &["fox", "brown", "quick"];
let haystack = "The quick brown fox.";
let replace_with = &["sloth", "grey", "slow"];

let ac = AhoCorasick::new(patterns);
let result = ac.replace_all(haystack, replace_with);
assert_eq!(result, "The slow grey sloth.");

Implementations

impl AhoCorasick[src]

pub fn new<I, P>(patterns: I) -> AhoCorasick where
    I: IntoIterator<Item = P>,
    P: AsRef<[u8]>, 
[src]

Create a new Aho-Corasick automaton using the default configuration.

The default configuration optimizes for less space usage, but at the expense of longer search times. To change the configuration, use AhoCorasickBuilder for fine-grained control, or AhoCorasick::new_auto_configured for automatic configuration if you aren't sure which settings to pick.

This uses the default MatchKind::Standard match semantics, which reports a match as soon as it is found. This corresponds to the standard match semantics supported by textbook descriptions of the Aho-Corasick algorithm.

Examples

Basic usage:

use aho_corasick::AhoCorasick;

let ac = AhoCorasick::new(&[
    "foo", "bar", "baz",
]);
assert_eq!(Some(1), ac.find("xxx bar xxx").map(|m| m.pattern()));

pub fn new_auto_configured<B>(patterns: &[B]) -> AhoCorasick where
    B: AsRef<[u8]>, 
[src]

Build an Aho-Corasick automaton with an automatically determined configuration.

Specifically, this requires a slice of patterns instead of an iterator since the configuration is determined by looking at the patterns before constructing the automaton. The idea here is to balance space and time automatically. That is, when searching a small number of patterns, this will attempt to use the fastest possible configuration since the total space required will be small anyway. As the number of patterns grows, this will fall back to slower configurations that use less space.

If you want auto configuration but with match semantics different from the default MatchKind::Standard, then use AhoCorasickBuilder::auto_configure.

Examples

Basic usage is just like new, except you must provide a slice:

use aho_corasick::AhoCorasick;

let ac = AhoCorasick::new_auto_configured(&[
    "foo", "bar", "baz",
]);
assert_eq!(Some(1), ac.find("xxx bar xxx").map(|m| m.pattern()));

impl<S: StateID> AhoCorasick<S>[src]

pub fn is_match<B: AsRef<[u8]>>(&self, haystack: B) -> bool[src]

Returns true if and only if this automaton matches the haystack at any position.

haystack may be any type that is cheaply convertible to a &[u8]. This includes, but is not limited to, String, &str, Vec<u8>, and &[u8] itself.

Examples

Basic usage:

use aho_corasick::AhoCorasick;

let ac = AhoCorasick::new(&[
    "foo", "bar", "quux", "baz",
]);
assert!(ac.is_match("xxx bar xxx"));
assert!(!ac.is_match("xxx qux xxx"));

pub fn earliest_find<B: AsRef<[u8]>>(&self, haystack: B) -> Option<Match>[src]

Returns the location of the first detected match in haystack.

This method has the same behavior regardless of the MatchKind of this automaton.

haystack may be any type that is cheaply convertible to a &[u8]. This includes, but is not limited to, String, &str, Vec<u8>, and &[u8] itself.

Examples

Basic usage:

use aho_corasick::AhoCorasick;

let ac = AhoCorasick::new(&[
    "abc", "b",
]);
let mat = ac.earliest_find("abcd").expect("should have match");
assert_eq!(1, mat.pattern());
assert_eq!((1, 2), (mat.start(), mat.end()));

pub fn find<B: AsRef<[u8]>>(&self, haystack: B) -> Option<Match>[src]

Returns the location of the first match according to the match semantics that this automaton was constructed with.

When using MatchKind::Standard, this corresponds precisely to the same behavior as earliest_find. Otherwise, match semantics correspond to either leftmost-first or leftmost-longest.

haystack may be any type that is cheaply convertible to a &[u8]. This includes, but is not limited to, String, &str, Vec<u8>, and &[u8] itself.

Examples

Basic usage, with standard semantics:

use aho_corasick::{AhoCorasickBuilder, MatchKind};

let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";

let ac = AhoCorasickBuilder::new()
    .match_kind(MatchKind::Standard) // default, not necessary
    .build(patterns);
let mat = ac.find(haystack).expect("should have a match");
assert_eq!("b", &haystack[mat.start()..mat.end()]);

Now with leftmost-first semantics:

use aho_corasick::{AhoCorasickBuilder, MatchKind};

let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";

let ac = AhoCorasickBuilder::new()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns);
let mat = ac.find(haystack).expect("should have a match");
assert_eq!("abc", &haystack[mat.start()..mat.end()]);

And finally, leftmost-longest semantics:

use aho_corasick::{AhoCorasickBuilder, MatchKind};

let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";

let ac = AhoCorasickBuilder::new()
    .match_kind(MatchKind::LeftmostLongest)
    .build(patterns);
let mat = ac.find(haystack).expect("should have a match");
assert_eq!("abcd", &haystack[mat.start()..mat.end()]);

pub fn find_iter<'a, 'b, B: ?Sized + AsRef<[u8]>>(
    &'a self,
    haystack: &'b B
) -> FindIter<'a, 'b, S>
[src]

Returns an iterator of non-overlapping matches, using the match semantics that this automaton was constructed with.

haystack may be any type that is cheaply convertible to a &[u8]. This includes, but is not limited to, String, &str, Vec<u8>, and &[u8] itself.

Examples

Basic usage, with standard semantics:

use aho_corasick::{AhoCorasickBuilder, MatchKind};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasickBuilder::new()
    .match_kind(MatchKind::Standard) // default, not necessary
    .build(patterns);
let matches: Vec<usize> = ac
    .find_iter(haystack)
    .map(|mat| mat.pattern())
    .collect();
assert_eq!(vec![2, 2, 2], matches);

Now with leftmost-first semantics:

use aho_corasick::{AhoCorasickBuilder, MatchKind};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasickBuilder::new()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns);
let matches: Vec<usize> = ac
    .find_iter(haystack)
    .map(|mat| mat.pattern())
    .collect();
assert_eq!(vec![0, 2, 0], matches);

And finally, leftmost-longest semantics:

use aho_corasick::{AhoCorasickBuilder, MatchKind};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasickBuilder::new()
    .match_kind(MatchKind::LeftmostLongest)
    .build(patterns);
let matches: Vec<usize> = ac
    .find_iter(haystack)
    .map(|mat| mat.pattern())
    .collect();
assert_eq!(vec![0, 2, 1], matches);

pub fn find_overlapping_iter<'a, 'b, B: ?Sized + AsRef<[u8]>>(
    &'a self,
    haystack: &'b B
) -> FindOverlappingIter<'a, 'b, S>
[src]

Returns an iterator of overlapping matches in the given haystack.

Overlapping matches can only be detected using MatchKind::Standard semantics. If this automaton was constructed with leftmost semantics, then this method will panic. To determine whether this will panic at runtime, use the AhoCorasick::supports_overlapping method.

haystack may be any type that is cheaply convertible to a &[u8]. This includes, but is not limited to, String, &str, Vec<u8>, and &[u8] itself.

Panics

This panics when AhoCorasick::supports_overlapping returns false. That is, this panics when this automaton's match semantics are not MatchKind::Standard.

Examples

Basic usage, with standard semantics:

use aho_corasick::AhoCorasick;

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasick::new(patterns);
let matches: Vec<usize> = ac
    .find_overlapping_iter(haystack)
    .map(|mat| mat.pattern())
    .collect();
assert_eq!(vec![2, 0, 2, 2, 0, 1], matches);

pub fn replace_all<B>(&self, haystack: &str, replace_with: &[B]) -> String where
    B: AsRef<str>, 
[src]

Replace all matches with a corresponding value in the replace_with slice given. Matches correspond to the same matches as reported by find_iter.

Replacements are determined by the index of the matching pattern. For example, if the pattern with index 2 is found, then it is replaced by replace_with[2].

Panics

This panics when replace_with.len() does not equal the total number of patterns that are matched by this automaton.

Examples

Basic usage:

use aho_corasick::{AhoCorasickBuilder, MatchKind};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasickBuilder::new()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns);
let result = ac.replace_all(haystack, &["x", "y", "z"]);
assert_eq!("x the z to the xage", result);

pub fn replace_all_bytes<B>(
    &self,
    haystack: &[u8],
    replace_with: &[B]
) -> Vec<u8> where
    B: AsRef<[u8]>, 
[src]

Replace all matches using raw bytes with a corresponding value in the replace_with slice given. Matches correspond to the same matches as reported by find_iter.

Replacements are determined by the index of the matching pattern. For example, if the pattern with index 2 is found, then it is replaced by replace_with[2].

Panics

This panics when replace_with.len() does not equal the total number of patterns that are matched by this automaton.

Examples

Basic usage:

use aho_corasick::{AhoCorasickBuilder, MatchKind};

let patterns = &["append", "appendage", "app"];
let haystack = b"append the app to the appendage";

let ac = AhoCorasickBuilder::new()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns);
let result = ac.replace_all_bytes(haystack, &["x", "y", "z"]);
assert_eq!(b"x the z to the xage".to_vec(), result);

pub fn replace_all_with<F>(
    &self,
    haystack: &str,
    dst: &mut String,
    replace_with: F
) where
    F: FnMut(&Match, &str, &mut String) -> bool
[src]

Replace all matches using a closure called on each match. Matches correspond to the same matches as reported by find_iter.

The closure accepts three parameters: the match found, the text of the match and a string buffer with which to write the replaced text (if any). If the closure returns true, then it continues to the next match. If the closure returns false, then searching is stopped.

Examples

Basic usage:

use aho_corasick::{AhoCorasickBuilder, MatchKind};

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasickBuilder::new()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns);
let mut result = String::new();
ac.replace_all_with(haystack, &mut result, |mat, _, dst| {
    dst.push_str(&mat.pattern().to_string());
    true
});
assert_eq!("0 the 2 to the 0age", result);

pub fn replace_all_with_bytes<F>(
    &self,
    haystack: &[u8],
    dst: &mut Vec<u8>,
    replace_with: F
) where
    F: FnMut(&Match, &[u8], &mut Vec<u8>) -> bool
[src]

Replace all matches using raw bytes with a closure called on each match. Matches correspond to the same matches as reported by find_iter.

The closure accepts three parameters: the match found, the text of the match and a byte buffer with which to write the replaced text (if any). If the closure returns true, then it continues to the next match. If the closure returns false, then searching is stopped.

Examples

Basic usage:

use aho_corasick::{AhoCorasickBuilder, MatchKind};

let patterns = &["append", "appendage", "app"];
let haystack = b"append the app to the appendage";

let ac = AhoCorasickBuilder::new()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns);
let mut result = vec![];
ac.replace_all_with_bytes(haystack, &mut result, |mat, _, dst| {
    dst.extend(mat.pattern().to_string().bytes());
    true
});
assert_eq!(b"0 the 2 to the 0age".to_vec(), result);

pub fn stream_find_iter<'a, R: Read>(
    &'a self,
    rdr: R
) -> StreamFindIter<'a, R, S>
[src]

Returns an iterator of non-overlapping matches in the given stream. Matches correspond to the same matches as reported by find_iter.

The matches yielded by this iterator use absolute position offsets in the stream given, where the first byte has index 0. Matches are yieled until the stream is exhausted.

Each item yielded by the iterator is an io::Result<Match>, where an error is yielded if there was a problem reading from the reader given.

When searching a stream, an internal buffer is used. Therefore, callers should avoiding providing a buffered reader, if possible.

Searching a stream requires that the automaton was built with MatchKind::Standard semantics. If this automaton was constructed with leftmost semantics, then this method will panic. To determine whether this will panic at runtime, use the AhoCorasick::supports_stream method.

Memory usage

In general, searching streams will use a constant amount of memory for its internal buffer. The one requirement is that the internal buffer must be at least the size of the longest possible match. In most use cases, the default buffer size will be much larger than any individual match.

Panics

This panics when AhoCorasick::supports_stream returns false. That is, this panics when this automaton's match semantics are not MatchKind::Standard. This restriction may be lifted in the future.

Examples

Basic usage:

use aho_corasick::AhoCorasick;

let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";

let ac = AhoCorasick::new(patterns);
let mut matches = vec![];
for result in ac.stream_find_iter(haystack.as_bytes()) {
    let mat = result?;
    matches.push(mat.pattern());
}
assert_eq!(vec![2, 2, 2], matches);

pub fn stream_replace_all<R, W, B>(
    &self,
    rdr: R,
    wtr: W,
    replace_with: &[B]
) -> Result<()> where
    R: Read,
    W: Write,
    B: AsRef<[u8]>, 
[src]

Search for and replace all matches of this automaton in the given reader, and write the replacements to the given writer. Matches correspond to the same matches as reported by find_iter.

Replacements are determined by the index of the matching pattern. For example, if the pattern with index 2 is found, then it is replaced by replace_with[2].

After all matches are replaced, the writer is not flushed.

If there was a problem reading from the given reader or writing to the given writer, then the corresponding io::Error is returned and all replacement is stopped.

When searching a stream, an internal buffer is used. Therefore, callers should avoiding providing a buffered reader, if possible. However, callers may want to provide a buffered writer.

Searching a stream requires that the automaton was built with MatchKind::Standard semantics. If this automaton was constructed with leftmost semantics, then this method will panic. To determine whether this will panic at runtime, use the AhoCorasick::supports_stream method.

Memory usage

In general, searching streams will use a constant amount of memory for its internal buffer. The one requirement is that the internal buffer must be at least the size of the longest possible match. In most use cases, the default buffer size will be much larger than any individual match.

Panics

This panics when AhoCorasick::supports_stream returns false. That is, this panics when this automaton's match semantics are not MatchKind::Standard. This restriction may be lifted in the future.

Examples

Basic usage:

use aho_corasick::AhoCorasick;

let patterns = &["fox", "brown", "quick"];
let haystack = "The quick brown fox.";
let replace_with = &["sloth", "grey", "slow"];

let ac = AhoCorasick::new(patterns);
let mut result = vec![];
ac.stream_replace_all(haystack.as_bytes(), &mut result, replace_with)?;
assert_eq!(b"The slow grey sloth.".to_vec(), result);

pub fn stream_replace_all_with<R, W, F>(
    &self,
    rdr: R,
    wtr: W,
    replace_with: F
) -> Result<()> where
    R: Read,
    W: Write,
    F: FnMut(&Match, &[u8], &mut W) -> Result<()>, 
[src]

Search the given reader and replace all matches of this automaton using the given closure. The result is written to the given writer. Matches correspond to the same matches as reported by find_iter.

The closure accepts three parameters: the match found, the text of the match and the writer with which to write the replaced text (if any). If the closure returns true, then it continues to the next match. If the closure returns false, then searching is stopped.

After all matches are replaced, the writer is not flushed.

If there was a problem reading from the given reader or writing to the given writer, then the corresponding io::Error is returned and all replacement is stopped.

When searching a stream, an internal buffer is used. Therefore, callers should avoiding providing a buffered reader, if possible. However, callers may want to provide a buffered writer.

Searching a stream requires that the automaton was built with MatchKind::Standard semantics. If this automaton was constructed with leftmost semantics, then this method will panic. To determine whether this will panic at runtime, use the AhoCorasick::supports_stream method.

Memory usage

In general, searching streams will use a constant amount of memory for its internal buffer. The one requirement is that the internal buffer must be at least the size of the longest possible match. In most use cases, the default buffer size will be much larger than any individual match.

Panics

This panics when AhoCorasick::supports_stream returns false. That is, this panics when this automaton's match semantics are not MatchKind::Standard. This restriction may be lifted in the future.

Examples

Basic usage:

use std::io::Write;
use aho_corasick::AhoCorasick;

let patterns = &["fox", "brown", "quick"];
let haystack = "The quick brown fox.";

let ac = AhoCorasick::new(patterns);
let mut result = vec![];
ac.stream_replace_all_with(
    haystack.as_bytes(),
    &mut result,
    |mat, _, wtr| {
        wtr.write_all(mat.pattern().to_string().as_bytes())
    },
)?;
assert_eq!(b"The 2 1 0.".to_vec(), result);

pub fn match_kind(&self) -> &MatchKind[src]

Returns the match kind used by this automaton.

Examples

Basic usage:

use aho_corasick::{AhoCorasick, MatchKind};

let ac = AhoCorasick::new(&[
    "foo", "bar", "quux", "baz",
]);
assert_eq!(&MatchKind::Standard, ac.match_kind());

pub fn max_pattern_len(&self) -> usize[src]

Returns the length of the longest pattern matched by this automaton.

Examples

Basic usage:

use aho_corasick::AhoCorasick;

let ac = AhoCorasick::new(&[
    "foo", "bar", "quux", "baz",
]);
assert_eq!(4, ac.max_pattern_len());

pub fn pattern_count(&self) -> usize[src]

Return the total number of patterns matched by this automaton.

This includes patterns that may never participate in a match. For example, if MatchKind::LeftmostFirst match semantics are used, and the patterns Sam and Samwise were used to build the automaton, then Samwise can never participate in a match because Sam will always take priority.

Examples

Basic usage:

use aho_corasick::AhoCorasick;

let ac = AhoCorasick::new(&[
    "foo", "bar", "baz",
]);
assert_eq!(3, ac.pattern_count());

pub fn supports_overlapping(&self) -> bool[src]

Returns true if and only if this automaton supports reporting overlapping matches.

If this returns false and overlapping matches are requested, then it will result in a panic.

Since leftmost matching is inherently incompatible with overlapping matches, only MatchKind::Standard supports overlapping matches. This is unlikely to change in the future.

Examples

Basic usage:

use aho_corasick::{AhoCorasickBuilder, MatchKind};

let ac = AhoCorasickBuilder::new()
    .match_kind(MatchKind::Standard)
    .build(&["foo", "bar", "baz"]);
assert!(ac.supports_overlapping());

let ac = AhoCorasickBuilder::new()
    .match_kind(MatchKind::LeftmostFirst)
    .build(&["foo", "bar", "baz"]);
assert!(!ac.supports_overlapping());

pub fn supports_stream(&self) -> bool[src]

Returns true if and only if this automaton supports stream searching.

If this returns false and stream searching (or replacing) is attempted, then it will result in a panic.

Currently, only MatchKind::Standard supports streaming. This may be expanded in the future.

Examples

Basic usage:

use aho_corasick::{AhoCorasickBuilder, MatchKind};

let ac = AhoCorasickBuilder::new()
    .match_kind(MatchKind::Standard)
    .build(&["foo", "bar", "baz"]);
assert!(ac.supports_stream());

let ac = AhoCorasickBuilder::new()
    .match_kind(MatchKind::LeftmostFirst)
    .build(&["foo", "bar", "baz"]);
assert!(!ac.supports_stream());

pub fn heap_bytes(&self) -> usize[src]

Returns the approximate total amount of heap used by this automaton, in units of bytes.

Examples

This example shows the difference in heap usage between a few configurations:

This example is not tested
use aho_corasick::{AhoCorasickBuilder, MatchKind};

let ac = AhoCorasickBuilder::new()
    .dfa(false) // default
    .build(&["foo", "bar", "baz"]);
assert_eq!(10_336, ac.heap_bytes());

let ac = AhoCorasickBuilder::new()
    .dfa(false) // default
    .ascii_case_insensitive(true)
    .build(&["foo", "bar", "baz"]);
assert_eq!(10_384, ac.heap_bytes());

let ac = AhoCorasickBuilder::new()
    .dfa(true)
    .byte_classes(false)
    .build(&["foo", "bar", "baz"]);
assert_eq!(20_768, ac.heap_bytes());

let ac = AhoCorasickBuilder::new()
    .dfa(true)
    .byte_classes(true) // default
    .build(&["foo", "bar", "baz"]);
assert_eq!(1_248, ac.heap_bytes());

let ac = AhoCorasickBuilder::new()
    .dfa(true)
    .ascii_case_insensitive(true)
    .build(&["foo", "bar", "baz"]);
assert_eq!(1_248, ac.heap_bytes());

Trait Implementations

impl<S: Clone + StateID> Clone for AhoCorasick<S>[src]

impl<S: Debug + StateID> Debug for AhoCorasick<S>[src]

Auto Trait Implementations

impl<S> RefUnwindSafe for AhoCorasick<S> where
    S: RefUnwindSafe

impl<S> Send for AhoCorasick<S> where
    S: Send

impl<S> Sync for AhoCorasick<S> where
    S: Sync

impl<S> Unpin for AhoCorasick<S> where
    S: Unpin

impl<S> UnwindSafe for AhoCorasick<S> where
    S: UnwindSafe

Blanket Implementations

impl<T> Any for T where
    T: 'static + ?Sized
[src]

impl<T> Borrow<T> for T where
    T: ?Sized
[src]

impl<T> BorrowMut<T> for T where
    T: ?Sized
[src]

impl<T> From<T> for T[src]

impl<T, U> Into<U> for T where
    U: From<T>, 
[src]

impl<T> ToOwned for T where
    T: Clone
[src]

type Owned = T

The resulting type after obtaining ownership.

impl<T, U> TryFrom<U> for T where
    U: Into<T>, 
[src]

type Error = Infallible

The type returned in the event of a conversion error.

impl<T, U> TryInto<U> for T where
    U: TryFrom<T>, 
[src]

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.