Crate unicode_segmentation
Iterators which split strings on Grapheme Cluster, Word or Sentence boundaries, according to the Unicode Standard Annex #29 rules.
extern crate unicode_segmentation;

use unicode_segmentation::UnicodeSegmentation;

fn main() {
    let s = "a̐éö̲\r\n";
    let g = UnicodeSegmentation::graphemes(s, true).collect::<Vec<&str>>();
    let b: &[_] = &["a̐", "é", "ö̲", "\r\n"];
    assert_eq!(g, b);

    let s = "The quick (\"brown\") fox can't jump 32.3 feet, right?";
    let w = s.unicode_words().collect::<Vec<&str>>();
    let b: &[_] = &["The", "quick", "brown", "fox", "can't", "jump", "32.3", "feet", "right"];
    assert_eq!(w, b);

    let s = "The quick (\"brown\") fox";
    let w = s.split_word_bounds().collect::<Vec<&str>>();
    let b: &[_] = &["The", " ", "quick", " ", "(", "\"", "brown", "\"", ")", " ", "fox"];
    assert_eq!(w, b);
}
no_std
unicode-segmentation does not depend on libstd, so it can be used in crates with the #![no_std] attribute.
crates.io
You can use this package in your project by adding the following to your Cargo.toml:
[dependencies]
unicode-segmentation = "1.9.0"
Structs
GraphemeCursor: Cursor-based segmenter for grapheme clusters.
GraphemeIndices: External iterator for grapheme clusters and byte offsets.
Graphemes: External iterator for a string’s grapheme clusters.
USentenceBoundIndices: External iterator for sentence boundaries and byte offsets.
USentenceBounds: External iterator for a string’s sentence boundaries.
UWordBoundIndices: External iterator for word boundaries and byte offsets.
UWordBounds: External iterator for a string’s word boundaries.
UnicodeSentences: An iterator over the substrings of a string which, after splitting the string on sentence boundaries, contain any characters with the Alphabetic property, or with General_Category=Number.
UnicodeWordIndices: An iterator over the substrings of a string which, after splitting the string on word boundaries, contain any characters with the Alphabetic property, or with General_Category=Number. This iterator also provides the byte offsets for each substring.
UnicodeWords: An iterator over the substrings of a string which, after splitting the string on word boundaries, contain any characters with the Alphabetic property, or with General_Category=Number.
Enums
GraphemeIncomplete: An error return indicating that not enough content was available in the provided chunk to satisfy the query, and that more content must be provided.
Constants
UNICODE_VERSION: The version of Unicode that this version of unicode-segmentation is based on.
Traits
UnicodeSegmentation: Methods for segmenting strings according to Unicode Standard Annex #29.