This article covers how to set up content entities with a focus on creating your own entities.
Creating a content entity
- Go to Settings > Content Entities
- Click Add Preset or + New Entity
- Add Preset - The common ones we create for you: email, credit_card, iban, and personal_identity_code of different countries.
- + New Entity - Add your own set of rules to create a custom content entity
- Finally, activate the content entity you just added by clicking the three dots in the top right corner and reordering the entities.
Creating custom content entities
After clicking + New Entity, enter the fields on the screen:
- Name: This may contain only alphanumeric characters, ‘_’, and ‘-’. Spaces are not allowed.
- Description: Give it a short description with an example
- Sanitize: Enable this by checking the box if you wish our model to replace the detected custom entity with a placeholder
- In the Regular Expression field, define the regular expression of the content entity
- Enable Case Sensitive if the content entity is case sensitive
- Enable Whole Words if the expression represents the whole word
✏️ A regular expression, also known as RegEx, is a special text string that describes a search pattern. When creating your own content entity using RegEx, keep in mind that it should cover all the possible patterns, formats, and lengths of the information you want to identify.
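To make the Regular Expression, Case Sensitive, Whole Words, and Sanitize options more concrete, here is a minimal Python sketch of how such settings could behave. It is an illustration only, not the product's actual implementation: the helper names `build_pattern` and `sanitize`, the order ID format, and the `<order_id>` placeholder are all hypothetical, and the product's regular expression dialect may differ from Python's.

```python
import re

def build_pattern(expression, case_sensitive=False, whole_words=False):
    """Compile an expression roughly the way the form options describe (assumed behavior)."""
    if whole_words:
        # \b anchors the match at word boundaries, so it cannot sit inside a longer word.
        expression = rf"\b(?:{expression})\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(expression, flags)

def sanitize(text, pattern, placeholder):
    """Replace every detected entity with a placeholder."""
    return pattern.sub(placeholder, text)

# Hypothetical custom entity: order IDs made of "ORD-" followed by exactly six digits.
order_id = build_pattern(r"ORD-\d{6}", case_sensitive=True, whole_words=True)

sanitized = sanitize(
    "Refund ORD-123456 but not ord-123456 or XORD-1234567.",
    order_id,
    "<order_id>",
)
print(sanitized)
```

In this sketch only the first ID is replaced: the lowercase variant fails the case-sensitive check, and the last one is embedded in a longer token, so the whole-word boundaries reject it. Testing an expression against examples that should and should not match is a quick way to check that it covers every pattern, format, and length you expect.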
- List Implementation: Select what matches your needs
- Partial words: when the expression can appear inside a longer word, e.g. this would recognize "hat" in "hats" and "play" in "played" or "playing"
- Whole words: when the expression represents the whole word. e.g. hats, beanies, caps, berets
- Lemmatize: when the content can appear in several inflected forms of the same word, e.g. "am", "are", "is", "was", "were", and "been" are all forms of the verb "to be"
- Expression List: Add a list of expressions that should fall under your content entity. Each line represents one expression.
- Enable Case Sensitive if the content entity should be case sensitive.
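The difference between the three list implementations can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the product's matching logic: `compile_list`, `lemma_matches`, and the tiny `LEMMAS` table are hypothetical, and the table merely stands in for a real lemmatizer.

```python
import re

def compile_list(expressions, mode="whole", case_sensitive=False):
    """Turn an expression list (one expression per line) into a single pattern."""
    # Longest expressions first, so a longer entry wins over one that is its prefix.
    ordered = sorted(expressions, key=len, reverse=True)
    # re.escape treats each entry as literal text, not as a regular expression.
    body = "|".join(re.escape(e) for e in ordered)
    if mode == "whole":
        body = rf"\b(?:{body})\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(body, flags)

expression_list = ["hat", "play"]
text = "She played in hats, then lost her hat."

# Partial words: matches may sit inside longer words ("played", "hats").
partial = compile_list(expression_list, mode="partial").findall(text)
# Whole words: only the standalone word "hat" matches.
whole = compile_list(expression_list, mode="whole").findall(text)

# Lemmatize: map each word to its base form before comparing.
# Hypothetical, hand-written lookup table standing in for a real lemmatizer.
LEMMAS = {"am": "be", "are": "be", "is": "be", "was": "be", "were": "be", "been": "be"}

def lemma_matches(text, expressions):
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if LEMMAS.get(t, t) in expressions]

be_forms = lemma_matches("They were late and I am tired.", {"be"})

print(partial, whole, be_forms)
```

Partial matching casts the widest net and is the most likely to produce false positives, so whole words is usually the safer choice unless the expression really does occur inside other words.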
If you need support setting up content entities, please contact us at firstname.lastname@example.org