
# Adding a New Service

Our system performs clustering using the features from the Profile.

## Profile Features

* name - This is the name of the individual or company
  * Transformations
    * Removal of extra attributes such as suffixes, simplifying the name to first name followed by last name. For example, "John Smith III" becomes "John Smith".
    * Removal of .com/.net and other domain name tails. "audienti.com" becomes "audienti".
    * Removal of common company/corporate endings. "Juvomaster LLC" becomes "Juvomaster".
* description - This is a description that is provided by the profile.
  * The description, unlike the name, is not consistent across profiles, because it is contextualized to the service. Matching it as a single literal value won't work.
  * Transformations
    * Stop word removal
    * Stemming of the words
    * downcasing of words
* location - the stated location of the profile
  * Some profiles will have this, some will not.
  * We also carry versions of this in the form of the fields: country, territory (state), city, address.
  * Transformation
    * This should be transformed into a latitude and longitude.
    * The approximate distance between two points should be the criterion used for proximity.
* references - Links to other profiles that are listed in the profile.
  * This is a common way we cluster. When a Twitter profile, for example, mentions an Instagram profile, it creates a relationship between the two. The reference is "validated" if it is bidirectional; otherwise it is not. References can also "loop".
  * Transformation
    * Convert to a standard profile\_id
* gender - the gender of the profile
  * If a profile is for a person, then our current system uses a gender detector we have written to try to identify the gender from the name. If this works, we mark the profile with this gender.
  * Transformation
    * We use our gender identifier to do this. It has the most common 10k or so names in it by gender, and we score/match them up.
* image\_url - a picture of the profile
  * If the profile is a person, in theory we could try to do facial recognition. But right now, this field is not used in any way.
* lang - the language of the profile
  * This can be used to validate, but is not unique enough to cluster with.
* Other attributes that could be used are: follower counts, friend counts, like counts, share counts.
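
The transformations listed above can be sketched in Python as follows. This is a minimal illustration, not the production code: the suffix, company-ending, domain-tail, and stop-word lists are placeholder assumptions, `crude_stem` is a naive suffix stripper standing in for a real stemmer, and all function names are hypothetical. Geocoding a location string into coordinates is not shown; the haversine formula is one common way to get an approximate distance once coordinates exist.

```python
import math
import re

# Placeholder lists (assumptions); the real system's lists are not documented here.
PERSON_SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}
COMPANY_ENDINGS = {"llc", "inc", "ltd", "corp", "co", "gmbh"}
DOMAIN_TAIL = re.compile(r"\.(com|net|org|io)$", re.IGNORECASE)
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "is", "at"}
EARTH_RADIUS_KM = 6371.0


def normalize_name(name):
    """Drop a domain tail, then trailing person suffixes and company endings."""
    tokens = DOMAIN_TAIL.sub("", name.strip()).split()
    endings = PERSON_SUFFIXES | COMPANY_ENDINGS
    while len(tokens) > 1 and tokens[-1].lower().strip(".,") in endings:
        tokens.pop()
    return " ".join(tokens).rstrip(",")


def crude_stem(word):
    """Naive suffix stripping; a stand-in for a real stemmer, not Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def description_tokens(description):
    """Downcase, remove stop words, and stem the remaining words."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def haversine_km(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in kilometres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_validated(references, a, b):
    """A reference is validated only if each profile_id mentions the other.

    `references` maps a profile_id to the set of profile_ids it mentions.
    """
    return b in references.get(a, set()) and a in references.get(b, set())
```

With these placeholder lists, `normalize_name("John Smith III")` yields `"John Smith"` and `normalize_name("Juvomaster LLC")` yields `"Juvomaster"`, matching the examples in the list above.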

## Existing clustering algorithm

* In the current version of the application, we do clustering.
* Our current clustering algorithm does a "rough" clustering by using the name as a single feature (with the transformations above).
* Once this is done, a second classification pass separates "people" profiles from "company" profiles, and then performs a secondary classification on each group to create a person or company.
* Note that while I expected the algorithm to use the references between profiles, this does not seem to have made it into production.
* Net: Our current algorithm is VERY simplistic. Too simplistic to work.
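
To make the "rough" first pass concrete, here is a minimal sketch of clustering on the name alone. It assumes each profile is a dict with hypothetical `id` and `name` keys and that the name has already been through the name transformations; it is an illustration of the idea, not the code that runs in production, and it omits the second person/company classification pass.

```python
from collections import defaultdict


def rough_cluster(profiles):
    """Group profile ids whose case-folded, whitespace-collapsed names match."""
    clusters = defaultdict(list)
    for profile in profiles:
        # The name is the only feature used as the cluster key.
        key = " ".join(profile["name"].lower().split())
        clusters[key].append(profile["id"])
    return dict(clusters)


profiles = [
    {"id": "twitter:1", "name": "John Smith"},
    {"id": "instagram:7", "name": "john  smith"},
    {"id": "web:3", "name": "Juvomaster"},
]
clusters = rough_cluster(profiles)
```

Because the name is the only key, any two unrelated people called "John Smith" land in the same cluster, which illustrates why this approach is too simplistic on its own.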

