When you open a document in the viewer, the system calculates a percent similarity between it and the other documents in the project, and lists those documents on the "Similar" tab. To do this, it tokenizes each word in the document and builds a list of the document's keywords. It then searches the rest of the database index for documents that contain those keywords.
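The keyword extraction and index search described above can be sketched in Python as follows. This is a minimal illustration, not the product's implementation: the lowercase regex tokenizer, the `STOP_WORDS` list, and the in-memory inverted index standing in for the database index are all assumptions, and every function name here is hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the product's real keyword rules are not documented.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def extract_keywords(text: str) -> set[str]:
    """Keep the distinct tokens that are not stop words."""
    return {tok for tok in tokenize(text) if tok not in STOP_WORDS}


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index mapping each keyword to the ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for kw in extract_keywords(text):
            index[kw].add(doc_id)
    return index


def candidates(doc_id: str, docs: dict[str, str], index: dict[str, set[str]]) -> set[str]:
    """Other documents that share at least one keyword with doc_id."""
    found: set[str] = set()
    for kw in extract_keywords(docs[doc_id]):
        found |= index.get(kw, set())
    found.discard(doc_id)  # a document is not "similar" to itself
    return found


if __name__ == "__main__":
    docs = {
        "A": "The contract covers shipping and payment terms.",
        "B": "Payment terms for shipping.",
        "C": "Meeting notes.",
    }
    index = build_index(docs)
    print(candidates("A", docs, index))  # {'B'}
```

An inverted index is used here because it lets the search look up each keyword directly instead of scanning every document; whether the real database index works this way is an assumption.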
When the search finishes, each matching document is given a score based on the number of keywords it hits. The system then finds the maximum score across the result set (say 50, as an example) and displays each document's percentage as a fraction of that maximum. So if another document matched all 50 keywords, its score would be:
(50/50) * 100 = 100%, even though the documents might not be 100% identical.
If 20 out of the 50 keywords hit, the score would be:
(20/50) * 100 = 40%
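The scoring and normalization step can be sketched as below, reproducing the 50-keyword example. This assumes each distinct keyword counts once toward a document's score; `hit_count` and `percent_similarity` are hypothetical names, and the guard for an all-zero result set is an added assumption.

```python
def hit_count(source_keywords: set[str], other_keywords: set[str]) -> int:
    """Score = how many of the source document's keywords appear in the other document."""
    return len(source_keywords & other_keywords)


def percent_similarity(scores: dict[str, int]) -> dict[str, float]:
    """Express each score as a percentage of the highest score in the result set."""
    if not scores:
        return {}
    max_score = max(scores.values())
    if max_score == 0:
        # Assumed behavior: no keyword hits at all means 0% for everyone.
        return {doc_id: 0.0 for doc_id in scores}
    # Equivalent to (score / max_score) * 100; multiplying first avoids rounding noise.
    return {doc_id: 100 * score / max_score for doc_id, score in scores.items()}


if __name__ == "__main__":
    source = {f"kw{i}" for i in range(50)}          # 50 keywords in the open document
    scores = {
        "B": hit_count(source, source),                             # all 50 hit
        "C": hit_count(source, {f"kw{i}" for i in range(20)}),      # 20 hit
    }
    print(percent_similarity(scores))  # {'B': 100.0, 'C': 40.0}
```

Because the percentage is relative to the best match in the set, the top-scoring document always shows 100%, which is why 100% does not imply the documents are identical.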
By default, the UI only shows documents with a calculated similarity of 75% or higher, but custom reports can be run with a lower or higher threshold. Contact Support@Indexed.IO for more information on running custom reports.
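The threshold filter can be illustrated with a small helper. This is a sketch only: the `threshold` parameter stands in for whatever setting a custom report uses (which Support configures), and sorting results highest-first is an assumption about display order. `similar_documents` is a hypothetical name.

```python
DEFAULT_THRESHOLD = 75.0  # the UI default described above


def similar_documents(
    percentages: dict[str, float], threshold: float = DEFAULT_THRESHOLD
) -> list[tuple[str, float]]:
    """Return (doc_id, percent) pairs at or above the threshold, highest first."""
    hits = [(doc_id, pct) for doc_id, pct in percentages.items() if pct >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    pcts = {"B": 100.0, "C": 40.0, "D": 80.0}
    print(similar_documents(pcts))                  # [('B', 100.0), ('D', 80.0)]
    print(similar_documents(pcts, threshold=30.0))  # [('B', 100.0), ('D', 80.0), ('C', 40.0)]
```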