Matching objects in Javascript

Every programmer I know (including my past self!) goes through the following brain-melting moments :melting_face: when learning Javascript.

First, we learn to set up some variables pointing to some values:

let form = "gato"
let gloss = "cat"

Then we learn that we can retrieve the value a variable labels just by referring to it (let’s pretend Javascript’s response is spoken by a robot…):

form

:robot: "gato"

…you get the string "gato" back. This is not amazing.

Then we learn about the == equality operator, which results in the further not-amazingness:

"gato" == "cat"

:robot: false

"gato" == "gato"

:robot: true

Incredible!

And we can do other incantations:

form == gloss // effectively the same as "gato" == "cat"

:robot: false

Or

gloss == "cat"

:robot: true

Okeydokey, we get the idea. Now we learn about the wondrous world of objects, where we can bundle up more than one key-value pair to represent something complex. So for instance, instead of futzing about with these two values to represent the notion of a “word”, we can create a single object that represents our word:

let word = {"form": "gato", "gloss": "cat" }

This seems useful. Maybe let’s make another one:

let word2 = {"form": "perro", "gloss": "dog" }

Well, now we should be able to use == on these two words and prove that they are not equal. Right?

word == word2

:robot: false

Well, sure looks that way!

Narrator: It is not that way.

Just to make sure, let’s verify that a cat is a cat…

word == word

:robot: true

Ha! See? It works!

Narrator: It doesn’t work.

It just has to work! I want it to work! Look, even if I don’t use variables, it will work!

({"form": "gato", "gloss": "cat" } == {"form": "gato", "gloss": "cat" })

Note: the parens are necessary in the last line because otherwise the first { will be interpreted as beginning a code block and not an object. Don’t :thinking: too much about it at the moment, it’s mostly irrelevant to the current discussion!

I mean, obviously those are the same two things! They are identical!

:robot: false

wat


A weird thing

This is indisputably a weird thing. Let’s review:

"perro" == "gato"
// false, different strings.

let dogVariable = "perro"
let catVariable = "gato"
dogVariable == catVariable
// false, variables labeling different strings. 

let dog = { "form": "perro", "gloss": "dog" }
let cat = { "form": "gato", "gloss": "cat" }

dog == cat
// false, of course not

dog == dog
// true, seems unsurprising…

dog == cat
// false, again unsurprising…

// so what if we just "skip the variables":
({ "form": "perro", "gloss": "dog" } == { "form": "perro", "gloss": "dog" })

// false! WAT?
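What’s going on here, in short: for objects, == (and ===) compares *references*. It asks whether both sides are the very same object, not whether they have the same contents, and every { ... } literal creates a brand-new object. A small sketch (the variable names are just for illustration):

```javascript
// Each object literal creates a brand-new object
let dog = { "form": "perro", "gloss": "dog" }
// A second label for the SAME object (not a copy)
let sameDog = dog
// Same contents, but a separate object
let dogCopy = { "form": "perro", "gloss": "dog" }

console.log(dog == sameDog)   // true: both variables point at one object
console.log(dog == dogCopy)   // false: two distinct objects
console.log(dog === dogCopy)  // false: strict equality compares references too

// Since sameDog and dog are one object, a change through one shows up in the other
sameDog.gloss = "hound"
console.log(dog.gloss)        // hound
```

That is also why word == word worked earlier: it compared one object with itself.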

When an object is not an object (in Javascript)

It’s worth noting that this is a Javascript-specific thing.

Python is very clever about comparing objects:

>>> { "form": "perro", "gloss": "dog" } == { "form": "perro", "gloss": "dog" }
True

Python is even clever enough to know that the order of the key/value pairs shouldn’t affect equality: two dicts (Python’s version of objects) are equal when they have the same keys with the same values, whatever order the keys were written in. So this counts as true as well:

>>> { "gloss": "dog", "form": "perro" } == { "form": "perro", "gloss": "dog"}
True

On the other hand, making this kind of object equality the default has an efficiency cost. Also, what if you want to modify the definition of equality? For instance, suppose you have two “word” objects with the same form and gloss, but one of them happens to have a property domain with a semantic domain in it. You might very well want to ignore that domain property for the purposes of determining equality.

That is to say, are the two words in this array “equal”?

[
 { "gloss": "dog", "form": "perro" } ,
 { "gloss": "dog", "form": "perro", "domain": "animals" } 
]

Yes? Kind of? Maybe? No? It depends?

The only way to tell your programming language how to handle such things is to write some code that implements your definition of equality, whether you’re using Python, Javascript, or any other language.

In Python you override the __eq__ … er, whatever __those_python_thingies__ are called. (Ask @meaganvigus or @tillyb or @xrotwng or @sunny, your friendly post author is a Javascript guy!)

Anyway, the point is, you have to write some code to get your program to grok what you think of as being a word.

The same thing ends up being the case in Javascript; the difference is that you have to write that function yourself even for the simplest “same keys, same values” kind of comparison. Conveniently, that’s exactly what the next heading is about.

How to write a function to compare two objects in Javascript

Basically the strategy goes something like this:

  • To compare two objects a and b:
    • For every property in a (let’s call it key) you care about:
      • If b doesn’t have key, they are not the same object
      • If b does have key but a’s value and b’s value for key are different, they are not the same object.
    • Otherwise, they are the same object.

Implementation of equals(a,b)

Here’s a Javascript implementation of that:

let equals = (a, b) => {
  // true only if every key of a is also in b, with the same value
  return Object.keys(a)
    .every(key => key in b && b[key] == a[key])
}

By way of explanation:

Object.keys(object)

The Object.keys method will return an array of all the keys in an object:

let word = {"form": "gato", "gloss": "cat" }
Object.keys(word)

:robot: ["form", "gloss"]
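Object.keys has two close relatives, Object.values and Object.entries; the match function later in this post starts from Object.entries, which returns [key, value] pairs. A quick sketch of all three:

```javascript
let word = { "form": "gato", "gloss": "cat" }

// Just the keys, just the values, or [key, value] pairs
let keys = Object.keys(word)        // ["form", "gloss"]
let values = Object.values(word)    // ["gato", "cat"]
let entries = Object.entries(word)  // [["form", "gato"], ["gloss", "cat"]]

// Entries pair nicely with array destructuring
for (let [key, value] of entries) {
  console.log(key + ": " + value)   // form: gato, then gloss: cat
}
```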

The .every() Array method

The every() method tests whether all elements in the array pass the test implemented by the provided function. It returns a Boolean [true or false] value.

The .every Array method will go through every item in an array and check whether the function it is passed returns true for every item in the array. Here are a couple of examples:

let words = [ "perro", "gato", "rato" ]
words.every(word => word.includes("a")) // false
words.every(word => word.endsWith("o")) // true

let numbers = [1,2,3,4,5]
numbers.every(number => number < 10) // true
numbers.every(number => number < 3) // false
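One detail worth knowing about .every(): on an empty array it returns true, because there is no element that could fail the test. That matters when comparing objects, since an object with no keys passes any “every key of a” check. A small sketch (variable names are illustrative):

```javascript
// With no elements, nothing can fail the test, so .every() returns true
let noNumbers = []
let allHuge = noNumbers.every(number => number > 1000000)  // true

// Same for an object with no keys: Object.keys({}) is [],
// so an "every key of a is in b" check passes for any b at all
let b = { "form": "gato", "gloss": "cat" }
let emptyMatches = Object.keys({}).every(key => key in b)  // true
```

So the equals function above would report that {} equals any word whatsoever; the key-count check introduced later in the post rules that out.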

So in our equals(a,b) function, there is an “anonymous” function that takes each key of a in turn and uses that key to check our definition of equality.

So if we run:

equals(
  { form: "gato", gloss: "cat" },  // this is a
  { form: "gato", gloss: "cat" }   // this is b
)

Then this code is run for each key:

key in b && b[key] == a[key]

The in operator asks whether the current key of a is also a key of b. So in our example we first ask if b has the key form.

"form" in b

Recall that b is { form: "gato", gloss: "cat" }, so yes, "form" is in there:

:robot: true

Now, that && business (the “logical AND”) means “evaluate what’s next only if what we’ve seen so far is true.”

So only now do we ask if b’s value for "form" is the same as a’s, and that works out to true as well:

b["form"] == a["form"]  // works out to "gato" == "gato"

:robot: true

Next, the same thing happens for gloss: running key in b && b[key] == a[key] also works out to true (since gloss is in { form: "gato", gloss: "cat" } and "cat" == "cat").

Donesies! We have implemented an equals function for words.
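You might wonder why we bother with key in b at all, since b[key] is undefined for a missing key anyway. The reason is that a key can exist and hold undefined, and undefined == undefined is true. A small sketch contrasting the two (equalsWithoutIn is a made-up name, just for illustration):

```javascript
// equals as described above: the key must exist in b AND the values must match
let equals = (a, b) => {
  return Object.keys(a)
    .every(key => key in b && b[key] == a[key])
}

// Hypothetical variant for comparison: same thing, minus the `in` check
let equalsWithoutIn = (a, b) => {
  return Object.keys(a)
    .every(key => b[key] == a[key])
}

let a = { "form": "gato", "note": undefined }  // has a "note" key, set to undefined
let b = { "form": "gato" }                     // has no "note" key at all

console.log(equals(a, b))           // false: "note" in b is false
console.log(equalsWithoutIn(a, b))  // true: b["note"] is undefined, and undefined == undefined
```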

Except of course

That our data might not be so tidy. What about this?

equals(
 { "gloss": "dog", "form": "perro" },
 { "gloss": "dog", "form": "perro", "domain": "animals" }
)

:robot: true

Here’s a table representation of the step-by-step execution of the call above — note that the words only qualify as equal if all of the values in the equals column are true:

| key | in a? | in b? | a[key] | b[key] | equals |
|---|---|---|---|---|---|
| "gloss" | true | true | "dog" | "dog" | true |
| "form" | true | true | "perro" | "perro" | true |

Yes! Victory! These are the same word!

Except, er…

equals(
 { "gloss": "dog", "form": "perro", "domain": "animals" },
 { "gloss": "dog", "form": "perro" }
)

:robot: false

| key | in a? | in b? | a[key] | b[key] | equals |
|---|---|---|---|---|---|
| "gloss" | true | true | "dog" | "dog" | true |
| "form" | true | true | "perro" | "perro" | true |
| "domain" | true | false | "animals" | undefined | false |

Now we are also looking at the domain key, since it’s in a, and so the comparison fails. But this is weird, because we’re comparing the same two words as before, just in the opposite order.
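The culprit is that this version of equals only loops over the keys of a, so what it really asks is “does b contain everything that’s in a?” Swapping the arguments can therefore change the answer. A quick check (the variable names are just for illustration):

```javascript
// equals as defined above: only the keys of `a` are examined
let equals = (a, b) => {
  return Object.keys(a)
    .every(key => key in b && b[key] == a[key])
}

let plain = { "gloss": "dog", "form": "perro" }
let withDomain = { "gloss": "dog", "form": "perro", "domain": "animals" }

console.log(equals(plain, withDomain))  // true: every key of plain is in withDomain
console.log(equals(withDomain, plain))  // false: plain has no "domain" key
```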

If we really want our definition of equality to mean that both words have not just the same values for the keys of a, but the exact same keys and the exact same values, then we have to say so.

One simple way to do this is to make sure that both objects have the same number of keys, and only then to check that the values of every key are equal in both objects:

let equals = (a, b) => {
  // Same number of keys first; only then compare values key by key
  return Object.keys(a).length == Object.keys(b).length &&
    Object.keys(a)
      .every(key => key in b && b[key] == a[key])
}
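equals(
  {"form": "gato", "gloss": "cat", "domain": "animals"},
  {"form": "gato", "gloss": "cat"}
)

:robot: false

| key | in a? | in b? | a[key] | b[key] | a keys length | b keys length | equals |
|---|---|---|---|---|---|---|---|
| (length check) | | | | | 3 | 2 | false |
| "form" | true | true | "gato" | "gato" | | | true |
| "gloss" | true | true | "cat" | "cat" | | | true |
| "domain" | true | false | "animals" | undefined | | | false |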

Note that this table looks different from the earlier ones, since we compare the lengths first. The remaining rows are irrelevant: the length check has already put a false in the last column, so the whole && expression is false. (Javascript doesn’t even bother to run them.)

Note that this will also detect the case where the number of keys is identical, but one of the keys differs between a and b:

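equals(
  {"form": "gato", "gloss": "cat", "domain": "animals"},
  {"form": "gato", "gloss": "cat", "wordClass": "noun"}
)

| key | in a? | in b? | a[key] | b[key] | a keys length | b keys length | equals |
|---|---|---|---|---|---|---|---|
| (length check) | | | | | 3 | 3 | true |
| "form" | true | true | "gato" | "gato" | | | true |
| "gloss" | true | true | "cat" | "cat" | | | true |
| "domain" | true | false | "animals" | undefined | | | false |
| "wordClass" | false | true | undefined | "noun" | | | false |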

:robot: false

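(Again, Javascript stops at the first false, here the "domain" row. The "wordClass" row would never be checked in any case, because we only loop over the keys of a; it’s listed to show how the two objects differ.)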
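One more thing to keep in mind: every version of equals in this post compares values with ==, which converts types before comparing, so 1 == "1" is true. If your data mixes numbers and strings (say, some of it came from a spreadsheet), {count: 1} and {count: "1"} will count as equal. Swapping in === (strict equality) avoids that. A sketch, where strictEquals and the sample objects are hypothetical names for illustration:

```javascript
// Loose version, as above: values are compared with ==
let equals = (a, b) => {
  return Object.keys(a).length == Object.keys(b).length &&
    Object.keys(a).every(key => key in b && b[key] == a[key])
}

// Hypothetical strict variant: same logic, but === (no type conversion)
let strictEquals = (a, b) => {
  return Object.keys(a).length === Object.keys(b).length &&
    Object.keys(a).every(key => key in b && b[key] === a[key])
}

let fromSpreadsheet = { "form": "gato", "count": "1" }  // count stored as a string
let fromCode = { "form": "gato", "count": 1 }           // count stored as a number

console.log(equals(fromSpreadsheet, fromCode))        // true, because "1" == 1
console.log(strictEquals(fromSpreadsheet, fromCode))  // false, because "1" !== 1
```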

So, the last thing we want to enable is the case where we want to limit comparison to some of the fields. We can do this by adding an extra parameter to our equals function implementation:

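let equals = (a, b, keys=null) => {
  // No key list given: compare all keys, as in the previous version
  if (keys === null) {
    return Object.keys(a).length == Object.keys(b).length &&
      Object.keys(a).every(key => key in b && b[key] == a[key])
  }
  // Otherwise compare only the listed keys, which must exist in both objects
  return keys.every(key =>
    key in b &&
    key in a &&
    b[key] == a[key])
}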

Now, we can limit our comparison to "form" and "gloss" if we desire:

equals(
{"form": "gato", "gloss": "cat", "domain": "animals"},
{"form": "gato", "gloss": "cat", "wordClass": "noun"},
["form", "gloss"]
)

:robot: true
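None of these versions look inside nested values: if a property itself holds an object, b[key] == a[key] is back to comparing references, and we’re right back at the “wat” from the start. The match function below deals with this by calling itself on object values. Here’s a minimal recursive sketch of the same idea for plain equality; deepEquals is a hypothetical name, and the sketch deliberately ignores subtleties such as arrays versus plain objects, dates, NaN, and circular references:

```javascript
// Hypothetical helper: compare plain values with ===, objects key by key
let deepEquals = (a, b) => {
  // Identical references or identical primitive values
  if (a === b) return true
  // Not both non-null objects, and not ===, so they differ
  if (typeof a != "object" || typeof b != "object" || a === null || b === null) {
    return false
  }
  let keysA = Object.keys(a)
  let keysB = Object.keys(b)
  // Same number of keys, and every key of a is in b with a deep-equal value
  return keysA.length == keysB.length &&
    keysA.every(key => key in b && deepEquals(a[key], b[key]))
}

let word = { "form": "gato", "meta": { "domain": "animals" } }
let sameContents = { "form": "gato", "meta": { "domain": "animals" } }
let otherDomain = { "form": "gato", "meta": { "domain": "pets" } }

console.log(word.meta == sameContents.meta)  // false: two different inner objects
console.log(deepEquals(word, sameContents))  // true
console.log(deepEquals(word, otherDomain))   // false
```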

The match.js module in docling.js

I have been making use of this sort of stuff a lot in docling.js. In fact we want to enable defining all these kinds of equality, and there’s more to it than what we’ve gone over here. If you’re interested, you can take a look at the JS module here:

https://docling.land/modules/match.js

And if you’re familiar with testing, you can see the growing test suite for the module here:

https://docling.land/modules/match.test.js

The implementation is as simple as I have been able to make it (but it still needs work):


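export let match = (queries, comparand, fields=[]) => {
  // Accept either an array of [key, value] pairs or a plain query object
  if(!Array.isArray(queries)){
    let queryObject = queries
    queries = Object.entries(queryObject)
  }

  // Optionally restrict the comparison to the listed fields
  if(fields.length){
    queries = queries
      .filter(([key,value]) => fields.includes(key))
  }

  // Every queried key must have a truthy value in the comparand
  // (so falsy values such as 0 or "" also count as missing)
  let comparandHasAllKeys = queries
    .every(([key,value]) => comparand[key])

  if(!comparandHasAllKeys){ return false }

  let allValuesMatch = queries
    .every(([key,value]) => {
      if(typeof value == 'string' && value.trim().length == 0){
        // An empty or whitespace-only query string never matches
        return false
      } else if(typeof value == 'string'){
        // String queries use .includes(), i.e. a substring match on strings
        return comparand[key].includes(value)
      } else if(identifyType(value) == 'number'){
        // identifyType() isn't shown in this excerpt.
        // Note: Object.entries(number) is [], so this call currently returns true
        return match(value, comparand[key])
      } else if(identifyType(value) == 'object'){ // wtf
        // Nested query objects are matched recursively
        return match(value, comparand[key])
      } else if(value instanceof RegExp){
        return value.test(comparand[key])
      }
      // Any other kind of value returns undefined, which counts as no match
    })

  return allValuesMatch
}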

PS. Why should you care about any of this?

Fair question. The answer, primarily, is search. If you have a lexicon with 3000 words in it, you want to be able to search that lexicon for words that match criteria. And you want to be able to do that in flexible ways.

Because match() returns true or false, you can pass it to the .filter() array method to keep only the matching items, like this:

let lexicon = {
  "metadata": {"title": "A tiny lexicon…"},
  "words": [
    {"form": "gato", "gloss": "kato"},
    {"form": "perro", "gloss": "hundo"},
    {"form": "pájaro", "gloss": "birdo"}
  ]
}

lexicon.words.filter(word => match({"form": "perro"}, word))

This sort of thing is the beginning of many kinds of search patterns.
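To get a feel for this kind of filtering without pulling in docling.js, here is a deliberately reduced sketch. The name simpleMatch is hypothetical, and it is not the docling.js code: it only handles string and RegExp query values (no nested objects, numbers, or fields list). Like the string branch of match, it uses .includes(), so string queries match substrings:

```javascript
// Hypothetical, reduced sketch: every query key must exist in the word;
// strings match as substrings, RegExps are tested against the value.
let simpleMatch = (query, word) => {
  return Object.entries(query).every(([key, value]) => {
    if (!(key in word)) return false
    if (value instanceof RegExp) return value.test(word[key])
    if (typeof value == "string") return String(word[key]).includes(value)
    return false  // anything else is unsupported in this sketch
  })
}

let lexicon = {
  "metadata": { "title": "A tiny lexicon…" },
  "words": [
    { "form": "gato", "gloss": "kato" },
    { "form": "perro", "gloss": "hundo" },
    { "form": "pájaro", "gloss": "birdo" }
  ]
}

// Only "perro" contains "perro"
let dogs = lexicon.words.filter(word => simpleMatch({ "form": "perro" }, word))
// Substring match: "gato" and "pájaro" both contain an "a"
let withA = lexicon.words.filter(word => simpleMatch({ "form": "a" }, word))
// RegExp query: forms starting with "p"
let startsWithP = lexicon.words.filter(word => simpleMatch({ "form": /^p/ }, word))

console.log(dogs.map(word => word.form))         // ["perro"]
console.log(startsWithP.map(word => word.form))  // ["perro", "pájaro"]
```

The substring behaviour is handy for search-as-you-type, but it means a query like {"form": "a"} is much looser than an equality check.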