xml地图|网站地图|网站标签 [设为首页] [加入收藏]

澳门皇家娱乐嵌套映射和过滤器及查询,几篇关

来源:http://www.ccidsi.com 作者:呼叫中心培训课程 人气:75 发布时间:2019-08-28
摘要:ElasticSearch - 嵌套映射和过滤器 Because nestedobjects areindexed as separate hidden documents, we can’t query themdirectly. Instead,we have to usethe  nested  query toaccess them: GET /my_index/blogpost/_search{ "query": { "

ElasticSearch - 嵌套映射和过滤器

Because nested objects are indexed as separate hidden documents, we can’t query them directly. Instead, we have to use the nested query to access them:

GET /my_index/blogpost/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "eggs" }}, 
        {
          "nested": {
            "path": "comments", 
            "query": {
              "bool": {
                "must": [ 
                  { "match": { "comments.name": "john" }},
                  { "match": { "comments.age":  28     }}
                ]
        }}}}
      ]
}}}


①The title clause operates on the root document.
②The nested clause “steps down” into the nested comments field. It no longer has access to fields in the root document, nor fields in any other nested document.
③ The comments.name and comments.age clauses operate on the same nested  document

nested field can contain other nested fields. Similarly, a nested query can contain othernested queries. The nesting hierarchy is applied as you would expect.

Of course, a nested query could match several nested documents. Each matching nested document would have its own relevance score, but these multiple scores need to be reduced to a single score that can be applied to the root document.

By default, it averages the scores of the matching nested documents. This can be controlled by setting thescore_mode parameter to avgmaxsum, or even none (in which case the root document gets a constant score of 1.0).

GET /my_index/blogpost/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "eggs" }},
        {
          "nested": {
            "path":       "comments",
            "score_mode": "max", 
            "query": {
              "bool": {
                "must": [
                  { "match": { "comments.name": "john" }},
                  { "match": { "comments.age":  28     }}
                ]
        }}}}
      ]
}}}

①Give the root document the _score from the best-matching nested document.

If placed inside the filter clause of a Boolean query, a nested query behaves much like anested query, except that it doesn’t accept the score_mode parameter. Because it is being used as a non-scoring query — it includes or excludes, but doesn’t score —  a score_modedoesn’t make sense since there is nothing to score.

 

curl -XPOST "http://localhost:9200/index-1/movie/" -d'
{
   "title": "The Matrix",
   "cast": [
      {
         "firstName": "Keanu",
         "lastName": "Reeves"
      },
      {
         "firstName": "Laurence",
         "lastName": "Fishburne"
      }
   ]
}'

Given many such movies in our index we can find all movies with an actor named "Keanu" using a search request such as:

curl -XPOST "http://localhost:9200/index-1/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "term": {
               "cast.firstName": "keanu"
            }
         }
      }
   }
}'

Running the above query indeed returns The Matrix. The same is true if we try to find movies that have an actor with the first name "Keanu" and last name "Reeves":

curl -XPOST "http://localhost:9200/index-1/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "term": {
                        "cast.firstName": "keanu"
                     }
                  },
                  {
                     "term": {
                        "cast.lastName": "reeves"
                     }
                  }
               ]
            }
         }
      }
   }
}'

Or at least so it seems. However, let's see what happens if we search for movies with an actor with "Keanu" as first name and "Fishburne" as last name.

curl -XPOST "http://localhost:9200/index-1/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "term": {
                        "cast.firstName": "keanu"
                     }
                  },
                  {
                     "term": {
                        "cast.lastName": "fishburne"
                     }
                  }
               ]
            }
         }
      }
   }
}'

Clearly this should, at first glance, not match The Matrix as there's no such actor amongst its cast. However, ElasticSearch will return The Matrix for the above query. After all, the movie does contain an author with "Keanu" as first name and (albeit a different) actor with "Fishburne" as last name. Based on the above query it has no way of knowing that we want the two term filters to match the same unique object in the list of actors. And even if it did, the way the data is indexed it wouldn't be able to handle that requirement.

ElasticSearch - 嵌套映射和过滤器

Because nested objects are indexed as separate hidden documents, we can’t query them directly. Instead, we have to use the nested query to access them:

GET /my_index/blogpost/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "eggs" }}, 
        {
          "nested": {
            "path": "comments", 
            "query": {
              "bool": {
                "must": [ 
                  { "match": { "comments.name": "john" }},
                  { "match": { "comments.age":  28     }}
                ]
        }}}}
      ]
}}}


①The title clause operates on the root document.
②The nested clause “steps down” into the nested comments field. It no longer has access to fields in the root document, nor fields in any other nested document.
③ The comments.name and comments.age clauses operate on the same nested  document

nested field can contain other nested fields. Similarly, a nested query can contain othernested queries. The nesting hierarchy is applied as you would expect.

Of course, a nested query could match several nested documents. Each matching nested document would have its own relevance score, but these multiple scores need to be reduced to a single score that can be applied to the root document.

By default, it averages the scores of the matching nested documents. This can be controlled by setting thescore_mode parameter to avgmaxsum, or even none (in which case the root document gets a constant score of 1.0).

GET /my_index/blogpost/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "eggs" }},
        {
          "nested": {
            "path":       "comments",
            "score_mode": "max", 
            "query": {
              "bool": {
                "must": [
                  { "match": { "comments.name": "john" }},
                  { "match": { "comments.age":  28     }}
                ]
        }}}}
      ]
}}}

①Give the root document the _score from the best-matching nested document.

If placed inside the filter clause of a Boolean query, a nested query behaves much like anested query, except that it doesn’t accept the score_mode parameter. Because it is being used as a non-scoring query — it includes or excludes, but doesn’t score —  a score_modedoesn’t make sense since there is nothing to score.

 

curl -XPOST "http://localhost:9200/index-1/movie/" -d'
{
   "title": "The Matrix",
   "cast": [
      {
         "firstName": "Keanu",
         "lastName": "Reeves"
      },
      {
         "firstName": "Laurence",
         "lastName": "Fishburne"
      }
   ]
}'

Given many such movies in our index we can find all movies with an actor named "Keanu" using a search request such as:

curl -XPOST "http://localhost:9200/index-1/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "term": {
               "cast.firstName": "keanu"
            }
         }
      }
   }
}'

Running the above query indeed returns The Matrix. The same is true if we try to find movies that have an actor with the first name "Keanu" and last name "Reeves":

curl -XPOST "http://localhost:9200/index-1/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "term": {
                        "cast.firstName": "keanu"
                     }
                  },
                  {
                     "term": {
                        "cast.lastName": "reeves"
                     }
                  }
               ]
            }
         }
      }
   }
}'

Or at least so it seems. However, let's see what happens if we search for movies with an actor with "Keanu" as first name and "Fishburne" as last name.

curl -XPOST "http://localhost:9200/index-1/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "term": {
                        "cast.firstName": "keanu"
                     }
                  },
                  {
                     "term": {
                        "cast.lastName": "fishburne"
                     }
                  }
               ]
            }
         }
      }
   }
}'

Clearly this should, at first glance, not match The Matrix as there's no such actor amongst its cast. However, ElasticSearch will return The Matrix for the above query. After all, the movie does contain an author with "Keanu" as first name and (albeit a different) actor with "Fishburne" as last name. Based on the above query it has no way of knowing that we want the two term filters to match the same unique object in the list of actors. And even if it did, the way the data is indexed it wouldn't be able to handle that requirement.

By Mike Hillyer

Nested mapping and filter to the rescue

Luckily ElasticSearch provides a way for us to be able to filter on multiple fields within the same objects in arrays; mapping such fields as nested. To try this out, let's create ourselves a new index with the "actors" field mapped as nested.

curl -XPUT "http://localhost:9200/index-2" -d'
{
   "mappings": {
      "movie": {
         "properties": {
            "cast": {
               "type": "nested"
            }
         }
      }
   }
}'

After indexing the same movie document into the new index we can now find movies based on multiple properties of each actor by using a nested filter. Here's how we would search for movies starring an actor named "Keanu Fishburne":

curl -XPOST "http://localhost:9200/index-2/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "nested": {
               "path": "cast",
               "filter": {
                  "bool": {
                     "must": [
                        {
                           "term": {
                              "firstName": "keanu"
                           }
                        },
                        {
                           "term": {
                              "lastName": "fishburne"
                           }
                        }
                     ]
                  }
               }
            }
         }
      }
   }
}'

As you can see we've wrapped our initial bool filter in a nested filter. The nested filter contains a path property where we specify that the filter applies to the cast property of the searched document. It also contains a filter (or a query) which will be applied to each value within the nested property.

As intended, running the abobe query doesn't return The Matrix while modifying it to instead match "Reeves" as last name will make it match The Matrix. However, there's one caveat.

Nested mapping and filter to the rescue

Luckily ElasticSearch provides a way for us to be able to filter on multiple fields within the same objects in arrays; mapping such fields as nested. To try this out, let's create ourselves a new index with the "actors" field mapped as nested.

curl -XPUT "http://localhost:9200/index-2" -d'
{
   "mappings": {
      "movie": {
         "properties": {
            "cast": {
               "type": "nested"
            }
         }
      }
   }
}'

After indexing the same movie document into the new index we can now find movies based on multiple properties of each actor by using a nested filter. Here's how we would search for movies starring an actor named "Keanu Fishburne":

curl -XPOST "http://localhost:9200/index-2/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "nested": {
               "path": "cast",
               "filter": {
                  "bool": {
                     "must": [
                        {
                           "term": {
                              "firstName": "keanu"
                           }
                        },
                        {
                           "term": {
                              "lastName": "fishburne"
                           }
                        }
                     ]
                  }
               }
            }
         }
      }
   }
}'

As you can see we've wrapped our initial bool filter in a nested filter. The nested filter contains a path property where we specify that the filter applies to the cast property of the searched document. It also contains a filter (or a query) which will be applied to each value within the nested property.

As intended, running the abobe query doesn't return The Matrix while modifying it to instead match "Reeves" as last name will make it match The Matrix. However, there's one caveat.

Introduction

Most users at one time or another have dealt with hierarchical data in a SQL database and no doubt learned that the management of hierarchical data is not what a relational database is intended for. The tables of a relational database are not hierarchical (like XML), but are simply a flat list. Hierarchical data has a parent-child relationship that is not naturally represented in a relational database table.

For our purposes, hierarchical data is a collection of data where each item has a single parent and zero or more children (with the exception of the root item, which has no parent). Hierarchical data can be found in a variety of database applications, including forum and mailing list threads, business organization charts, content management categories, and product categories. For our purposes we will use the following product category hierarchy from an fictional electronics store:

澳门皇家娱乐 1

These categories form a hierarchy in much the same way as the other examples cited above. In this article we will examine two models for dealing with hierarchical data in MySQL, starting with the traditional adjacency list model.

Including nested values in parent documents

If we go back to our very first query, filtering only on actors first names without using a nested filter, like the request below, we won't get any hits.

curl -XPOST "http://localhost:9200/index-2/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "term": {
               "cast.firstName": "keanu"
            }
         }
      }
   }
}'

This happens because movie documents no longer have cast.firstName fields. Instead each element in the cast array is, internally in ElasticSearch, indexed as a separate document.

Obviously we can still search for movies based only on first names amongst the cast, by using nested filters though. Like this:

curl -XPOST "http://localhost:9200/index-2/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "nested": {
               "path": "cast",
               "filter": {
                  "term": {
                     "firstName": "keanu"
                  }
               }
            }
         }
      }
   }
}'

The above request returns The Matrix. However, sometimes having to use nested filters or queries when all we want to do is filter on a single property is a bit tedious. To be able to utilize the power of nested filters for complex criterias while still being able to filter on values in arrays the same way as if we hadn't mapped such properties as nested we can modify our mappings so that the nested values will also be included in the parent document. This is done using theinclude_in_parent property, like this:

curl -XPUT "http://localhost:9200/index-3" -d'
{
   "mappings": {
      "movie": {
         "properties": {
            "cast": {
               "type": "nested",
               "include_in_parent": true
            }
         }
      }
   }
}'

In an index such as the one created with the above request we'll both be able to filter on combinations of values within the same complex objects in the actors array using nested filters while still being able to filter on single fields without using nested filters. However, we now need to carefully consider where to use, and where to not use, nested filters in our queries as a query for "Keanu Fishburne" will match The Matrix using a regular bool filter while it won't when wrapping it in a nested filter. In other words, when using include_in_parent we may get unexpected results due to queries matching documents that it shouldn't if we forget to use nested filters.

PS. For updates about new posts, sites I find useful and the occasional rant you can follow me on Twitter. You are also most welcome to subscribe to the RSS-feed.

Including nested values in parent documents

If we go back to our very first query, filtering only on actors first names without using a nested filter, like the request below, we won't get any hits.

curl -XPOST "http://localhost:9200/index-2/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "term": {
               "cast.firstName": "keanu"
            }
         }
      }
   }
}'

This happens because movie documents no longer have cast.firstName fields. Instead each element in the cast array is, internally in ElasticSearch, indexed as a separate document.

Obviously we can still search for movies based only on first names amongst the cast, by using nested filters though. Like this:

curl -XPOST "http://localhost:9200/index-2/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "nested": {
               "path": "cast",
               "filter": {
                  "term": {
                     "firstName": "keanu"
                  }
               }
            }
         }
      }
   }
}'

The above request returns The Matrix. However, sometimes having to use nested filters or queries when all we want to do is filter on a single property is a bit tedious. To be able to utilize the power of nested filters for complex criterias while still being able to filter on values in arrays the same way as if we hadn't mapped such properties as nested we can modify our mappings so that the nested values will also be included in the parent document. This is done using theinclude_in_parent property, like this:

curl -XPUT "http://localhost:9200/index-3" -d'
{
   "mappings": {
      "movie": {
         "properties": {
            "cast": {
               "type": "nested",
               "include_in_parent": true
            }
         }
      }
   }
}'

In an index such as the one created with the above request we'll both be able to filter on combinations of values within the same complex objects in the actors array using nested filters while still being able to filter on single fields without using nested filters. However, we now need to carefully consider where to use, and where to not use, nested filters in our queries as a query for "Keanu Fishburne" will match The Matrix using a regular bool filter while it won't when wrapping it in a nested filter. In other words, when using include_in_parent we may get unexpected results due to queries matching documents that it shouldn't if we forget to use nested filters.

PS. For updates about new posts, sites I find useful and the occasional rant you can follow me on Twitter. You are also most welcome to subscribe to the RSS-feed.

The Adjacency List Model

Typically the example categories shown above will be stored in a table like the following (I'm including full CREATE and INSERT statements so you can follow along):

CREATE TABLE category(
category_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
parent INT DEFAULT NULL);


INSERT INTO category
VALUES(1,'ELECTRONICS',NULL),(2,'TELEVISIONS',1),(3,'TUBE',2),
(4,'LCD',2),(5,'PLASMA',2),(6,'PORTABLE ELECTRONICS',1),
(7,'MP3 PLAYERS',6),(8,'FLASH',7),
(9,'CD PLAYERS',6),(10,'2 WAY RADIOS',6);

SELECT * FROM category ORDER BY category_id;

 ------------- ---------------------- -------- 
| category_id | name         | parent |
 ------------- ---------------------- -------- 
|      1 | ELECTRONICS     |  NULL |
|      2 | TELEVISIONS     |   1 |
|      3 | TUBE         |   2 |
|      4 | LCD         |   2 |
|      5 | PLASMA        |   2 |
|      6 | PORTABLE ELECTRONICS |   1 |
|      7 | MP3 PLAYERS     |   6 |
|      8 | FLASH        |   7 |
|      9 | CD PLAYERS      |   6 |
|     10 | 2 WAY RADIOS     |   6 |
 ------------- ---------------------- -------- 
10 rows in set (0.00 sec)

In the adjacency list model, each item in the table contains a pointer to its parent. The topmost element, in this case electronics, has a NULL value for its parent. The adjacency list model has the advantage of being quite simple, it is easy to see that FLASH is a child of mp3 players, which is a child of portable electronics, which is a child of electronics. While the adjacency list model can be dealt with fairly easily in client-side code, working with the model can be more problematic in pure SQL.

Array Type

Read the doc on elasticsearch.org

As its name suggests, it can be an array of native types (string, int, …) but also an array of objects (the basis used for “objects” and “nested”).

Here are some valid indexing examples :

{
    "Article" : [
      {
        "id" : 12
        "title" : "An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony",'Obtao'],
        "author" : [
            {
                "firstname" : "Francois",
                "surname": "francoisg",
                "id" : 18
            },
            {
                "firstname" : "Gregory",
                "surname" : "gregquat"
                "id" : "2"
            }
        ]
      }
    },
    {
        "id" : 13
        "title" : "A second article title",
        "categories" : [1,7],
        "tag" : ["elasticsearch", "symfony",'Obtao'],
        "author" : [
            {
                "firstname" : "Gregory",
                "surname" : "gregquat",
                "id" : "2"
            }
        ]
      }
}

 

You can find different Array :

  • Categories : array of integers
  • Tags : array of strings
  • author : array of objects (inner objects or nested)

We explicitely specify this “simple” type as it can be more easy/maintainable to store a flatten value rather than the complete object.
Using a non relational structure should make you think about a specific model for your search engine :

  • To filter : If you just want to filter/search/aggregate on the textual value of an object, then flatten the value in the parent object.
  • To get the list of objects that are linked to a parent (and if you do not need to filter or index these objects), just store the list of ids and hydrate them with Doctrine and Symfony (in French for the moment).

Array Type

Read the doc on elasticsearch.org

As its name suggests, it can be an array of native types (string, int, …) but also an array of objects (the basis used for “objects” and “nested”).

Here are some valid indexing examples :

{
    "Article" : [
      {
        "id" : 12
        "title" : "An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony",'Obtao'],
        "author" : [
            {
                "firstname" : "Francois",
                "surname": "francoisg",
                "id" : 18
            },
            {
                "firstname" : "Gregory",
                "surname" : "gregquat"
                "id" : "2"
            }
        ]
      }
    },
    {
        "id" : 13
        "title" : "A second article title",
        "categories" : [1,7],
        "tag" : ["elasticsearch", "symfony",'Obtao'],
        "author" : [
            {
                "firstname" : "Gregory",
                "surname" : "gregquat",
                "id" : "2"
            }
        ]
      }
}

 

You can find different Array :

  • Categories : array of integers
  • Tags : array of strings
  • author : array of objects (inner objects or nested)

We explicitely specify this “simple” type as it can be more easy/maintainable to store a flatten value rather than the complete object.
Using a non relational structure should make you think about a specific model for your search engine :

  • To filter : If you just want to filter/search/aggregate on the textual value of an object, then flatten the value in the parent object.
  • To get the list of objects that are linked to a parent (and if you do not need to filter or index these objects), just store the list of ids and hydrate them with Doctrine and Symfony (in French for the moment).

Retrieving a Full Tree

The first common task when dealing with hierarchical data is the display of the entire tree, usually with some form of indentation. The most common way of doing this is in pure SQL is through the use of a self-join:

SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name = 'ELECTRONICS';

 ------------- ---------------------- -------------- ------- 
| lev1    | lev2         | lev3     | lev4 |
 ------------- ---------------------- -------------- ------- 
| ELECTRONICS | TELEVISIONS     | TUBE     | NULL |
| ELECTRONICS | TELEVISIONS     | LCD     | NULL |
| ELECTRONICS | TELEVISIONS     | PLASMA    | NULL |
| ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS | FLASH |
| ELECTRONICS | PORTABLE ELECTRONICS | CD PLAYERS  | NULL |
| ELECTRONICS | PORTABLE ELECTRONICS | 2 WAY RADIOS | NULL |
 ------------- ---------------------- -------------- ------- 
6 rows in set (0.00 sec)

Inner objects

The inner objects are just the JSON object association in a parent. For example, the “authors” in the above example. The mapping for this example could be :

 

fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog :
            types:
                article :
                    mappings:
                        title : ~
                        categories : ~
                        tag : ~
                        author : 
                            type : object
                            properties : 
                                firstname : ~
                                surname : ~
                                id : 
                                    type : integer

 

You can Filter or Query on these “inner objects”. For example :

query: author.firstname=Francois will return the post with the id 12 (and not the one with the id 13).

You can read more on the Elasticsearch website

Inner objects are easy to configure. As Elasticsearch documents are “schema less”, you can index them without specify any mapping.

The limitation of this method lies in the manner as ElasticSearch stores your data. Reusing the above example, here is the internal representation of our objects :

 

[
      {
        "id" : 12
        "title" : An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony",'Obtao'],
        "author.firstname" : ["Francois","Gregory"],
        "author.surname" : ["Francoisg","gregquat"],
        "author.id" : [18,2]
      }
      {
        "id" : 13
        "title" : "A second article",
        "categories" : [1,7],
        "tag" : ["elasticsearch", "symfony",'Obtao'],
        "author.firstname" : ["Gregory"],
        "author.surname" : ["gregquat"],
        "author.id" : [2]
      }
]

 

The consequence is that the query :

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "firstname": "francois",
          "surname": "gregquat"
        }
      }
    }
  }
}

author.firstname=Francois AND surname=gregquat will return the document “12″. In the case of an inner object, this query can by translated as “Who has at least one author.surname = gregquat and one author.firstname=francois”.

 

To fix this problem, you must use the nested.

Inner objects

The inner objects are just the JSON object association in a parent. For example, the “authors” in the above example. The mapping for this example could be :

 

fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog :
            types:
                article :
                    mappings:
                        title : ~
                        categories : ~
                        tag : ~
                        author : 
                            type : object
                            properties : 
                                firstname : ~
                                surname : ~
                                id : 
                                    type : integer

 

You can Filter or Query on these “inner objects”. For example :

query: author.firstname=Francois will return the post with the id 12 (and not the one with the id 13).

You can read more on the Elasticsearch website

Inner objects are easy to configure. As Elasticsearch documents are “schema less”, you can index them without specify any mapping.

The limitation of this method lies in the manner as ElasticSearch stores your data. Reusing the above example, here is the internal representation of our objects :

 

[
      {
        "id" : 12
        "title" : An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony",'Obtao'],
        "author.firstname" : ["Francois","Gregory"],
        "author.surname" : ["Francoisg","gregquat"],
        "author.id" : [18,2]
      }
      {
        "id" : 13
        "title" : "A second article",
        "categories" : [1,7],
        "tag" : ["elasticsearch", "symfony",'Obtao'],
        "author.firstname" : ["Gregory"],
        "author.surname" : ["gregquat"],
        "author.id" : [2]
      }
]

 

The consequence is that the query :

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "firstname": "francois",
          "surname": "gregquat"
        }
      }
    }
  }
}

author.firstname=Francois AND surname=gregquat will return the document “12″. In the case of an inner object, this query can by translated as “Who has at least one author.surname = gregquat and one author.firstname=francois”.

 

To fix this problem, you must use the nested.

Finding all the Leaf Nodes

We can find all the leaf nodes in our tree (those with no children) by using a LEFT JOIN query:

SELECT t1.name FROM
category AS t1 LEFT JOIN category as t2
ON t1.category_id = t2.parent
WHERE t2.category_id IS NULL;


 -------------- 
| name     |
 -------------- 
| TUBE     |
| LCD     |
| PLASMA    |
| FLASH    |
| CD PLAYERS  |
| 2 WAY RADIOS |
 -------------- 

Les nested

First important difference : nested must be specified in your mapping.

The mapping looks like an object one, only the type changes :

fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog :
            types:
                article :
                    mappings:
                        title : ~
                        categories : ~
                        tag : ~
                        author : 
                            type : nested
                            properties : 
                                firstname : ~
                                surname : ~
                                id : 
                                    type : integer

 

This time, the internal representation will be :

 

[
      {
        "id" : 12
        "title" : "An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony",'Obtao'],
        "author" : [{
            "firstname" : "Francois",
            "surname" : "Francoisg",
            "id" : 18
        },
        {
            "firstname" : "Gregory",
            "surname" : "gregquat",
            "id" : 2
        }]
      },
      {
        "id" : 13
        "title" : "A second article title",
        "categories" : [1,7],
        "tags" : ["elasticsearch", "symfony",'Obtao'],
        "author" : [{
            "firstname" : "Gregory",
            "surname" : "gregquat",
            "id" : 2
        }]
      }
]

 

This time, we keep the object structure.

Nested have their own filters which allows to filter by nested object. If we go on with our example (with the limitation of inner objects), we can write this query :

 

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "nested" : {
          "path" : "author",
          "filter": {
            "bool": {
              "must": [
                {
                  "term" : {
                    "author.firsname": "francois"
                  }
                },
                {
                  "term" : {
                    "author.surname": "gregquat"
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}

hi
We can translate it as “Who has an author object whose surname is equal to ‘gregquat’ and whose firstname is ‘francois’”. This query will return no result.

 

There is still a problem which is penalizing when working with bug objects : when you want to change a single value of the nester, you have to reindex the whole parent document (including the nested).
If the objects are heavy, and often updated, the impact on performances can be important.

To fix this problem, you can use the parent/child associations.

Les nested

First important difference : nested must be specified in your mapping.

The mapping looks like an object one, only the type changes :

fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog :
            types:
                article :
                    mappings:
                        title : ~
                        categories : ~
                        tag : ~
                        author : 
                            type : nested
                            properties : 
                                firstname : ~
                                surname : ~
                                id : 
                                    type : integer

 

This time, the internal representation will be :

 

[
      {
        "id" : 12
        "title" : "An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony",'Obtao'],
        "author" : [{
            "firstname" : "Francois",
            "surname" : "Francoisg",
            "id" : 18
        },
        {
            "firstname" : "Gregory",
            "surname" : "gregquat",
            "id" : 2
        }]
      },
      {
        "id" : 13
        "title" : "A second article title",
        "categories" : [1,7],
        "tags" : ["elasticsearch", "symfony",'Obtao'],
        "author" : [{
            "firstname" : "Gregory",
            "surname" : "gregquat",
            "id" : 2
        }]
      }
]

 

This time, we keep the object structure.

Nested have their own filters which allows to filter by nested object. If we go on with our example (with the limitation of inner objects), we can write this query :

 

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "nested" : {
          "path" : "author",
          "filter": {
            "bool": {
              "must": [
                {
                  "term" : {
                    "author.firsname": "francois"
                  }
                },
                {
                  "term" : {
                    "author.surname": "gregquat"
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}

hi
We can translate it as “Who has an author object whose surname is equal to ‘gregquat’ and whose firstname is ‘francois’”. This query will return no result.

 

There is still a problem which is penalizing when working with bug objects : when you want to change a single value of the nester, you have to reindex the whole parent document (including the nested).
If the objects are heavy, and often updated, the impact on performances can be important.

To fix this problem, you can use the parent/child associations.

Retrieving a Single Path

The self-join also allows us to see the full path through our hierarchies:

SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name = 'ELECTRONICS' AND t4.name = 'FLASH';

 ------------- ---------------------- ------------- ------- 
| lev1    | lev2         | lev3    | lev4 |
 ------------- ---------------------- ------------- ------- 
| ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS | FLASH |
 ------------- ---------------------- ------------- ------- 
1 row in set (0.01 sec)

The main limitation of such an approach is that you need one self-join for every level in the hierarchy, and performance will naturally degrade with each level added as the joining grows in complexity.

Parent/Child

Parent/child associations are very similar to OneToMany relationships (one parent, several children).
The relationship remains hierarchical : an object type is only associated to one parent, and it’s impossible to create a ManyToMany relationship.

We are going to link our article to a category :

 

fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog :
            types:
                category : 
                    mappings : 
                        id : ~
                        name : ~
                        description : ~
                article :
                    mappings:
                        title : ~
                        tag : ~
                        author : ~
                    _routing:
                        required: true
                        path: category
                    _parent:
                        type : "category"
                        identifier: "id" #optional as id is the default value
                        property : "category" #optional as the default value is the type value

 

When indexing an article, a reference to the Category will also be indexed (category.id).
So, we can index separately categories and article while keeping the references between them.

Like for nested, there are Filters and Queries that allow to search on parents or children :

  • Has Parent Filter / Has Parent Query : Filter/query on parent fields, returns children objects. In our case, we could filter articles whose parent category contains “symfony” in his description.
  • Has Child Filter / Has Child Query : Filter/query on child fields, returns the parent object. In our case, we could filter Categories for which “francoisg” has written an article.

 

{
  "query": {
    "has_child": {
      "type": "article",
      "query" : {
        "filtered": {
          "query": { "match_all": {}},
          "filter" : {
              "term": {"tag": "symfony"}
          }
        }
      }
    }
  }
}

This query will return the Categories that have at least one article tagged with “symfony”.

 

The queries are here written in JSON, but are easily transformable into PHP with the Elastica library.

 

Parent/Child

Parent/child associations are very similar to OneToMany relationships (one parent, several children).
The relationship remains hierarchical : an object type is only associated to one parent, and it’s impossible to create a ManyToMany relationship.

We are going to link our article to a category :

 

fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog :
            types:
                category : 
                    mappings : 
                        id : ~
                        name : ~
                        description : ~
                article :
                    mappings:
                        title : ~
                        tag : ~
                        author : ~
                    _routing:
                        required: true
                        path: category
                    _parent:
                        type : "category"
                        identifier: "id" #optional as id is the default value
                        property : "category" #optional as the default value is the type value

 

When indexing an article, a reference to the Category will also be indexed (category.id).
So, we can index separately categories and article while keeping the references between them.

Like for nested, there are Filters and Queries that allow to search on parents or children :

  • Has Parent Filter / Has Parent Query : Filter/query on parent fields, returns children objects. In our case, we could filter articles whose parent category contains “symfony” in his description.
  • Has Child Filter / Has Child Query : Filter/query on child fields, returns the parent object. In our case, we could filter Categories for which “francoisg” has written an article.

 

{
  "query": {
    "has_child": {
      "type": "article",
      "query" : {
        "filtered": {
          "query": { "match_all": {}},
          "filter" : {
              "term": {"tag": "symfony"}
          }
        }
      }
    }
  }
}

This query will return the Categories that have at least one article tagged with “symfony”.

 

The queries are here written in JSON, but are easily transformable into PHP with the Elastica library.

 

本文由68399皇家赌场发布于呼叫中心培训课程,转载请注明出处:澳门皇家娱乐嵌套映射和过滤器及查询,几篇关

关键词: 68399皇家赌场 大数据 数据库技术

最火资讯