What's new in Django community blogs?

django-debug-toolbar-force

Apr 13 2017 [Archived Version] □ Published at Latest Django packages added


Building a Python Code Review Scheduler: Review Follow-Up

Apr 13 2017 [Archived Version] □ Published at tuts+

In the third part of this series, you saw how to save the code review request information for follow-up. You created a method called read_email to fetch the emails from the inbox to check if a reviewer has responded to the code review request. You also implemented error handling in the code review scheduler code.

In this part of the series, you'll use the saved code review information and the information from the emails to check if the reviewer has responded to the review request. If a request has not been responded to, you'll send a follow-up email to the reviewer.

Getting Started

Start by cloning the source code from the third part of the tutorial series.

Modify the config.json file to include some relevant email addresses, keeping the [email protected] email address. The Git repository has commits tied to this particular address, and the code needs them to run as expected. Modify the SMTP credentials in the scheduler.py file:

Navigate to the project directory CodeReviewer and try to execute the following command in the terminal.

It should send the code review request to random developers for review and create a reviewer.json file with review information.

Implementing a Follow-Up Request

Let's start by creating a follow-up request method called followup_request. Inside the followup_request method, read the reviewer.json file and keep the content in a list. Here is how the code looks:

Next, pull in the email information using the read_email method that you implemented in the last tutorial.

If the reviewer has responded to the review request, there should be an email with the same subject matter and a Re: tag prefixed to it. So iterate through the review information list and compare the review subject with the email subject to see if the reviewer has responded to the request.

As seen in the above code, you iterated through the review_info list and checked the review information subject against the email subject to see if the reviewer has responded.
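A minimal sketch of the subject comparison, assuming each entry in review_info and in the list returned by read_email is a dictionary with a `subject` key. The helper name `has_response`, the field names, and the sample data are assumptions for illustration only.

```python
def has_response(review, email_info):
    """Return True if any email subject equals 'Re: ' + the review subject."""
    expected = "Re: " + review["subject"]
    return any(mail["subject"].strip() == expected for mail in email_info)


# Hypothetical sample data; the real lists would come from
# reviewer.json and from the read_email method.
review_info = [
    {"id": "a1", "subject": "Code Review [commit:111]", "reviewer": "dev1@example.com"},
    {"id": "b2", "subject": "Code Review [commit:222]", "reviewer": "dev2@example.com"},
]
email_info = [
    {"subject": "Re: Code Review [commit:111]", "from": "dev1@example.com"},
]

# Collect the Ids of the review requests that have been answered.
responded = [review["id"] for review in review_info
             if has_response(review, email_info)]
print(responded)
```

An exact match on the subject is the simplest rule; a mail client that rewrites the prefix (for example to "RE:") would need a looser comparison.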

Now, once the reviewer has responded to the code review request, you don't need to keep the particular review information in the reviewer.json file. So create a Python method called Delete_Info to remove the particular review information from the reviewer.json file. Here is how Delete_Info looks:

As seen in the above code, you have iterated through the review information list and deleted the entry which matches the Id. After removing the information from the file, return the list.
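One way the Delete_Info method could look, assuming each review entry is a dictionary with an `id` key (the key name and the parameter names are assumptions):

```python
def Delete_Info(info, review_id):
    """Remove the entry whose 'id' matches review_id and return the list."""
    for index in range(len(info)):
        if info[index]["id"] == review_id:
            del info[index]
            break  # stop right after deleting; the indexes have shifted
    return info


# Hypothetical sample data.
entries = [{"id": "a1"}, {"id": "b2"}, {"id": "c3"}]
remaining = Delete_Info(entries, "b2")
print(remaining)
```

The list is modified in place, which is why the tutorial passes a copy rather than the original list.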

You need to call the Delete_Info method when a particular piece of review information is replied to. When calling the Delete_Info method, you need to pass a copy of the review_info so that the original info list isn't altered. You'll need the original review information list for comparison later. So import the copy Python module to create a copy of the original review information list.

Create a copy of the review_info list.

When deleting the review information that has been responded to from the original list, pass the copy list to the Delete_Info method.

Here is the followup_request method:

Now, once the review_info list has been iterated, you need to check if there are any changes to be made to the reviewer.json file. If any existing review information has been removed, you need to update the reviewer.json file accordingly. So check whether review_info_copy and review_info differ, and if they do, write review_info_copy back to the reviewer.json file.
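The copy-and-compare step might be sketched like this. The helper name `update_review_file`, the temporary file path, and the sample entries are assumptions; the sketch only shows the idea of writing the file when the copy differs from the original list.

```python
import copy
import json
import os
import tempfile


def update_review_file(path, review_info, review_info_copy):
    """Rewrite the JSON file only if entries were removed from the copy."""
    if review_info_copy != review_info:
        with open(path, "w") as outfile:
            json.dump(review_info_copy, outfile)
        return True
    return False


review_info = [{"id": "a1"}, {"id": "b2"}]
review_info_copy = copy.copy(review_info)

# Simulate removing an answered request from the copy only.
del review_info_copy[0]

# A temporary location stands in for the project's reviewer.json.
path = os.path.join(tempfile.mkdtemp(), "reviewer.json")
changed = update_review_file(path, review_info, review_info_copy)

with open(path) as infile:
    saved = json.load(infile)
print(changed, saved)
```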

Here is the complete followup_request method:

Make a call to the followup_request method to follow up on the review requests that have already been sent.

Save the above changes. To test the follow-up functionality, delete the reviewer.json file from the project directory. Now run the scheduler so that code review requests are sent to random developers. Check if that information has been saved in the reviewer.json file.

Ask the particular developer to respond to the code review request by replying to the email. Now run the scheduler again, and this time the scheduler program should be able to find the response and remove it from the reviewer.json file.

Sending Reminder Emails

Once the reviewer has responded to the code review request emails, that information needs to be removed from the reviewer.json file since you don't need to track it further. If the reviewer has not yet responded to the code review request, you need to send a follow-up mail to remind him or her about the review request.

The code review scheduler would run on a daily basis. When it runs, you first need to check whether a certain amount of time has passed since the review request was sent without a response. In the project configuration, you can set a review period after which, if the reviewer has not responded, the scheduler sends a reminder email.

Let's start by adding a configuration in the project config. Add a new config called followup_frequency in the config file.

So, when the reviewer has not responded for followup_frequency number of days, you'll send a reminder email. Read the configuration into a global variable while reading the configurations:

Inside the followup_request method, send a reminder email when the reviewer has not replied to the follow-up requests for followup_frequency number of days. Calculate the number of days since the review was sent.

If the number of days is greater than the follow-up frequency set in the configuration, send the reminder email.
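The day calculation could look roughly like the following, assuming the send date was stored as a `YYYY-MM-DD` string. The storage format and the helper names `days_since` and `needs_reminder` are assumptions.

```python
from datetime import date, datetime


def days_since(sent_date, today=None):
    """Days elapsed since sent_date, given as a 'YYYY-MM-DD' string."""
    today = today or date.today()
    sent = datetime.strptime(sent_date, "%Y-%m-%d").date()
    return (today - sent).days


def needs_reminder(sent_date, followup_frequency, today=None):
    """True when more than followup_frequency days have passed."""
    return days_since(sent_date, today) > followup_frequency


# Fixed dates keep the example reproducible: 4 days have passed,
# which is greater than a follow-up frequency of 2.
print(needs_reminder("2017-04-01", 2, today=date(2017, 4, 5)))
```

Passing `today` explicitly is only there to make the function easy to check; the scheduler itself would rely on the current date.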

Here is the complete followup_request method:

Wrapping It Up

In this tutorial, you saw how to implement the logic to follow up on code review requests. You also added the functionality to send a reminder email if the reviewer hasn't responded to the email for a certain number of days. 

This Python code reviewer can be further enhanced to suit your needs. Do fork the repository and add new features, and let us know in the comments below.

Source code from this tutorial is available on GitHub.


Building a Python Code Review Scheduler: Keeping the Review Information

Apr 12 2017 [Archived Version] □ Published at tuts+

In the second part of this series, you saw how to collect the commit information from the git logs and send review requests to random developers selected from the project members list.

In this part, you'll see how to save the code review information to follow up each time the code scheduler is run. You'll also see how to read emails to check if the reviewer has responded to the review request.

Getting Started

Start by cloning the source code from the second part of the tutorial series.

Modify the config.json file to include some relevant email addresses, keeping the [email protected] email address. The Git repository has commits tied to this particular address, and the code needs them to run as expected. Modify the SMTP credentials in the scheduler.py file:

Navigate to the project directory CodeReviewer and try to execute the following command in the terminal.

It should send the code review request to random developers for review.

Keeping the Review Request Information

To follow up on the review request information, you need to keep it somewhere for reference. You can choose where to keep the code review request information: it can be a database or a file. For the sake of this tutorial, we'll keep the review request information inside a reviewer.json file. Each time the scheduler is run, it'll check the info file to follow up on the review requests that haven't been responded to.

Create a method called save_review_info which will save the review request information inside a file. Inside the save_review_info method, create an info object with the reviewer, subject, and a unique Id.

For a unique Id, import the uuid Python module.

You also need the datetime Python module to get the current date. Import the datetime Python module.

You need to initialize the reviewer.json file when the program starts if it doesn't already exist.

If the file doesn't exist, you need to create a file called reviewer.json and fill it with an empty JSON array as seen in the above code.

This method will be called each time a review request is sent. So, inside the save_review_info method, open the reviewer.json file in read mode and read the contents. Append the new content information into the existing content and write it back to the reviewer.json file. Here is how the code would look:
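A sketch of the initialise, read, append, and write-back cycle described above. The file path is passed in as a parameter so the example can run against a temporary file; that parameter, the helper name `init_review_file`, and the `send_date` key are assumptions.

```python
import datetime
import json
import os
import tempfile
import uuid


def init_review_file(path):
    """Create the file with an empty JSON array if it does not exist."""
    if not os.path.exists(path):
        with open(path, "w") as outfile:
            json.dump([], outfile)


def save_review_info(path, reviewer, subject):
    """Append one review request entry to the JSON file."""
    info = {
        "reviewer": reviewer,
        "subject": subject,
        "id": str(uuid.uuid4()),                  # unique Id
        "send_date": str(datetime.date.today()),  # e.g. '2017-04-12'
    }
    with open(path, "r") as infile:
        review_data = json.load(infile)
    review_data.append(info)
    with open(path, "w") as outfile:
        json.dump(review_data, outfile)
    return info


# A temporary location stands in for the project's reviewer.json.
path = os.path.join(tempfile.mkdtemp(), "reviewer.json")
init_review_file(path)
save_review_info(path, "dev1@example.com", "Code Review [commit:111]")
save_review_info(path, "dev2@example.com", "Code Review [commit:222]")

with open(path) as infile:
    stored = json.load(infile)
print(len(stored))
```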

Inside the schedule_review_request method, before sending the code review request mail, call the save_review_info method to save the review information.

Save the above changes and execute the scheduler program. Once the scheduler has been run, you should be able to view the reviewer.json file inside the project directory with the code review request information. Here is how it would look:

Reading the Email Data

You have collected all the code review request information and saved it in the reviewer.json file. Now, each time the scheduler is run, you need to check your mail inbox to see if the reviewer has responded to the code review request. So first you need to define a method to read your Gmail inbox.

Create a method called read_email which takes the number of days to check the inbox as a parameter. You'll make use of the imaplib Python module to read the email inbox. Import the imaplib Python module:

To read the email using the imaplib module, you first need to create a connection to the mail server.

Log in to the server using the email address and password:

Once logged in, select the inbox to read the emails:

You'll be reading the emails for the past n days since the code review request was sent. Import timedelta from the datetime Python module.

Create the email date as shown:

Using the formatted_date, search the email server for emails.
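IMAP expects dates in `DD-Mon-YYYY` form. A small sketch of building the search criterion follows; the helper name `imap_since_criterion` is an assumption, and the resulting string is the kind of value that could be passed to the `search` method of an `imaplib.IMAP4_SSL` connection, as in `mail.search(None, criterion)`.

```python
from datetime import date, timedelta


def imap_since_criterion(num_days, today=None):
    """Build an IMAP SINCE criterion for the last num_days days."""
    today = today or date.today()
    email_date = today - timedelta(days=num_days)
    # %b is the abbreviated month name (locale dependent, 'Apr' by default).
    formatted_date = email_date.strftime("%d-%b-%Y")
    return '(SINCE "{}")'.format(formatted_date)


# A fixed date keeps the example reproducible.
criterion = imap_since_criterion(3, today=date(2017, 4, 12))
print(criterion)
```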

It will return the unique IDs for each email, and using the unique IDs you can get the email details.

Now you'll make use of the first_email_id and the last_email_id to iterate through the emails and fetch the subject and the "from" address of the emails.

data will contain the email content, so iterate the data part and check for a tuple. You'll be making use of the email Python module to extract the details. So import the email Python module. 

You can extract the email subject and the "from" address as shown:
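A sketch of the extraction step using the email module on Python 3. The `data` list below imitates the shape of an imaplib fetch response (a tuple holding the raw message, followed by a closing byte string); the message itself is made-up sample data.

```python
import email

# Hypothetical raw message, standing in for what the server would return.
raw_message = (
    b"From: Reviewer <dev1@example.com>\r\n"
    b"To: scheduler@example.com\r\n"
    b"Subject: Re: Code Review [commit:111]\r\n"
    b"\r\n"
    b"Looks good to me.\r\n"
)
data = [(b"1 (RFC822 {123}", raw_message), b")"]

email_info = []
for response_part in data:
    # Only the tuple parts carry the message content.
    if isinstance(response_part, tuple):
        msg = email.message_from_bytes(response_part[1])
        email_info.append({"subject": msg["subject"], "from": msg["from"]})

print(email_info)
```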

Here is the complete read_email method:

Save the above changes and try running the above read_email method:

It should print the email subject and "from" address on the terminal. 

Email Reading From Gmail

Now let's collect the "from" address and subject into an email_info list and return the data. 

Instead of printing the subject and the "from" address, append the data to the email_info list and return the email_info list.

Here is the modified read_email method:

Adding Logging for Error Handling

Error handling is an important aspect of software development. It's really useful during the debugging phase to trace bugs. If you have no error handling, it gets really difficult to track down an error. Since the scheduler is growing by a couple of new methods, this is the right time to add error handling to the scheduler code.

To get started with error handling, you'll be needing the logging Python module and the RotatingFileHandler class. Import them as shown:

Once you have the required imports, initialize the logger as shown:

In the above code, you initialized the logger and set the log level to INFO. 

Create a rotating file log handler which will create a new file each time the log file has reached a maximum size.

Attach the logHandler to the logger object.

Let's add the error logger to log errors when an exception is caught. In the read_email method's exception part, add the following code:

The first line logs the error message with the current date and time to the log file. The second line logs the stack trace of the error.
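The logging setup described in this section might be sketched as follows. The logger name, the log file location, the size limit, and the format string are assumptions; a deliberate division by zero stands in for a failure inside read_email.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

# A temporary location stands in for the project's log file.
log_path = os.path.join(tempfile.mkdtemp(), "app.log")

logger = logging.getLogger("code_review_scheduler")  # hypothetical name
logger.setLevel(logging.INFO)

# Start a new file once the current one reaches maxBytes; keep 3 backups.
log_handler = RotatingFileHandler(log_path, maxBytes=100000, backupCount=3)
log_handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(message)s")
)
logger.addHandler(log_handler)

try:
    1 / 0  # stand-in for an operation that fails
except Exception as error:
    logger.error("Error while reading mail: " + str(error))
    logger.exception(error)  # also writes the stack trace

log_handler.flush()
with open(log_path) as log_file:
    contents = log_file.read()
print(contents)
```

Here the formatter adds the timestamp, so the message itself does not need to include the date and time.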

Similarly, you can add the error handling to the main part of the code. Here is how the code with error handling would look:

Wrapping It Up

In this part of the series, you saved the review request information in the reviewer.json file. You also created a method to read the emails. You'll be using both of these methods to follow up on the code review requests in the final part of this series.

Additionally, don’t hesitate to see what we have available for sale and for study in the marketplace, and don't hesitate to ask any questions and provide your valuable feedback using the feed below.

Source code from this tutorial is available on GitHub.

Do let us know your thoughts and suggestions in the comments below.


List Comprehensions in Python

Apr 12 2017 [Archived Version] □ Published at tuts+

List comprehensions provide you a way of writing for loops more concisely. They can be useful when you want to create new lists from existing lists or iterables. For example, you can use list comprehensions to create a list of squares from a list of numbers. Similarly, you could also use some conditions on a list of numbers so that the new list you create is a subset of the original list. 

Keep in mind that you cannot write every for loop as a list comprehension. One more thing: the name "list comprehensions" can be a bit confusing because it seems to suggest that the comprehensions are only meant for working with lists. In reality, the word "list" in list comprehensions is used to indicate that you can loop over any iterable in Python and the end product would be a list. 

Loops and List Comprehensions

Basic list comprehensions that don't use any conditions have the following form:

Let's begin by writing a very basic for loop to list the first 15 multiples of 5. First, you need to create an empty list. Then, you have to iterate over a range of numbers and multiply them by 5. The new sequence of numbers that you get will consist of multiples of 5. 

The above for loop basically has the following structure: 

If you compare it with the list comprehension form that you read earlier, you can see that <the_element> is n, <the_iterable> is range(1,16), and <the_expression> is n*5. Putting these values in the list comprehension will give us the following result:

Similarly, you can also get a list with the cube of given numbers like this:
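The loop and its comprehension equivalent, as described above, can be written as follows. The variable names and the range used for the cubes are illustrative choices.

```python
# The verbose for loop: first 15 multiples of 5.
multiples = []
for n in range(1, 16):
    multiples.append(n * 5)

# The same list as a comprehension:
# [<the_expression> for <the_element> in <the_iterable>]
multiples_lc = [n * 5 for n in range(1, 16)]

# Cubes of the numbers 1 to 5.
cubes = [n ** 3 for n in range(1, 6)]

print(multiples_lc)
print(cubes)
```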

Conditions in List Comprehensions

You can also use an if condition to filter out certain values from the final list. In this case, the list comprehension takes the following form:

A basic example of this type of comprehension would be to get all the even numbers in a given range. A for loop to do this task will look like this:

The same thing could also be accomplished by using the following list comprehension:
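A sketch of the filtering form, using the range 1 to 20 as an arbitrary example:

```python
# The verbose for loop with an if condition.
evens = []
for n in range(1, 21):
    if n % 2 == 0:
        evens.append(n)

# The same filter as a comprehension: the condition goes at the end.
evens_lc = [n for n in range(1, 21) if n % 2 == 0]

print(evens_lc)
```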

A more complex example of using list comprehensions would be adding .. if .. else .. conditional expressions inside them. 

In this case, the order in which you lay out the statements inside the list comprehension will be different from usual if conditions. When you only have an if condition, the condition goes to the end of the comprehension. However, in the case of an .. if .. else .. expression, the positions of the for loop and the conditional expression are interchanged. The new order is:

Let's begin by writing the verbose .. if .. else .. conditional expression to get squares of even numbers and cubes of odd numbers in a given range.

The above conditional expression has the following structure:

Putting the corresponding values in the right places will give you the following list comprehension:
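The same transformation in both forms, using the range 1 to 10 as an arbitrary example:

```python
# The verbose version: squares of even numbers, cubes of odd numbers.
squares_cubes = []
for n in range(1, 11):
    if n % 2 == 0:
        squares_cubes.append(n ** 2)
    else:
        squares_cubes.append(n ** 3)

# The conditional expression comes first, the for loop comes last.
squares_cubes_lc = [n ** 2 if n % 2 == 0 else n ** 3 for n in range(1, 11)]

print(squares_cubes_lc)
```

Note that the `if .. else ..` here is a conditional expression that chooses a value for every element, not a filter, so the resulting list has as many items as the range.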

List Comprehensions for Nested Loops

You can also use nested loops within a list comprehension. There is no limit on the number of for loops that you can put inside a list comprehension. However, you have to keep in mind that the order of the loops should be the same in both the original code and the list comprehension. You can also add an optional if condition after each for loop.  A list comprehension with nested for loops will have the following structure:

The following examples should make everything clearer. There are two nested loops, and multiplying them together gives us multiplication tables.

These nested for loops can be rewritten as:

Once you have written the loop in this form, converting it to a list comprehension is easy:
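A sketch of the multiplication-table example; the ranges (tables for 1 to 3, each up to 10) are assumptions chosen to keep the output short.

```python
# Nested for loops: multiplication tables of 1, 2 and 3.
tables = []
for x in range(1, 4):
    for y in range(1, 11):
        tables.append(x * y)

# The loops appear in the same order inside the comprehension.
tables_lc = [x * y for x in range(1, 4) for y in range(1, 11)]

print(tables_lc)
```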

You can also use a similarly written list comprehension to flatten a list of lists. The outer for loop iterates through individual lists and stores them in the variable row. The inner for loop will then iterate through all the elements in the current row. During the first iteration, the variable row has the value [1, 2, 3, 4]. The second loop iterates through this list or row and appends all those values to the final list.
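Flattening can be sketched like this. Only the first row, [1, 2, 3, 4], is given in the text; the other two rows of the matrix are assumed.

```python
# Only the first row comes from the text; the rest is assumed.
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

# Outer loop over rows, inner loop over the elements of each row.
flattened = [n for row in matrix for n in row]

print(flattened)
```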

Nested List Comprehensions

Nested list comprehensions may sound similar to list comprehensions with nested loops, but they are very different. In the first case, you were dealing with loops within loops. In this case, you will be dealing with list comprehensions within list comprehensions. A good example of using nested list comprehensions would be creating a transpose of the matrix from the previous section.

Without a list comprehension expression, you will need to use two for loops to create the transpose.

The outer loop iterates through the matrix four times because there are four columns in it. The inner loop iterates through the elements inside the current row one index at a time and appends it to a temporary list called temp. The temp list is then appended as a row to the transposed matrix. In the case of nested list comprehensions, the outermost loop comes at the end and the innermost loop comes at the beginning. 

Here is the above code in the form of a list comprehension:

Another way to look at this is by considering the structure of list comprehensions that replace the basic for loops that you learned about at the beginning of the article.

If you compare it with the nested list comprehension above, you will see that <the_expression> in this case is actually another list comprehension: [row[n] for row in matrix]. This nested list comprehension itself is in the form of a basic for loop.
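The transpose in both forms might look like this, assuming a matrix with three rows and four columns (only its first row is given in the text):

```python
# Only the first row comes from the text; the rest is assumed.
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

# Two for loops: one new row per column index.
transpose = []
for n in range(4):
    temp = []
    for row in matrix:
        temp.append(row[n])
    transpose.append(temp)

# Nested list comprehension: the inner comprehension is <the_expression>.
transpose_lc = [[row[n] for row in matrix] for n in range(4)]

print(transpose_lc)
```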

Final Thoughts

I hope this tutorial helped you understand what list comprehensions are and how to use them in place of basic for loops to write concise and slightly faster code while creating lists. 

Another thing that you should keep in mind is the readability of your code. Creating list comprehensions for nested loops will probably make the code less readable. In such cases, you can break down the list comprehension into multiple lines to improve readability.



django-jsrender

Apr 12 2017 [Archived Version] □ Published at Latest Django packages added

Render Django templates into Javascript functions.


Emotion emojione for django app

Apr 12 2017 [Archived Version] □ Published at Latest Django packages added


Python 3

Apr 11 2017 [Archived Version] □ Published at Seek Nuance under tags django python social networking uncategorized

Helping my company migrate everything to Python 3. Righteous! And we’ll update everything else, including Django, when we do it. If only an updated Python Essential Reference was available… I’d buy it for every developer. I can’t hassle David anymore since I left Twitter. ‘Tis a shame. I could send him email and text messages, …



Building a Python Code Review Scheduler: Sending Review Requests

Apr 11 2017 [Archived Version] □ Published at tuts+

In the first part of the tutorial series, you saw how to set up the project and its required configurations. You processed the project git logs and printed them in the terminal. In this part, we'll take it to the next level and send out the code review requests.

Getting Started

Start by cloning the source code from the first part of the tutorial series.

Once you have cloned the repository, navigate to the project directory CodeReviewer and try to execute the following command in the terminal.

It should print the commit IDs, commit date and the commit author in the terminal.

Collecting All Commits With Details

You'll get the commit details while iterating the commit logs. Now you need to collect the commit details and store them in a list, so that you can iterate them later to send out the code review request. In order to collect the commit details, start by creating a Commit class with the required members as shown:

While iterating the commit logs in the process_commits method, create a Commit instance to keep the commit detail.

In the process_commits method, define a few variables as shown:

You'll be collecting each commit detail into a Python list called commits. While reading the commit logs, when a commit ID is encountered, keep the commit ID and reset the date and author variables, since it's a new commit. Modify the process_commits method's code after the commit keyword check as shown:

When a new commit line is reached and the commit Id variable is not null, the details of the previous commit have been collected and it's time to add that commit to the commits list. Add the following line of code to the above code:

Modify the Author keyword check and the Date keyword check to keep the respective commit details in the author and date variables.

Because a commit is only added when the next commit line is encountered, the last commit in the log (or the only commit, if there is just one) won't be saved inside the commits list. So add the following code after the loop to handle that scenario.

Here is the complete process_commits method which collects the commit details and returns a list of commits.
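A sketch of the collection logic described in this section. It works on a list of log lines rather than running Git, and the function name `collect_commits`, the attribute names on Commit, and the sample log are assumptions.

```python
class Commit:
    """Holds the details of a single commit."""

    def __init__(self, commit_id, author, date):
        self.commit_id = commit_id
        self.author = author
        self.date = date


def collect_commits(log_lines):
    """Build a list of Commit objects from 'git log' output lines."""
    commits = []
    commit_id = author = date = None
    for line in log_lines:
        if line.startswith("commit "):
            # A new commit starts: store the previous one, if any.
            if commit_id is not None:
                commits.append(Commit(commit_id, author, date))
            commit_id = line[len("commit "):].strip()
            author = date = None
        elif line.startswith("Author:"):
            author = line[len("Author:"):].strip()
        elif line.startswith("Date:"):
            date = line[len("Date:"):].strip()
    # The last commit is never followed by another commit line.
    if commit_id is not None:
        commits.append(Commit(commit_id, author, date))
    return commits


# Hypothetical log output.
sample_log = """commit 1111111
Author: Dev One <dev1@example.com>
Date:   Mon Apr 10 10:00:00 2017 +0530

    First change

commit 2222222
Author: Dev Two <dev2@example.com>
Date:   Tue Apr 11 11:00:00 2017 +0530

    Second change
""".splitlines()

commits = collect_commits(sample_log)
print([c.commit_id for c in commits])
```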

Scheduling a Code Review Request

You have the commit details collected from the project log. You need to select random developers to send the code review request. Inside the config.json file, let's add the developers associated with the project who can review the code. Here is the modified config.json file:

Let's read the developer's info related to a particular project. Define a public variable called project_members.

While reading the project configurations, fill in the project member details in the project_members list variable.

Now you have the developer list related to a particular project in the project_members variable.

Define a method called schedule_review_request which you'll call to schedule the review request corresponding to each project commit. The review request will be sent to a random developer from the project_members list, excluding the commit author. 

Create a method called select_reviewer to select the random developer from the project_members list. To select random developers from the list, you'll be making use of the random Python module. Import the random Python module.

Here is how the code would look:

As seen in the above code, the commit author has been removed from the developer list before selecting random developers to review the code. To select random developers from the list, you have made use of the random.choice method from the random module.
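One possible shape for select_reviewer. This sketch filters the author out into a new list instead of removing the author from the original list, and returns None when nobody else is available; both choices, as well as the sample addresses, are assumptions.

```python
import random


def select_reviewer(author, group):
    """Pick a random member of group other than the commit author."""
    candidates = [member for member in group if member != author]
    if not candidates:
        return None  # nobody else is available to review
    return random.choice(candidates)


# Hypothetical project members.
project_members = ["dev1@example.com", "dev2@example.com", "dev3@example.com"]
reviewer = select_reviewer("dev1@example.com", project_members)
print(reviewer)
```

Leaving project_members untouched means the same list can be reused for every commit in the loop.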

Inside the schedule_review_request method, iterate through each commit from the commits list. For each commit, select a random developer other than the author of the commit to send out the review request. Here is how the code would look:

Format the Code Review Request

You selected random developers to send out the code review request. Before sending the review request, you need to format it with details about the review request. Define a method called format_review_commit which will format the code review request. Here is how the code would look:

In the schedule_review_request method, build up the review request email content which will be sent to the reviewer. The email content will contain the required information for the reviewer to review the code commit. Modify the schedule_review_request as shown:

Save the above changes and run the Python scheduler program.

You should be able to see an output similar to the one shown below:

Code Review Scheduler Output

Emailing the Code Review Request

Create a method called send_email which will email the review request with the required subject and content. You'll be making use of the smtplib module to send out the emails. Import smtplib in the scheduler.py file:

Define the mail server details along with the public variables:

Create a method called send_email which will send out the email to the address specified. Here is how the send_email code would look:

As seen in the above code, you created the smtp server using the gmail server and port number. Using the defined username and password, you logged into the email account and sent the email to the recipient.
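A sketch of sending mail with smtplib. The split into `build_message` and `send_email`, the parameter names, and the use of STARTTLS are assumptions; `send_email` is defined but not called here, since it needs a reachable server and real credentials (for Gmail, the host would typically be smtp.gmail.com on port 587).

```python
import smtplib
from email.mime.text import MIMEText


def build_message(sender, to, subject, body):
    """Create a plain-text email message."""
    msg = MIMEText(body)
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = to
    return msg


def send_email(host, port, username, password, to, subject, body):
    """Log in to the SMTP server and send the message."""
    msg = build_message(username, to, subject, body)
    server = smtplib.SMTP(host, port)
    try:
        server.ehlo()
        server.starttls()  # upgrade the connection to TLS
        server.login(username, password)
        server.sendmail(username, [to], msg.as_string())
    finally:
        server.quit()


# Building the message needs no network access.
message = build_message(
    "scheduler@example.com",
    "dev2@example.com",
    "Code Review [commit:111]",
    "Please review commit 111.",
)
print(message["Subject"])
```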

Modify the schedule_review_request method to send the email instead of printing the email content to the terminal.

Save the above changes. Modify the config.json file to include a valid email address which you can check. Run the scheduler using the following command:

You should be able to see the following output on the terminal:

Code Review Request Sending Output

Check the inbox of that email address to see the code review request mailed from the Code Review scheduler.

Wrapping It Up

In this part of the Python Code Review Scheduler series, you collected the commit information into a list. The commit list was further iterated to format the review request. Random developers were selected to send out the code review request.

In the next part of this series, you'll see how to follow up the code review request.

Source code from this tutorial is available on GitHub.

I hope you enjoyed this part. Do let us know your thoughts in the comments below.


django-channels-jsonrpc

Apr 11 2017 [Archived Version] □ Published at Latest Django packages added


New Coffee Break Course: Taming Python With Unit Tests

Apr 10 2017 [Archived Version] □ Published at tuts+


Hey, developers! Never code without a safety net! That's what unit tests are for.

In our latest Coffee Break Course, Taming Python With Unit Tests, Envato Tuts+ instructor Derek Jensen will show you how to use the built-in Python unit testing framework. You'll be able to jump in and begin writing unit tests in a matter of minutes.


What's a Coffee Break Course? It's an ultra-short video course designed to teach a skill or concept in a single sitting. This one is less than ten minutes long, and by the end of it you’ll understand how to write unit tests in Python.


You can take our new Coffee Break Course straight away with a free 10-day trial of our monthly subscription. If you decide to continue, it costs just $15 a month, and you’ll get access to hundreds of courses, with new ones added every week.

And also check out the thousands of useful scripts and plugins we have available on CodeCanyon.


Building a Python Code Review Scheduler: Processing Log

Apr 10 2017 [Archived Version] □ Published at tuts+

In this tutorial series, you'll see how to build a code review scheduler using Python. Throughout the course of this series, you'll touch on some basic concepts like reading emails, sending an email, executing terminal commands from a Python program, processing git logs, etc.

In the first part, you'll start by setting up the basic configuration files, reading git logs, and processing them for sending the code review request. 

Getting Started

Start by creating a project folder called CodeReviewer. Inside the CodeReviewer folder, create a file called scheduler.py.

Assuming the code review scheduler will be run against multiple projects, you'll need to specify the project name against which the scheduler will run and the number of days for which the log needs to be processed. So first read these two parameters as arguments to the code review program.

Let's make use of the argparse Python module for reading the program parameters. Import the library and add the program arguments. You can use the ArgumentParser class of the argparse module to create the parser. Once it's created, you can add the arguments to the parser. Here is the code for reading the arguments from the program:
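A sketch of the argument parsing. The flag names `-n` and `-p`, the defaults, and the help texts are assumptions; an explicit argument list is passed to `parse_args` so the example runs without a command line.

```python
import argparse


def build_parser():
    """Create the parser for the two scheduler arguments."""
    parser = argparse.ArgumentParser(description="Code Review Scheduler")
    # Hypothetical flag names: -n for days, -p for the project name.
    parser.add_argument("-n", nargs="?", type=int, default=1,
                        help="Number of days of log to process.")
    parser.add_argument("-p", nargs="?", type=str, default=None,
                        help="Name of the project to process.")
    return parser


# In the scheduler this would be parse_args() with no arguments,
# which reads the values from the command line.
args = build_parser().parse_args(["-n", "10", "-p", "project_x"])
no_of_days = args.n
project = args.p
print(no_of_days, project)
```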

Setting Up Project Configurations

Let's maintain a separate config file that will be processed by the code reviewer. Create a file called config.json inside the project directory CodeReviewer. Inside the config file, there will be information about each project that will be processed. Here is how the project config file would look:

A few more options would be added to the project configurations in the later parts. 

Let's read the configuration JSON file into the Python program. Import the JSON module and load the JSON data read from the config file.

Read Commit Info From the Repository

When the reviewer script is run, the project name is specified as a parameter. Based on the project name specified, check if its configurations are available and clone the repository. 

First, you need to find the project URL from the configurations. Iterate the project's data and find the project URL as shown:
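A sketch of loading the configuration and looking up the project URL. The config layout (a list of objects with `name` and `git_url` keys), the helper name `find_project_url`, and the sample URLs are assumptions; the JSON is loaded from a string here instead of from config.json.

```python
import json

# Hypothetical config content; in the project it would live in config.json.
config_text = """
[
    {"name": "project_x", "git_url": "https://example.com/project_x.git"},
    {"name": "project_y", "git_url": "https://example.com/project_y.git"}
]
"""


def find_project_url(projects, project_name):
    """Return the repository URL of the named project, or None."""
    for project in projects:
        if project["name"] == project_name:
            return project["git_url"]
    return None


main_config = json.loads(config_text)
project_url = find_project_url(main_config, "project_y")
print(project_url)
```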

Once you have the project URL, check if the project is already cloned. If not, clone the project URL. If it already exists, navigate to the existing project directory and pull the latest changes.

To execute system commands, you'll be making use of the Python os module. Create a method to execute system commands since you'll be using it frequently. Here is the execute_cmd method:

Processing the Git Log

After fetching the commit log from the Git repository, you'll analyze the log. Create a new Python method called process_commits to process the Git logs.

Git provides us with the commands to get the commit log. To get all logs from a repository, the command would be:

The response would be:

You can also get logs specific to the number of days from the time the command is executed. To get logs since n number of days, the command would be:

You can narrow it down further to see whether a particular commit was an addition, modification, or deletion. Execute the above command with --name-status:

The above command would have the following output:

The A letter on the left side of the README.md file indicates addition. Similarly, M would indicate modification and D would indicate deletion.

Inside the process_commits method, let's define the Git command to be executed to get the log history. 
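The command string could be assembled like this. The helper name `build_log_command` is an assumption; the options used are `--all`, `--since` with a relative date such as `10.days`, and `--name-status`.

```python
def build_log_command(no_of_days):
    """Build the git log command for the last no_of_days days."""
    return "git log --all --since={}.days --name-status".format(int(no_of_days))


cmd = build_log_command(10)
print(cmd)
```

Converting the value with `int` rejects anything that is not a number before it ends up in a shell command.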

Pass the above command cmd to the execute_cmd method.

Read the response, iterate each line, and print the same.

Make a call to the process_commits method after the configurations have been read.

Save the above changes and try to execute the code reviewer using the following command:

As you can see, we have started the code reviewer with the number of days and the project name to process. You should be able to see the following output:

So when you execute the code reviewer, you can see that the repository is created if it doesn't already exist, or else it is updated. After that, based on the number of days provided, it fetches the commit log history to process. 

Now let's analyze the commit log to find out the commit Id, commit date, and commit author.

As seen in the logs, the commit Id starts with the keyword commit, the author starts with the keyword Author:, and the date starts with the keyword Date:. You'll be using these keywords to identify the commit Id, author and date for a commit.

Let's try to get the commit Id from the Git log lines. This is quite straightforward. You only need to check if the line starts with the keyword commit.

Save the changes and execute the scheduler and you should be able to get the commit Id.

The next task is to extract the author name. To check if the line contains the author info, you'll first check if the line starts with the Author keyword. If it does, you'll make use of a regular expression to get the user. 

As you can see, the user email address is enclosed in angle brackets (< >). We'll use a regular expression to read the email address between < and >. The regular expression will be like this:

Import the Python re module to use regular expressions in Python.

Now check if the line starts with the Author keyword. If it does, extract the user email address using the regular expression above. Here is how it would look:
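One way this check could look is sketched below. The pattern and the helper name author_email are assumptions for illustration, not necessarily what the tutorial uses, and the log line is made up.

```python
import re

# Capture whatever sits between "<" and ">" on an Author line.
EMAIL_IN_BRACKETS = re.compile(r"<([^<>]+)>")


def author_email(line):
    """Return the email from a git log Author line, or None."""
    if not line.startswith("Author:"):
        return None
    match = EMAIL_IN_BRACKETS.search(line)
    return match.group(1) if match else None


# Hypothetical log line for illustration.
print(author_email("Author: Jane Doe <jane@example.com>"))
```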

To extract the commit date from the log, you need to check if the line starts with the Date keyword. Here is how it would look:

Here is the final process_commits method:

Save the above changes and start the code reviewer.

You should have each commit detail with the commit Id, Author and commit date printed on the terminal.
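To show how the three checks fit together, here is a self-contained sketch that groups log lines into one record per commit. It works on a hypothetical log text rather than a real repository, and parse_commits is an illustrative name, not the tutorial's process_commits.

```python
import re

# Hypothetical git log text; a real run would come from `git log`.
SAMPLE_LOG = """\
commit 1111111111111111111111111111111111111111
Author: Jane Doe <jane@example.com>
Date:   Mon Apr 3 10:15:00 2017 +0000

    Add README

commit 2222222222222222222222222222222222222222
Author: John Roe <john@example.com>
Date:   Tue Apr 4 09:00:00 2017 +0000

    Fix typo
"""


def parse_commits(log_text):
    """Group git log lines into one dict per commit."""
    commits = []
    for line in log_text.splitlines():
        if line.startswith("commit "):
            # A new commit header starts a new record.
            commits.append({"id": line[len("commit "):].strip()})
        elif line.startswith("Author:") and commits:
            found = re.search(r"<([^<>]+)>", line)
            commits[-1]["author"] = found.group(1) if found else None
        elif line.startswith("Date:") and commits:
            commits[-1]["date"] = line[len("Date:"):].strip()
    return commits


for commit in parse_commits(SAMPLE_LOG):
    print(commit)
```

Commit message lines are indented in git's default log format, so they never match the commit keyword check.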

Wrapping It Up

In this first part of the Python Code Review Scheduler, you saw how to set up the project. You read the input parameters required by the scheduler to process the project. In the next part of this tutorial series, we'll collect the commit details from the process_commits method and send the commit to random developers for code review.

I hope you enjoyed the first part. Do let us know your thoughts or any suggestions in the comments below.

Source code from this tutorial is available on GitHub.


A decent Elasticsearch search engine implementation

Apr 09 2017 [Archived Version] □ Published at Peterbe.com

The title is a bit of an understatement because I think it's pretty good. It's not perfect and it's not guaranteed to scale, but it works pretty well. Especially on search term typos.

This, my own blog, now has a search engine built with Elasticsearch using the Python library elasticsearch-dsl. The algorithm (if you can call it that) is my own afternoon hack invention. Before I explain how it works, try out a couple of searches:

(each search appends &debug-search for extended output)

  • corn - finds Cornwall, cron, Crontabber, crontab, corp etc.
  • crown - finds crown, Crowne, crowded, crowds, crowd etc.
  • react - finds create-react-app, React app, etc.
  • jugg - finds Jung, juggling, judging, judged etc.
  • pythn - finds Python, python2.4, python2.5 etc.

Also, by default it uses Elasticsearch's match_phrase, so when you search for a multi-word thing, it requires a match on each term. For example, date format finds Date formatting, date formats etc.

But if you search for something where the whole phrase can't match, it splits up the search and uses a match operator instead (minus any stop words).

Typo-focussed

This solution is very much focussed on typos. One thing I really dislike in non-Google search engines is when you make a search and nothing is found and it says "Did you mean ...?". Quite likely I did, but why do I have to click it? Can't it just be clicked for me?

Also, what if there's ambiguity: some results for what you actually typed, plus multiple potential "Did you mean...?" candidates? Why not just blend them all together like Google does? Here is my attempt to solve that. Come with me...

Figuring Out ALL Search Terms

So if you type "Firefix" (not "Firefox"; also scroll to the bottom to see the debug table), then maybe that's an actual word that is in the database. Then, using Elasticsearch's Suggesters, it figures out alternative spellings based on frequency distributions within the indexed content. This lookup is actually really fast. So now it figures out three alternative ways to spell this term:

  • firefox (score 0.9, 1 character different)
  • firefli (score 0.7, 2 characters different)
  • firfox (score 0.7, 2 characters different)

And, very arbitrarily, I pick a score for the default term that the user typed in. Let's pick 1.1. It doesn't matter much and it's up for future tuning. The initial goal is to not bury this spelling alternative too far back.

Here's how to run the suggester for every defined doc type and generate a list of other search term tuples (minimum score 0.6).

search_terms = [(1.1, q)]
_search_terms = set([q])
doc_type_keys = (
    (BlogItemDoc, ('title', 'text')),
    (BlogCommentDoc, ('comment',)),
)
for doc_type, keys in doc_type_keys:
    suggester = doc_type.search()
    for key in keys:
        suggester = suggester.suggest('sugg', q, term={'field': key})
    suggestions = suggester.execute_suggest()
    for each in suggestions.sugg:
        if each.options:
            for option in each.options:
                if option.score >= 0.6:
                    better = q.replace(each['text'], option['text'])
                    if better not in _search_terms:
                        search_terms.append((
                            option['score'],
                            better,
                        ))
                        _search_terms.add(better)

Eventually we get a list (once sorted) that looks like this:

search_terms = [(1.1, 'firefix'), (0.9, 'firefox'), (0.7, 'firefli'), (0.7, 'firfox')]

The only reason the code sorts this by the score is in case there are crazy-many search terms. Then we might want to chop off some and only use the 5 highest scoring spelling alternatives.
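The sort-and-chop step is not shown in the post, so here is a minimal sketch of what it could look like. The helper name top_search_terms and the MAX_TERMS constant are made up for illustration.

```python
MAX_TERMS = 5  # the cap mentioned in the post


def top_search_terms(search_terms, limit=MAX_TERMS):
    """Sort (score, term) tuples by score, highest first, and keep `limit`."""
    ranked = sorted(search_terms, key=lambda pair: pair[0], reverse=True)
    return ranked[:limit]


terms = [(0.7, "firfox"), (1.1, "firefix"), (0.9, "firefox"), (0.7, "firefli")]
print(top_search_terms(terms))
```

Sorting on the score alone keeps terms with equal scores in the order the suggester returned them.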

Building The Boosted OR-query

In this scenario, we're searching amongst blog posts. The title is likely to be a better match than the body. If the title mentions it we probably want to favor that over those where it's only mentioned in the body.

So to build up the OR-query we'll boost the title more than the body ("text" in this example) and we'll build it up using all possible search terms and boost them based on their score. Here's the complete query.

from elasticsearch_dsl import Q

strategy = 'match_phrase'
if original_q:
    strategy = 'match'
search_term_boosts = {}
matcher = None  # accumulates the OR of every per-term query
for i, (score, word) in enumerate(search_terms):
    # meaning the first search_term should be boosted most
    j = len(search_terms) - i
    boost = 1 * j * score
    boost_title = 2 * boost
    search_term_boosts[word] = (boost_title, boost)
    match = Q(strategy, title={
        'query': word,
        'boost': boost_title,
    }) | Q(strategy, text={
        'query': word,
        'boost': boost,
    })
    if matcher is None:
        matcher = match
    else:
        matcher |= match

search_query = search_query.query(matcher)

The core is that it does Q('match_phrase', title='firefix', boost=2X) | Q('match_phrase', text='firefix', boost=X).

Here's another arbitrary number. The number 2. It means that the "title" is 2 times more important than the "text".

And that's it! Now every match is scored based on the suggester's score and on whether it matched on the "title" or the "text" (or both). Elasticsearch takes care of everything else. The default is to sort by the _score as ultimately dictated by Lucene.
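To make the boost arithmetic concrete, this sketch repeats only the number crunching from the loop above, without Elasticsearch, using the example "Firefix" terms. The function name compute_boosts and the TITLE_WEIGHT constant are made up for illustration.

```python
TITLE_WEIGHT = 2  # the post's arbitrary "title is 2x the text" factor


def compute_boosts(search_terms):
    """Map each term to (title_boost, text_boost), mirroring the loop above."""
    boosts = {}
    for i, (score, word) in enumerate(search_terms):
        position_weight = len(search_terms) - i  # earlier terms weigh more
        text_boost = position_weight * score
        boosts[word] = (TITLE_WEIGHT * text_boost, text_boost)
    return boosts


search_terms = [(1.1, "firefix"), (0.9, "firefox"), (0.7, "firefli"), (0.7, "firfox")]
for word, (title_boost, text_boost) in compute_boosts(search_terms).items():
    print(f"{word}: title={title_boost:.1f} text={text_boost:.1f}")
```

With these inputs, "firefix" ends up at about 8.8 for the title and 4.4 for the text, while "firfox" gets about 1.4 and 0.7, so the term the user typed dominates unless it matches nothing.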

Match Phrase or Match

In this implementation it tries to match using a match_phrase query, which basically only finds matches where every word in the query matches, in the same order.

The cheap solution here is to basically keep the whole search function as is, but if absolutely nothing is found with a match_phrase, and there were multiple words, then just recurse over one more time and do it with a match query instead.
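The fallback can be sketched without Elasticsearch. Everything here is hypothetical: run_query stands in for the real search call, toy_backend is a fake in-memory index, and the retry is written as a second call rather than a true recursion.

```python
def search_with_fallback(q, run_query):
    """Try match_phrase first; retry with match if a multi-word query finds nothing.

    `run_query(q, strategy)` is a hypothetical callable returning a list of hits.
    """
    hits = run_query(q, "match_phrase")
    if not hits and len(q.split()) > 1:
        hits = run_query(q, "match")
    return hits


# Toy backend for illustration: phrase search needs the exact phrase,
# match search needs any one of the words.
DOCS = ["date formatting in Python", "Python crontab tips"]


def toy_backend(q, strategy):
    if strategy == "match_phrase":
        return [d for d in DOCS if q.lower() in d.lower()]
    words = q.lower().split()
    return [d for d in DOCS if any(w in d.lower() for w in words)]


print(search_with_fallback("crontab formatting", toy_backend))
```

A single-word query that finds nothing is not retried, since match and match_phrase behave the same for one term.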

This could probably be improved by doing the match_phrase first with a higher boost and doing the match too but with a lower boost, all in one big query.
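A sketch of that idea, written as a raw Elasticsearch query body in a plain Python dict rather than with elasticsearch-dsl. It is untested against a real index, and phrase_factor is a hypothetical multiplier that makes phrase matches count more than loose matches.

```python
def combined_query(word, boost, phrase_factor=2):
    """Build one bool/should query mixing match_phrase and match clauses.

    `phrase_factor` is a hypothetical multiplier favouring exact phrases.
    """
    should = []
    for field, field_boost in (("title", 2 * boost), ("text", boost)):
        should.append({"match_phrase": {
            field: {"query": word, "boost": field_boost * phrase_factor},
        }})
        should.append({"match": {
            field: {"query": word, "boost": field_boost},
        }})
    return {"bool": {"should": should}}


print(combined_query("date format", 1.1))
```

Because every clause sits under should, a document only needs to satisfy one of them, and matching the phrase adds to its score instead of being required.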

Want A Copy?

Note, this copy is quite a mess! It's a personal side-project which is an excuse for experimentation and goofing around.

The full search function is here.

Please don't judge me for the scrappiness of the code but please share your thoughts on this being a decent application of Elasticsearch for smallish datasets like a blog.


django-cerberus-ac

Apr 08 2017 [Archived Version] □ Published at Latest Django packages added

Django access control app, using OBAC and separation of privileges.


django-appsettings

Apr 08 2017 [Archived Version] □ Published at Latest Django packages added

Application settings helper for Django apps.


django-planet aggregates posts from Django-related blogs. It is not affiliated with or endorsed by the Django Project.
