If you have ever tried to learn a foreign language, you know that memorizing vocabulary is one of the biggest challenges. I have used online dictionary services in my native language, but they are weak at keeping your search history. My favorite service keeps only the last 10 entries, so by the time I come back later to actually memorize the words, most of the history is already gone.

I’ve always wanted a way to keep a longer search history, so I wrote a pair of Python scripts that scrape the dictionary’s HTML and made them available on the bash command line.

My ideal usage looks like this:

# To get translations
$ en_search 'vote out'
vote out ['投票で除席する']

# To see the history
$ en_history
わめく: call out、cry、outcry、cry out、shout、exclaim、yell、scream、hollo、cry
english learner:
learner: 初学者、初心者
vote out: 投票で除席する

So here’s what I wrote. Note that this is purely for personal use, so there is no error handling or anything like that.

#!/usr/bin/env python3
# en_search.py
from lxml import html
import requests
import sys
import urllib.parse
import os.path

URL = 'http://blahblah.blahblah/service-path/'

def scrape_words(query):
    page = requests.get(URL + urllib.parse.quote_plus(query))
    tree = html.fromstring(page.content)
    return tree.xpath('//td[@class="content-explanation"]/text()')

def save_history(org_query, words):
    # Append one "query: word1, word2, ..." line per lookup
    file_dir = os.path.dirname(os.path.realpath(__file__))
    with open(file_dir + '/search_history.txt', 'a') as f:
        f.write(org_query + ': ' + ', '.join(words) + '\n')

def main():
    org_query = sys.argv[1]
    words = scrape_words(org_query)
    save_history(org_query, words)
    print(org_query, words)

if __name__ == "__main__":
    main()
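
Before wiring it up to the live site, the xpath in scrape_words can be sanity-checked offline against a hand-written snippet. The markup below is my guess at the relevant part of the page, not the service’s actual HTML:

```python
from lxml import html

# Stand-in for the page en_search actually fetches; only the td class matters.
SAMPLE = '''
<table>
  <tr><td class="content-explanation">投票で除席する</td></tr>
</table>
'''

tree = html.fromstring(SAMPLE)
# Same xpath as scrape_words: grab the text of every matching td.
words = tree.xpath('//td[@class="content-explanation"]/text()')
print(words)
```

If the real page changes its markup, this xpath is the only thing that needs updating.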
#!/usr/bin/env python3
# en_history.py
import os.path

def main():
    # Dump the whole history file; lines already end in '\n'
    file_dir = os.path.dirname(os.path.realpath(__file__))
    with open(file_dir + '/search_history.txt', 'r') as f:
        for line in f:
            print(line, end='')

if __name__ == "__main__":
    main()

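Because save_history writes one `query: word1, word2, ...` line per lookup, the history file stays easy to post-process later. A minimal sketch of parsing a line back apart (not part of the scripts; it assumes the query itself never contains ': '):

```python
# One line as save_history would write it
line = 'vote out: 投票で除席する\n'

# Split on the first ': ' to recover the query, then on ', ' for the words
query, _, rest = line.rstrip('\n').partition(': ')
words = rest.split(', ')
print(query, words)
```
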
And lastly, symlink the new scripts onto your PATH. (Don’t forget to chmod +x them, too.)

$ chmod +x path/to/en_search.py path/to/en_history.py
$ ln -s path/to/en_search.py ~/bin/en_search
$ ln -s path/to/en_history.py ~/bin/en_history
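
The chmod-plus-symlink trick is easy to sanity-check in a throwaway directory before touching ~/bin. Everything below (the hello.py name, the temp paths) is a stand-in, not part of the scripts:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/bin"

# A throwaway script with the same shebang the real scripts need
cat > "$tmp/hello.py" <<'EOF'
#!/usr/bin/env python3
print("hello")
EOF

chmod +x "$tmp/hello.py"                  # the +x step
ln -s "$tmp/hello.py" "$tmp/bin/hello"    # the symlink step
PATH="$tmp/bin:$PATH" hello               # prints: hello
```

Without the shebang line, running the symlink by its short name would fail, which is why the scripts start with `#!/usr/bin/env python3`.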

Remember that if you want to try scraping yourself, you are entirely responsible for how you use it.