If you have ever tried to learn a foreign language, you know that memorizing vocabulary is one of the biggest challenges. I have used online dictionary services in my native language, but they are weak at keeping your search history. My favorite service keeps only the last 10 entries, so by the time I come back later to actually memorize the words, most of the history is already gone.

I’ve always wanted a way to keep a longer search history, so I wrote a pair of Python scripts that scrape the dictionary’s HTML and made them available on the bash command line.

My ideal usage looks like this:

# To get translations
$ en_search 'vote out'
vote out ['投票で除席する']

# To see the history
$ en_history
わめく: call out、cry、outcry、cry out、shout、exclaim、yell、scream、hollo、cry
english learner:
learner: 初学者、初心者
vote out: 投票で除席する

So here’s what I wrote. Note that this is purely for personal use, so there is no error handling or anything like that.

#!/usr/bin/env python3
# en_search.py
from lxml import html
import requests
import sys
import urllib.parse
import os.path

URL = 'http://blahblah.blahblah/service-path/'

def scrape_words(query):
    page = requests.get(URL + urllib.parse.quote_plus(query))
    tree = html.fromstring(page.content)
    return tree.xpath('//td[@class="content-explanation"]/text()')

def save_history(org_query, words):
    # Append one "query: word1, word2, ..." line per lookup
    file_dir = os.path.dirname(os.path.realpath(__file__))
    with open(file_dir + '/search_history.txt', 'a') as f:
        f.write(org_query + ': ' + ', '.join(words) + '\n')

def main():
    org_query = sys.argv[1]
    words = scrape_words(org_query)
    save_history(org_query, words)
    print(org_query, words)

if __name__ == "__main__":
    main()
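
Before wiring it up to the live site, the xpath in scrape_words can be sanity-checked offline against a hand-written snippet. The markup below is my guess at the relevant part of the page, not the service’s actual HTML:

```python
from lxml import html

# Stand-in for the page en_search actually fetches; only the td class matters.
SAMPLE = '''
<table>
  <tr><td class="content-explanation">投票で除席する</td></tr>
</table>
'''

tree = html.fromstring(SAMPLE)
# Same xpath as scrape_words: grab the text of every matching td.
words = tree.xpath('//td[@class="content-explanation"]/text()')
print(words)
```

If the real page changes its markup, this xpath is the only thing that needs updating.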
#!/usr/bin/env python3
# en_history.py
import os.path

def main():
    # Dump the whole history file; lines already end in '\n'
    file_dir = os.path.dirname(os.path.realpath(__file__))
    with open(file_dir + '/search_history.txt', 'r') as f:
        for line in f:
            print(line, end='')

if __name__ == "__main__":
    main()

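Because save_history writes one `query: word1, word2, ...` line per lookup, the history file stays easy to post-process later. A minimal sketch of parsing a line back apart (not part of the scripts; it assumes the query itself never contains ': '):

```python
# One line as save_history would write it
line = 'vote out: 投票で除席する\n'

# Split on the first ': ' to recover the query, then on ', ' for the words
query, _, rest = line.rstrip('\n').partition(': ')
words = rest.split(', ')
print(query, words)
```
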
And lastly, symlink the new scripts onto your PATH. (Don’t forget to chmod +x them, too.)

$ chmod +x path/to/en_search.py path/to/en_history.py
$ ln -s path/to/en_search.py ~/bin/en_search
$ ln -s path/to/en_history.py ~/bin/en_history
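
The chmod-plus-symlink trick is easy to sanity-check in a throwaway directory before touching ~/bin. Everything below (the hello.py name, the temp paths) is a stand-in, not part of the scripts:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/bin"

# A throwaway script with the same shebang the real scripts need
cat > "$tmp/hello.py" <<'EOF'
#!/usr/bin/env python3
print("hello")
EOF

chmod +x "$tmp/hello.py"                  # the +x step
ln -s "$tmp/hello.py" "$tmp/bin/hello"    # the symlink step
PATH="$tmp/bin:$PATH" hello               # prints: hello
```

Without the shebang line, running the symlink by its short name would fail, which is why the scripts start with `#!/usr/bin/env python3`.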

Remember that if you want to try scraping yourself, you are entirely responsible for how you use it.