You're reading from Deep Learning with Theano

Product type: Book
Published in: Jul 2017
Publisher: Packt
ISBN-13: 9781786465825
Edition: 1st Edition
Author (1)
Christopher Bourez

Christopher Bourez graduated from Ecole Polytechnique and Ecole Normale Supérieure de Cachan in Paris in 2005 with a Master of Science in Math, Machine Learning and Computer Vision (MVA). For 7 years, he led a company in computer vision that launched Pixee, a visual recognition application for iPhone in 2007, with the major movie theater brand, the city of Paris and the major ticket broker: with a snap of a picture, the user could get information about events, products, and access to purchase. While working on missions in computer vision with Caffe, TensorFlow or Torch, he helped other developers succeed by writing on a blog on computer science. One of his blog posts, a tutorial on the Caffe deep learning technology, has become the most successful tutorial on the web after the official Caffe website. On the initiative of Packt Publishing, the same recipes that made the success of his Caffe tutorial have been ported to write this book on Theano technology. In the meantime, a wide range of problems for Deep Learning are studied to gain more practice with Theano and its application.

Seq2seq for translation


Sequence-to-sequence (Seq2seq) networks were first applied to language translation.

A translation task was designed for the conferences of the Association for Computational Linguistics (ACL), with a dataset, WMT16, composed of news translations in several languages. The dataset is intended for evaluating new translation systems and techniques. We'll use the German-English dataset.

  1. First, preprocess the data:

    python 0-preprocess_translations.py --srcfile data/src-train.txt --targetfile data/targ-train.txt --srcvalfile data/src-val.txt --targetvalfile data/targ-val.txt --outputfile data/demo
    First pass through data to get vocab...
    Number of sentences in training: 10000
    Number of sentences in valid: 2819
    Source vocab size: Original = 24995, Pruned = 24999
    Target vocab size: Original = 35816, Pruned = 35820
    (2819, 2819)
    Saved 2819 sentences (dropped 181 due to length/unk filter)
    (10000, 10000)
    Saved 10000 sentences (dropped 0 due to length/unk filter...
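The log above suggests two stages: a first pass that builds the source and target vocabularies, then a pass that saves sentence pairs and drops those rejected by a length/unk filter. The sketch below illustrates that general idea only; it is not the code of `0-preprocess_translations.py`. The special token names, the `max_len` threshold, and all function names are assumptions made for illustration. The pruned vocabulary sizes in the log are four larger than the original ones, which would be consistent with four special tokens being added, but that is a guess.

```python
from collections import Counter

# Hypothetical special tokens; the real script may use other names or counts.
SPECIALS = ["<blank>", "<unk>", "<s>", "</s>"]


def build_vocab(sentences, max_size=None):
    """Map tokens to integer ids: special tokens first, then words by frequency."""
    counts = Counter(tok for sent in sentences for tok in sent.split())
    words = [word for word, _ in counts.most_common(max_size)]
    return {tok: idx for idx, tok in enumerate(SPECIALS + words)}


def encode(sentence, vocab):
    """Convert a whitespace-tokenized sentence to ids, using <unk> for unknown words."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in sentence.split()]


def filter_pairs(pairs, src_vocab, tgt_vocab, max_len=50):
    """Keep encoded pairs that are non-empty, short enough, and not all <unk>.

    Returns the kept pairs and the number of dropped pairs.
    """
    kept, dropped = [], 0
    for src, tgt in pairs:
        s, t = encode(src, src_vocab), encode(tgt, tgt_vocab)
        empty = not s or not t
        too_long = len(s) > max_len or len(t) > max_len
        all_unk = (all(i == src_vocab["<unk>"] for i in s)
                   or all(i == tgt_vocab["<unk>"] for i in t))
        if empty or too_long or all_unk:
            dropped += 1
        else:
            kept.append((s, t))
    return kept, dropped
```

Building the vocabularies on the training set only and reusing them for the validation set would explain why validation sentences can be dropped by an unk filter while no training sentence is, though the actual script may proceed differently.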