Update! For El Capitan and users of newer versions of OS X, you may run into issues installing Torch or Lua packages. A fix is included below.
Update number two! Zach in the comments offers a really helpful fix if you’re on Sierra.
Update three! A lot has changed since 2016, so I’ll be posting a new version of this tutorial soon. In the meantime, please see the comments for common sticking points and troubleshooting.
There have been many recent examples of neural networks making interesting content after the algorithm has been fed input data and “learned” about it. Many of these, Google’s Deep Dream being the most well-covered, use and generate images, but what about text? This tutorial will show you how to install Torch-rnn, a set of recurrent neural network tools for character-based (ie: single letter) learning and output – it’s written by Justin Johnson, who deserves a huge “thanks!” for this tool.
The details about how all this works are complex and quite technical, but in short we train our neural network character-by-character, instead of with words like a Markov chain might. It learns what letters are most likely to come after others, and the text is generated the same way. One might think this would output random character soup, but the results are startlingly coherent, even more so than more traditional Markov output.
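To get a feel for what “character-by-character” means, here is a tiny sketch (plain Python, not Torch) of the far simpler Markov-style approach mentioned above: it only looks one character back, while the RNN learns much longer-range structure.

```python
import random

# Count which characters follow each character in a sample text,
# then generate new text by sampling from those counts.
# (This one-character lookback is far cruder than what torch-rnn learns,
# but the generate-one-character-at-a-time loop is the same idea.)
text = "to be or not to be that is the question"

followers = {}
for prev, nxt in zip(text, text[1:]):
    followers.setdefault(prev, []).append(nxt)

random.seed(1)
out = ["t"]
for _ in range(30):
    out.append(random.choice(followers.get(out[-1], [" "])))
print("".join(out))
```

Even this toy version produces vaguely word-shaped output; the RNN's longer memory is what pushes it toward coherent phrases.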
Torch-rnn is built on Torch, a set of scientific computing tools for the programming language Lua, which lets us take advantage of the GPU, using CUDA or OpenCL to accelerate the training process. Training can take a very long time, especially with large data sets, so the GPU acceleration is a big plus.
You can read way more info on how this all works here:
http://karpathy.github.io/2015/05/21/rnn-effectiveness
STEP 1: Install Torch
First, we have to install Torch for our system. (This section via this Torch install tutorial.)
A few notes before we start:
- Installing Torch will also install Lua and luarocks (the Lua package manager) so no need to do that separately.
- If Lua is already installed, you may run into some problems (I’m not sure how to fix that, sorry!)
- We’ll be doing everything in Terminal – if you’ve never used the command-line, it would be good to learn a bit more about how that works before attempting this install.
- If you’re running a newer OS such as El Capitan, you may run into problems installing Torch, or installing packages afterwards. If that’s the case, you can follow these instructions.
In Terminal, go to your user’s home directory* and run the following commands one at a time:
```
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch
bash install-deps
./install.sh
```
This downloads the Torch repository and installs it with Lua and some core packages that are required. This may take a few minutes.
We need to add Torch to the PATH variable so it can be found by our system. Open your .bash_profile file (which is normally hidden) in a text editor using this command:
```
touch ~/.bash_profile; open ~/.bash_profile
```
And add these two lines at the very bottom:

```
# TORCH
export PATH=$PATH:/Users/<your user name>/torch/install/bin
```
…replacing your username in the path. Save and close, then restart Terminal. When done, test it with the command:
```
th
```
Which should give you the Torch prompt. Use Control-c twice to exit, or type os.exit().
* You can install Torch anywhere you like, but you’ll have to update all the paths in this tutorial to your install location.
STEP 2: Install CUDA Support
Note: this step is only possible if your computer has an NVIDIA graphics card!
We can leverage the GPU of our computer to make the training process much faster. This step is optional, but suggested.
Download the CUDA tools with the network install – this is way faster, since it’s a 300kb download instead of 1GB: https://developer.nvidia.com/cuda-downloads.
Run the installer; when done, we have to update the PATH variable in the .bash_profile file like we did in the last step. Open the file and add these three lines (you may need to change CUDA-<version number> depending on which version you install – Kevin points out that CUDA 8 may cause errors):
```
# CUDA
export PATH=/Developer/NVIDIA/CUDA-7.5/bin:$PATH
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-7.5/lib:$DYLD_LIBRARY_PATH
```
You may also need to modify your System Preferences under Energy Saver:
- Uncheck Automatic Graphics Switch.
- Set Computer Sleep to “Never”.
Restart Terminal and test the install by running this command:
```
kextstat | grep -i cuda
```
You should get something like:
```
286 0 0xffffff7f8356e000 0x2000 0x2000 com.nvidia.CUDA (1.1.0) 5AFE550D-6361-3897-912D-897C13FF6983 <4 1>
```
There are further tests in the NVIDIA docs, if you want to try them, but they’re not necessary for our purposes. If you want to go deeper into this process, you can follow these instructions from NVIDIA.
STEP 3: Install HDF5 Library for Lua
Torch-rnn comes with a preprocessor script, written in Python, that prepares our text for training. It will save our sample as an h5 and a json file, but requires the HDF5 library to be installed.
First, install HDF5 using Homebrew:
```
brew tap homebrew/science
brew install hdf5
```
(If you have issues with the install or in the next step, Joshua suggests adding the flag --with-mpi to the Homebrew command above, which may help. If that doesn’t work, Charles has a suggested fix. If you get an error that says Unsupported HDF5 version: 1.10.0, you can try Tom’s suggestion.)
Move to the Torch folder inside your user home directory (ie: /Users/<your user name>/torch/). The following commands download the Torch-specific HDF5 implementation and install it:
```
git clone git@github.com:deepmind/torch-hdf5.git
cd torch-hdf5
luarocks make hdf5-0-0.rockspec
```
If you haven’t used git or Github before, as Luke points out in the comments, you might get an SSH key error. You can get a key, or just download the repository manually from here.
STEP 4: Install HDF5 Library for Python
We also need to install HDF5 support for Python. You can do this using Pip:
```
sudo pip install h5py
```
You may get a bunch of warnings, but that’s ok. Test that it works by importing the library:
```
python
import h5py
```
If it imports without error, you’re good!
STEP 5: Install Torch-rnn
Now that we’ve prepared our computer with all the required libraries, it’s time to finally install Torch-rnn!
- Download the ZIP file from the project’s GitHub repository.
- Unzip it and rename to torch-rnn.
- Move the Torch-rnn folder to your Torch install folder inside your user home directory (ie: /Users/<your user name>/torch/torch-rnn )
- (You can also do this by cloning the repo, but if you know how to do that, you probably don’t need the instructions in this step 😄)
STEP 6: Prepare Your Data
We’re ready to prepare some data! Torch-rnn comes with a sample input file (all the writings of Shakespeare) that you can use to test everything. Of course, you can also use your own data; just combine everything into a single text file.
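If your own writing is spread across several files, a short helper like this can combine everything into the single text file the preprocessor expects (the folder and file names here are just placeholders):

```python
import glob

def combine_corpus(src_glob, dest_path):
    """Concatenate every file matching src_glob into one file at dest_path."""
    with open(dest_path, "w") as out:
        for path in sorted(glob.glob(src_glob)):
            with open(path) as f:
                out.write(f.read())
                out.write("\n")  # keep a break between source files

# For example:
# combine_corpus("data/sources/*.txt", "data/my_corpus.txt")
```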
In the Terminal, go to your Torch-rnn folder and run the preprocessor script:
```
python scripts/preprocess.py --input_txt data/tiny-shakespeare.txt --output_h5 data/tiny_shakespeare.h5 --output_json data/tiny_shakespeare.json
```
You should get a response that looks something like this:
```
Total vocabulary size: 65
Total tokens in file: 1115394
Training size: 892316
Val size: 111539
Test size: 111539
Using dtype <type 'numpy.uint8'>
```
This will save two files to the data directory (though you can save them anywhere): an h5 and json file that we’ll use to train our system.
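The split sizes in that readout follow the preprocessor’s defaults: roughly 10% of the characters are held out for validation and 10% for testing, with the rest used for training (the 10% figures match the output above and can be overridden with flags). A quick sanity check of the arithmetic:

```python
# Reproduce the train/val/test split sizes from the sample output above,
# assuming the preprocessor's default 10% validation / 10% test fractions.
total = 1115394              # total tokens in tiny-shakespeare.txt
val = int(total * 0.1)      # 10% validation split
test = int(total * 0.1)     # 10% test split
train = total - val - test  # everything else is training data
print(train, val, test)
```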
STEP 7: Train
The next step will take at least an hour, perhaps considerably longer, depending on your computer and your data set. But if you’re ready, let’s train our network! Go to the Torch-rnn folder and run the training script (changing the arguments if you’ve used a different data source or saved the files elsewhere):
```
th train.lua -input_h5 data/tiny_shakespeare.h5 -input_json data/tiny_shakespeare.json
```
The train.lua script uses CUDA by default, so if you don’t have that installed or available, you’ll need to disable it and run CPU-only using the flag -gpu -1. Lots more training and output options are available here.
It should spit out something like:
```
Running with CUDA on GPU 0
Epoch 1.00 / 50, i = 1 / 17800, loss = 4.163219
Epoch 1.01 / 50, i = 2 / 17800, loss = 4.078401
Epoch 1.01 / 50, i = 3 / 17800, loss = 3.937344
...
```
Your computer will get really hot and it will take a long time – the default is 50 epochs. You can see how long it took by adding time in front of the training command:
```
time th train.lua -input_h5 data/tiny_shakespeare.h5 -input_json data/tiny_shakespeare.json
```
If you have a really small corpus (under 2MB of text) you may want to try adding the following flags:
```
-batch_size 1 -seq_length 50
```
Setting -batch_size somewhere between 1 and 10 should give better results with the output.
STEP 8: Generate Some Output
Getting output from our neural network is considerably easier than the previous steps (whew!). Just run the following command:
```
th sample.lua -checkpoint cv/checkpoint_10000.t7 -length 2000
```
A few notes:
- The -checkpoint argument points to a .t7 checkpoint file created during training. You should use the one with the largest number, since that will be the latest one created. Note: running training on another data set will overwrite this file!
- The -length argument is the number of characters to output.
- This command also runs with CUDA by default, and can be disabled the same way as the training command.
- Results are printed to the console, though it would be easy to pipe them to a file instead:

```
th sample.lua -checkpoint cv/checkpoint_10000.t7 -length 2000 > my_new_shakespeare.txt
```

- Lots of other options are available here.
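Since a new training run can overwrite your checkpoints, a tiny helper like this (hypothetical, not part of torch-rnn) can find the highest-numbered – and therefore most recent – checkpoint in the cv folder:

```python
import glob
import re

def latest_checkpoint(folder="cv"):
    """Return the checkpoint_<N>.t7 file with the largest N, or None if absent."""
    paths = glob.glob(folder + "/checkpoint_*.t7")
    if not paths:
        return None
    # Compare numerically, not alphabetically: checkpoint_10000 > checkpoint_9000
    return max(paths, key=lambda p: int(re.search(r"checkpoint_(\d+)", p).group(1)))
```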
STEP 8A: “Temperature”
Changing the temperature flag will make the biggest difference in your network’s output. It changes the novelty and noise in the system, creating dramatically different output. The -temperature argument expects a number between 0 and 1.
Higher temperature
Gives a better chance of interesting/novel output, but more noise (ie: more likely to have nonsense, misspelled words, etc). For example, -temperature 0.9 results in some weird (though still surprisingly Shakespeare-like) output:
“Now, to accursed on the illow me paory; And had weal be on anorembs on the galless under.”
Lower temperature
Less noise, but less novel results. Using -temperature 0.2 gives clear English, but includes a lot of repeated words:
“So have my soul the sentence and the sentence/To be the stander the sentence to my death.”
In other words, everything is a trade-off and experimentation is likely called for with all the settings.
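If you’re curious what the flag is doing under the hood: temperature divides the log-probabilities before re-normalizing and sampling, so low values sharpen the distribution toward the most likely character and higher values flatten it. A rough sketch of the idea in plain Python (not torch-rnn’s actual code):

```python
import math
import random

def apply_temperature(probs, temperature):
    """Rescale a probability distribution: low T sharpens it, high T flattens it."""
    logits = [math.log(p) / temperature for p in probs]
    m = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = [0.7, 0.2, 0.1]               # toy next-character distribution
cold = apply_temperature(probs, 0.2)  # like -temperature 0.2: clear but repetitive
warm = apply_temperature(probs, 0.9)  # like -temperature 0.9: noisier, more novel

# Sample one character index from the rescaled distribution:
idx = random.choices(range(len(cold)), weights=cold)[0]
```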
All Done!
That’s it! If you make something cool with this tutorial, please tweet it to me @jeffthompson_.
These directions were really handy, thanks! I just noticed a few things that have changed.
First, I found that I had to install the HDF5 package before I installed the Python library.
Second, when running “pip install h5py” I got a lot of warnings about functions being unused, particularly a few that seemed fatal:
ld: warning: ignoring file /usr/local/lib/libhdf5.dylib, file was built for x86_64 which is not the architecture being linked (i386): /usr/local/lib/libhdf5.dylib
ld: warning: ignoring file /usr/local/lib/libhdf5_hl.dylib, file was built for x86_64 which is not the architecture being linked (i386): /usr/local/lib/libhdf5_hl.dylib
ld: warning: directory not found for option ‘-L/opt/local/lib’
But it turned out that I was still able to import the library without issues.
$ python
>>> import h5py
>>>
Third, step 5 is different now. Instead of downloading a ZIP file, you can simply clone the repo:
git clone https://github.com/jcjohnson/torch-rnn.git
Then you should be able to run the preprocess and train steps.
Thanks for the suggestions! I’ve updated the tutorial to install HDF5 first and to warn about the warnings. For downloading Torch-rnn, you can either download the ZIP or clone it. I think git cloning is more confusing if you’ve never done it before, so I stick with a more traditional click-and-download here.
Hey, Jeff
I just asked you how to import tweet as start text for torch-rnn in twitter.
I just found a project “neuralsnap” by Ross Goodwin in github, he used python wrapper to alternate the start text for char-rnn. I have no experience with Python before, now I am trying to find out if this would work. What do you think?
here is “neuralsnap”
https://github.com/rossgoodwin/neuralsnap
Not sure where you’re running into trouble. You can train char-rnn on any text, or use an initial text (a tweet, for example) to “seed” the output. If you want to get a tweet automatically from Twitter, that will take some programming work – I use https://github.com/bear/python-twitter for my bots, which might be a good place to start.

Sorry if it is too stupid, it’s my first time working with terminal and Python.
I already trained torch-rnn, now I am trying to call it in python script so that it can get the twitter as input and triggered by Arduino signal, using Python is the best solution based on my research.
I use subprocess.Popen to call all commands, it seems like the model (.t7) can not be processed.
Not stupid at all :) The pre-processor for torch-rnn is a Python script, but the actual output script is in the Torch language. If you’re getting errors getting output, I’d post a question to the torch-rnn Github page.

Heyyyy Jeff
turns out it is the problem with Macpython, I run the script in the terminal and after some trial & error now it is working xD
anyway thanks a lot for your patience!!
these are really great instructions. just two things tripped me up.
1) you need to have an SSH key setup for github. otherwise when you try to clone the hd5 library, it has a security error. this sounds daunting but git provide really good code you can just copy and paste into terminal to set one up in a couple minutes and associate it with your github profile.
2) if you don’t have a GPU, then the training on step 7 will fail and throw you some weird error messages. You have to include the flag -gpu -1 at the end of the training statement to just train in “CPU mode”.
Hope that helps someone.
Glad it helped! The Github SSH key thing is good to point out for users that haven’t used it before – I’ve updated the tutorial. The non-CUDA instructions are in the tutorial, but easy to miss with so many details!
First command in STEP 7, seems you’ve got some _ in the filenames that should be -.
Thanks bud!
Dustin – the files are named in step 6 (they can be whatever you want). Mine use underscores, so if you just copy/paste it should work no problem.
Sorry for some reason I’d used the truncated command
python scripts/preprocess.py --input_txt data/tiny-shakespeare.txt
which named the output files like the input. Feel free to delete these noise comments!
I get the following error when trying to execute ‘luarocks make hdf5-0-0.rockspec’:
-- Found Torch7 in /Users/letterbomb/torch/install
-- HDF5: Using hdf5 compiler wrapper to determine C configuration
-- HDF5: Using hdf5 compiler wrapper to determine CXX configuration
CMake Error at /usr/local/Cellar/cmake/3.6.0/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:148 (message):
Could NOT find HDF5 (missing: HDF5_HL_LIBRARIES) (found suitable version
“1.8.16”, minimum required is “1.8”)
Call Stack (most recent call first):
/usr/local/Cellar/cmake/3.6.0/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:388 (_FPHSA_FAILURE_MESSAGE)
/usr/local/Cellar/cmake/3.6.0/share/cmake/Modules/FindHDF5.cmake:707 (find_package_handle_standard_args)
CMakeLists.txt:4 (FIND_PACKAGE)
-- Configuring incomplete, errors occurred!
See also “/Users/letterbomb/torch/torch-hdf5/build/CMakeFiles/CMakeOutput.log”.
make: *** No targets specified and no makefile found. Stop.
Error: Build error: Failed building.
Hey Jeff,
Thank you for such a detailed tutorial.
I was able to follow till step 3:
cd torch-hdf5 (successful until this part)
luarocks make hdf5-0-0.rockspec --> throws an error: cannot satisfy dependency: Torch >= 7. Torch has been installed and I did verify from my terminal.
Any idea why this might be happening ?
I appreciate your help .
Greg – Looks like you didn’t install HDF5 first (see the previous part of that step) or the install of HDF5 didn’t work.
Archy – If you type th into the Terminal, what output do you get? It should open an interactive prompt and say “Torch7” somewhere. If you don’t get the prompt, it means Torch wasn’t installed properly, or you didn’t add it to your .bash_profile (either way, check the step again). If you did get the prompt but it says “Torch6” or something, it means you have the wrong version of Torch installed.

Hi again Jeff! Thanks for your quick response. I did check my brew install of hdf5 first, and everything seems fine – it’s so strange:
~ :> brew info hdf5
homebrew/science/hdf5: stable 1.8.16 (bottled)
File format designed to store large amounts of data
http://www.hdfgroup.org/HDF5
/usr/local/Cellar/hdf5/1.8.16_1 (180 files, 10.5M) *
Poured from bottle on 2016-07-10 at 19:41:01
From: https://github.com/Homebrew/homebrew-science/blob/master/hdf5.rb
Only other thing I can think of: is everything stored in the right places (ie following the locations in the tutorial)? Otherwise, sorry! You’ll have to post an issue on the HDF5 repo.
Thanks for this post. I had a few issues when trying to follow the instructions that I’ll mention here for posterity:
First, bash install-deps for torch didn’t work for me because SourceForge was having some kind of issue (SHA256 mismatch) that prevented me from downloading gnuplot-5.3.0. I had to look around for a mirror of the .tar.gz and put it in Homebrew’s Cellar before installing. This might be a transient issue, but I also had some minor hiccups when brew was dealing with packages I had already installed.

Second, even after installing HDF5 with brew (brew install hdf5 after tapping the science keg), when I tried to run luarocks make hdf5-... I was getting an error saying that the HDF5 library couldn’t be found. I think I got rid of it by running brew install hdf5 --with-mpi but that might have been a coincidence. In any case, I downloaded the HDF5 source and built it and that solved my problems.

Also, I tried using the -gpu_backend opencl flag (my MacBook Pro 13″ has Intel Iris – no CUDA!) after installing the torch-cl distribution from https://github.com/hughperkins/distro-cl but it seemed to run slower than just using the CPU. I guess I could have installed an OpenCL luarock but hughperkins’ site told me not to!

Thanks for the heads up, Joshua. The gnuplot site appears to be down too, but it’s available via Homebrew (brew install gnuplot). Re HDF5, thanks for the tip! A few others have had issues, so I’ll add the --with-mpi suggestion. Lastly, I found similar results with CPU and GPU, though on very large datasets I think it can start to pay off.
I managed to get to step 7, but when I put in “th train.lua -input_h5 data/tiny_shakespeare.h5 -input_json data/tiny_shakespeare.json” it gave this error:
/Users/xavier/torch/install/bin/luajit: /Users/xavier/torch/install/share/lua/5.1/trepl/init.lua:384: /Users/xavier/torch/install/share/lua/5.1/trepl/init.lua:384: /Users/xavier/torch/install/share/lua/5.1/hdf5/ffi.lua:42: Error: unable to locate HDF5 header file at hdf5.h
stack traceback:
[C]: in function ‘error’
/Users/xavier/torch/install/share/lua/5.1/trepl/init.lua:384: in function ‘require’
train.lua:6: in main chunk
[C]: in function ‘dofile’
…vier/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x010b07cd10
I’ve tried looking it up – so far I think it might have something to do with Cuda, but I’m not sure. How do you think I should go about fixing this?
Thanks,
Xavier
Thanks Joshua and Jeff. I can confirm that adding the --with-mpi flag resolved my issue :)
I am facing the following error before training when I run:
th train.lua -input_h5 data/tiny_shakespeare.h5 -input_json data/tiny_shakespeare.json -gpu -1
/Users/User/torch/install/bin/luajit: …User/torch/install/share/lua/5.1/trepl/init.lua:384: …User/torch/install/share/lua/5.1/trepl/init.lua:384: …s/User/torch/install/share/lua/5.1/hdf5/ffi.lua:42: Error: unable to locate HDF5 header file at hdf5.h
stack traceback:
[C]: in function ‘error’
…User/torch/install/share/lua/5.1/trepl/init.lua:384: in function ‘require’
train.lua:6: in main chunk
[C]: in function ‘dofile’
…dhry/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x010a048cf0
“In the Train-rnn folder and run the training script”
Where is the Train-rnn folder?
After uninstalling HDF5 (brew uninstall) and reinstalling it again using –with-mpi and then cloning and then executing the training command, I am getting this error:
dyld: lazy symbol binding failed: Symbol not found: _H5_init_library
Referenced from: /usr/local/lib/libhdf5.dylib
Expected in: flat namespace
dyld: Symbol not found: _H5_init_library
Referenced from: /usr/local/lib/libhdf5.dylib
Expected in: flat namespace
Trace/BPT trap: 5
@Sayan: it’s wherever you made it! In step #5 of the tutorial, you download torch-rnn, rename the folder, and move it to your Torch folder. That’s where you do all the training and output from.

@Sayan: it’s possible this is related to your other question? I’m not sure, but it sounds like it can’t find your HDF5 install.
Hi, first of all thank you for your guide!
I’m facing the same problem described by @Sayan
Albertos-MBP:torch-rnn AlbioTQ$ th train.lua -input_h5 data/tiny_shakespeare.h5 -input_json data/tiny_shakespeare.json
/Users/AlbioTQ/torch/install/bin/luajit: /Users/AlbioTQ/torch/install/share/lua/5.1/trepl/init.lua:384: /Users/AlbioTQ/torch/install/share/lua/5.1/trepl/init.lua:384: /Users/AlbioTQ/torch/install/share/lua/5.1/hdf5/ffi.lua:42: Error: unable to locate HDF5 header file at hdf5.h
stack traceback:
[C]: in function ‘error’
/Users/AlbioTQ/torch/install/share/lua/5.1/trepl/init.lua:384: in function ‘require’
train.lua:6: in main chunk
[C]: in function ‘dofile’
…ioTQ/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x010a7fed00
I have tried to uninstall and reinstall hdf5 but the same error persists..
@Sayan have you found a way to make the system see hdf5 ?
Thank you
Hey guys,
I did the best i could do to cover all the steps in the guide, but i got this message while i tried to do ‘STEP 6: Prepare Your Data’ , and it goes like this…
python scripts/preprocess.py --input_txt data/tiny-shakespeare.txt --output_h5 data/tiny_shakespeare.h5 --output_json data/tiny_shakespeare.json
File “scripts/preprocess.py”, line 39
print ‘Total vocabulary size: %d’ % len(token_to_idx)
^
SyntaxError: invalid syntax
any ideas?
its on my macbook os
@Adam: hmm, did you try it with a different text file? I wonder if there’s an issue with your input? (That SyntaxError will also appear if you run the script with Python 3 – the preprocessor is written for Python 2, whose print statement doesn’t use parentheses.) If it still doesn’t work, I think this is a Torch-rnn issue and you should post it on the Github page.

@Xavier: if you don’t have CUDA support, you need to add the flag -gpu -1. If that doesn’t do the trick, I think this might be an issue for the Torch-rnn Github repo – I really don’t know much about Torch or Lua themselves.

@sayan there was someone with the same issue here, and they fixed it: https://github.com/deepmind/torch-hdf5/issues/58 . Unfortunately I am running into the same error (dyld: Symbol not found: _H5_init_library) and reinstalling everything from scratch did not help me.
I figured it out @sayan. Something is weird for the hdf5 installation and the dylib files installed to /usr/local/lib are messed up. I had to install hdf5 from source by going to their website (https://www.hdfgroup.org/ftp/HDF5/current/src/unpacked/release_docs/INSTALL)
Per the hdf5 installation instructions, I ran the following from the unzipped hdf5 folder (since I had already installed open-mpi with brew):
CC=/usr/local/Cellar/open-mpi/1.10.2_1/bin/mpicc ./configure --prefix=/usr/local/hdf5 --enable-parallel
make
make check
make check-install
cp /usr/local/hdf5 /usr/local/Cellar/hdf5/1.8.17
brew switch hdf5 1.8.17
export DYLD_FALLBACK_LIBRARY_PATH=/usr/local/lib:/usr/lib:/usr/local/hdf5/lib
sudo cp /usr/local/hdf5/lib/* /usr/local/lib/
Then everything magically worked.
Thanks Charles for the fix!
All the steps have worked fine until running the training program. At that step I get this error:
Mariums-MacBook-Pro:torch-rnn mariumsultan$ th train.lua -input_h5 data/Dracula.h5 -input_json data/Dracula.json -gpu -1
/Users/mariumsultan/torch/install/bin/luajit: …/mariumsultan/torch/install/share/lua/5.1/trepl/init.lua:384: …/mariumsultan/torch/install/share/lua/5.1/trepl/init.lua:384: …rs/mariumsultan/torch/install/share/lua/5.1/hdf5/ffi.lua:42: Error: unable to locate HDF5 header file at hdf5.h
stack traceback:
[C]: in function ‘error’
…/mariumsultan/torch/install/share/lua/5.1/trepl/init.lua:384: in function ‘require’
train.lua:6: in main chunk
[C]: in function ‘dofile’
…ltan/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x01031bfcf0
Mariums-MacBook-Pro:torch-rnn mariumsultan$
Any suggestions?
@Marium: this has already been answered in the comments.
I have open-mpi installed but this step does not work “CC=/usr/local/Cellar/open-mpi/1.10.2_1/bin/mpicc ./configure --prefix=/usr/local/hdf5 --enable-parallel”
or this one “sudo cp /usr/local/hdf5/lib/* /usr/local/lib/”
Thanks for this! However I’m getting the hdf5 error running on Mac OSX.
I’m having the same trouble as @Marium. I’ve tried doing brew install hdf5 --with-mpi; however, when trying to execute @Charles’ fix I cannot run “CC=/usr/local/Cellar/open-mpi/1.10.2_1/bin/mpicc ./configure --prefix=/usr/local/hdf5 --enable-parallel”

@charles: Can you check the syntax carefully? It looks like a couple of lines are running together and for some of us it isn’t clear where the break should be. Also, maybe we need to adjust for the current version? Sorry, but a bit more handholding…
@Jeff: I followed all the steps and read all the solutions in the comments. I have not been able to get past the error in Step 7 that others are also having. Note, I’m not using the CUDA step, so perhaps something there is required and some of us are not running that step?
I’d love to get this running. Please let me know if I can help troubleshoot (though I’m not an expert, I can follow directions). Thanks.
@Ben and Brazuca: I cleaned up @Charles’ code a little to make it easier to read. Does that help? Otherwise, I’d suggest posting an issue with the Torch-rnn repo. Since I don’t really know what’s happening under the hood of the software, not sure how much help I can be!
@Jeff, to solve the issue of missing hdf5.h, I followed the solution here: https://github.com/jcjohnson/torch-rnn/issues/58#issuecomment-239390012
@Jeff: I also think that @Charles’ code is missing a “make install” step. Per the instructions he linked (https://www.hdfgroup.org/ftp/HDF5/current/src/unpacked/release_docs/INSTALL):
$ gunzip < hdf5-X.Y.Z.tar.gz | tar xf -
$ cd hdf5-X.Y.Z
$ ./configure --prefix=/usr/local/hdf5
$ make
$ make check # run test suite.
$ make install
$ make check-install # verify installation.
To add to @Brazuca’s comment, if you have a mac and can’t locate your hdf5.h file in ‘usr/include’, just follow yoosan’s answer in the link below. It’s extremely straightforward and solved the issue for me:
https://github.com/jcjohnson/torch-rnn/issues/58#issuecomment-241186386
So I appear to have fallen at the first hurdle. I run the command
./install.sh
…and get the following message:
Prefix set to /Users//torch/install
Installing Lua version: LUAJIT21
./install.sh: line 59: cmake: command not found
Is there something I need installed for this part to work? I’ve used command line before but all of this is outside my current knowledge / experience enough that I had to install XCode for the first time to get this far. “Far”.
@Evan: not so bad! You installed XCode, but did you install the command line tools for it? I’m guessing that might be it. If you did, you can separately install cmake like this: https://cmake.org/install.

@Jeff: Installing cmake separately worked, thanks! I got through the rest of the tutorial with only the hdf5.h error some others have been having, and managed to train and output with the Shakespeare text, but when I trained Torch-rnn on a different input file then tried to output at various temperatures I got the same few words looping over and over again. I feel like this is something to do with the t7 files – do I have to clear the cv folder every time I want to train something new?
If it’s repeating and not giving good output, your training texts are probably far too short. If it’s giving you Shakespeare when you trained it on something else, it’s probably that the Shakespeare training got further, so those T7 didn’t get overwritten, and you’re using those by accident. Clearing the folder before training on a new set is an easy way to avoid that problem.