Ability to utilise class libraries or external data
Hi there,
Something lacking at present is the ability to use class libraries. It would be great to be able to upload a jar file and access that from your strategy. Likewise the fact that the whole strategy has to be wrapped into a single class file is a serious constraint. Even just being able to load external, previously prepared data from say a csv file would be a step forward. Alternatively, maybe the ability to access such a file from an external website?
Is it possible at all to access external data?
Regards
Support Staff 2 Posted by Guillaume on 07 Jul, 2010 04:24 PM
Hi,
Since version 1.1.5 of Market Runner, it is possible to send a strategy based on several classes. From Eclipse, use the menu "Run Remote (send whole project)". This will send your strategy to Algodeal as a zip file containing your Eclipse project source files (all .java files).
This will not include non-Java files (such as .csv or .properties files). Also keep in mind that your Java code will not be allowed to read files on our grid.
If you need to feed your strategy with some additional data of your own, one current workaround is to generate a Java class with constants and ship this class with your strategy.
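As a rough illustration of that workaround, here is a minimal sketch of a generator you could run locally. It turns CSV lines into the source of a Java class holding the values as constants. The names ConstantsClassGenerator, ModelData and DATA are made up for this example and are not part of Market Runner. It assumes every cell is a finite number.

```java
import java.util.Arrays;
import java.util.List;

/** Turns CSV text into the source of a Java class that holds the values as constants. */
public class ConstantsClassGenerator {

    /**
     * Builds Java source for a class with one public static final double[][] field.
     * Each non-blank CSV line becomes one row. All cells must be finite numbers.
     */
    public static String generate(String className, List<String> csvLines) {
        StringBuilder src = new StringBuilder();
        src.append("public final class ").append(className).append(" {\n");
        src.append("    public static final double[][] DATA = {\n");
        for (String line : csvLines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            src.append("        {");
            String[] cells = line.split(",");
            for (int i = 0; i < cells.length; i++) {
                // Parse and re-print so malformed numbers fail locally, not after upload.
                double value = Double.parseDouble(cells[i].trim());
                if (Double.isNaN(value) || Double.isInfinite(value)) {
                    throw new IllegalArgumentException("Not a finite number: " + cells[i]);
                }
                if (i > 0) {
                    src.append(", ");
                }
                src.append(value);
            }
            src.append("},\n"); // a trailing comma is legal in an array initializer
        }
        src.append("    };\n");
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample data; in practice the lines would come from your own file.
        List<String> csv = Arrays.asList("1.5, 2.0", "3.25, 4.0");
        System.out.print(generate("ModelData", csv));
    }
}
```

The generated ModelData.java would then sit next to the strategy source, and the strategy would read ModelData.DATA instead of opening a file.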
As for external libraries, it is not currently possible to upload your own. However, we have included some external libraries in Market Runner when they were in high demand among users (and when they are open enough to be used without legal issues).
The external libraries currently available are Apache Math, libsvm, ta-lib.
Which library are you thinking of?
Hope this helps,
Guillaume
Guillaume closed this discussion on 07 Jul, 2010 04:24 PM.
skaak re-opened this discussion on 08 Jul, 2010 04:38 AM
3 Posted by skaak on 08 Jul, 2010 04:38 AM
Thanks for the quick reply.
I've put model parameters in a single java file before, but have some
that are now too big for that. The parameters saved as xml come to
30M and I'm really after accessing that xml file from the server.
The data can be put in multiple java files and I'll probably try that
now - thanks. Is there a limitation on size and number of files when I
upload to the server?
I know you support external libraries but even there one would like to
upload model parameters as a non-java, even binary file at some stage.
Regards
4 Posted by skaak on 08 Jul, 2010 05:44 AM
Whilst on the subject - it would also help to access data on the server.
While running through a strategy, I sometimes create a csv data file
which can be analysed later. This only works locally of course. I note
that locally MarketRunner creates files such as Indicators.csv,
Orders.csv and Transactions.csv. I suppose these are then plotted but
is it possible to download the files themselves from the server?
Thanks
Support Staff 5 Posted by Eric on 08 Jul, 2010 09:24 AM
MG,
There is no limitation on the size of the Java files themselves. At least, none that we have specified.
That said, I'm curious to know what type of data would require 30 Megs. Can't it be calculated as needed by the strategy? Could you expand on that?
Thanks,
Eric
Support Staff 6 Posted by Eric on 08 Jul, 2010 09:29 AM
re: accessing data on the server.
We would like to give access to the files generated by the strategy, but it is not possible. This is not a technical problem. The problem is that the market data we have acquired can only be used under very restricted conditions. In particular, there must be no way for a person who has not acquired a license from our provider to access it.
If indicator values could be downloaded, it would be too easy to store market data in them and then download it.
I'm afraid that we are stuck here. Sorry.
Eric
7 Posted by skaak on 08 Jul, 2010 10:38 AM
It is only preliminary and the model is not an Algodeal one, but I'd
like to also transfer and test it on Algodeal at some stage,
especially intraday. It will probably be smaller on Algodeal for end
of day, but likewise much bigger for intraday.
In theory it could be calculated on the fly, but things quickly get
messy. It is better to toy with the model offline until acceptable and
then upload the final one.
This model is a PNN - a nearest neighbour type model, thus the size.
Suppose I calculate clusters and store the centers. If there are
100,000 points - easy for intraday, and this yields 10,000 centers -
easy for PNN, and each center contains 20 points - pretty small, then
I already have 20 x 10,000 = 200,000 points of data.
Storing this in a binary file, using only 8 bytes a double, yields a
roughly 2M file already. Encode that as text, especially xml, and it easily
grows to 30M.
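
The arithmetic above can be checked with a short sketch. The class and method names are made up for illustration, and reading 30M as 30,000,000 bytes is an assumption.

```java
public class ModelSizeEstimate {

    /** Raw size in bytes when every value is stored as an 8-byte double. */
    public static long rawBytes(long centers, long pointsPerCenter) {
        return centers * pointsPerCenter * Double.BYTES;
    }

    /** Average number of bytes spent per value for a file of the given size. */
    public static long bytesPerValue(long fileBytes, long values) {
        return fileBytes / values;
    }

    public static void main(String[] args) {
        long values = 10_000L * 20L;                            // 200,000 values
        long raw = rawBytes(10_000, 20);                        // 1,600,000 bytes, roughly 2M
        long xmlPerValue = bytesPerValue(30_000_000L, values);  // 150 bytes per value
        System.out.println(values + " values, " + raw + " raw bytes, "
                + xmlPerValue + " bytes per value in a 30M xml file");
    }
}
```

So a 30M xml file corresponds to about 150 bytes of markup and digits per stored value, against 8 bytes in binary.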
On the data, I can appreciate that you need to prevent abuse. I'd
ideally like to create a pattern file, export it and build a model
locally. Once the model is ready, it gets uploaded and tested online.
I've done this for the series made available offline but would like to
include other series as well. There is some mangling in the patterns
but I suppose you'd not get past the data provider.
This is a different point, but is it possible to draw vertical lines
on graphs? Suppose the model is built using data from 2000 to 2008 and
then tested from 2008 to 2010. It would be nice to draw a vertical
line to visually separate the two samples.
Regards
8 Posted by skaak on 08 Jul, 2010 10:45 AM
PS : You may know this - java has a limitation that a class can not
yield more than 64k of bytecode, something I discovered while encoding
models destined for Algodeal. This one will probably have to be split
into several files....
Regards
9 Posted by skaak on 08 Jul, 2010 10:54 AM
PPS : Come to think of it, I still used one file, but split the
model into several classes inside that file, each yielding less than
64k of bytecode. This gets pieced together again when the strategy
runs and I'll probably do the same with the PNN. Messy, but what else.
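
A minimal sketch of that piecing together, with made-up names (ChunkedModel, Part0, Part1) and toy data rather than the actual generated code. In the JVM class file format the 64k bytecode ceiling applies to each method, including the static initializer that fills an array constant, and each class has its own constant pool, which is why separate nested classes get around it.

```java
public class ChunkedModel {

    // Hypothetical generated holders. Each nested class compiles to its own
    // class file with its own static initializer and its own constant pool.
    static final class Part0 {
        static final double[] DATA = {0.1, 0.2, 0.3};
    }

    static final class Part1 {
        static final double[] DATA = {0.4, 0.5};
    }

    /** Concatenates the given parts, in order, into one array. */
    public static double[] assemble(double[]... parts) {
        int total = 0;
        for (double[] part : parts) {
            total += part.length;
        }
        double[] all = new double[total];
        int offset = 0;
        for (double[] part : parts) {
            System.arraycopy(part, 0, all, offset, part.length);
            offset += part.length;
        }
        return all;
    }

    /** Rebuilds the full parameter vector, for example once when the strategy starts. */
    public static double[] load() {
        return assemble(Part0.DATA, Part1.DATA);
    }

    public static void main(String[] args) {
        System.out.println(load().length + " parameters loaded");
    }
}
```

A real generator would emit as many Part classes as needed and keep each one small enough to compile.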
Regards
MG
Support Staff 10 Posted by Alexandre on 08 Jul, 2010 11:09 AM
MG,
I understand you want to use optimization methods & machine learning. However, sending the raw model and running it against the data used to generate it is of little value because of data snooping.
You are right, there are some limitations on class / method size in terms of bytecode. If you still want to send some data, you have to encode things into java classes as you have been doing.
There is currently no option to draw lines on the graph. However, we will roll out the possibility to paper trade your strategy, which will continue backtesting selected strategies with fresh data.
It would be better if you could build the model while backtesting. What library/software are you using to generate your model?
Regards
Alexandre
11 Posted by skaak on 08 Jul, 2010 11:33 AM
Thanks - the software is my own, all java, which is why I like
Algodeal so much!
There are libraries with liberal licenses offering the same
functionality, but I don't think the answer is to build such models on
Algodeal. As mentioned I build the strategies locally and encode and
upload the final one. Algodeal will need serious changes to allow
building e.g. machine learning models and I don't think you want to do
that anyway - or do you?
I split the data up exactly to prevent snooping and this is also why
I'm interested in the recent performance of the models. When you start
throwing capital at models, I think the first question you should ask
is which is in and which is out of sample. It is very easy to get a 9
strategy in sample using any technique because of snooping. But, in my
experience at least, it is very difficult to get a 9 strategy out of
sample.
Regards
MG
Support Staff 12 Posted by Alexandre on 08 Jul, 2010 03:51 PM
MG,
Great to see that you are all Java based. We can include libraries into our framework if required (for example we have libSVM currently available but we may get Weka working if that is what you use), so it would be interesting to see what kind of constraints you think are necessary when you mention serious changes to the platform.
We are currently working on a completely revamped score scale, so current scores will not hold anymore. We saw your results and of course they are very very good, in both the old scale and the new scale. However, you are right: whatever the scale, you can get great results with optimization, and that is probably what is behind your results.
In order for us to put a model in production, we need to be confident about its performance. There are two complementary ways for us to have some confidence: understanding the inner workings of a strategy & out of sample testing and/or paper trading.
Currently your model is a complete black box. We don't know what you use as in sample and out of sample data, training sets etc. So a high score does not necessarily mean the model will perform well and be selected. Soon we will be able to run strategies in paper trading, thus giving a good outlook on a strategy's resilience. However, paper trading, depending on your strategy style, must be conducted over a certain time frame to be relevant.
Regards
Alexandre
13 Posted by skaak on 09 Jul, 2010 05:31 AM
Weka would be nice - others are knime and yale/rapidminer, but I don't
know the license conditions. I use my own, and can embed weka and
libsvm pretty much like you do. To embed, however, I load the libsvm
output file (a text file) or the weka object (a binary file) whilst on
Algodeal I need to encode that into a java file and, if done directly
on the server, I can not access the model. Likewise my own models are
xml translated into objects and I need to embed both object and the
data in java files to prepare for Algodeal.
But let us resolve this for now by embedding it as discussed.
In terms of changes to the platform - I wouldn't recommend nor am I
suggesting that. The platform is for final testing and putting in
production of a strategy. But note that I would not use the remote
platform to develop the strategy. I have my own toolbox of tricks with
which I play locally and I suppose others do the same. If you really
want to go that route, I can give more suggestions and it is doable to
a certain degree, but rather focus on the smooth running of the
strategy than allowing anybody to build all types of strategies. The
trick becomes encoding whatever model was built into java which is
much easier than allowing for all those types of models to be built
and which is more or less the reason this discussion started.
I know the in and out of sample dates of each strategy of course.
Some of them score e.g. 9 and some 8 and I'd rather use the 8 as they
do better out of sample. The models are MLPs and so are black boxes
anyhow. They are nonlinear and can not be decomposed into say 10%
gold, 20% Dow and 30% EUR and 40% MA crossover. But they work well in
practice - even out of sample. Most of my intraday models start out of
sample 2010-01-01 and the end of day ones 2009-01-01 if you want to
investigate, but just mail me if you want to discuss a specific one.
If you need to understand a model before production it raises some
issues. Even if you just know that e.g. MLPs work for the CAC and PNNs
don't, you have a huge advantage and it gets dicey. Likewise, say you
know RSI indicators work for S&P but moving averages don't. Even if
you don't copy a model, you can still use that knowledge to build a
new one. I'm not too uncomfortable with letting you understand broadly
how the model works, but somewhere a line must be drawn. I wouldn't
like to show you how I construct the input patterns for example.
Likewise, because I use my own stuff, I have to put bits of it openly
on the platform to make it work. It is not top secret stuff, but you
have a good reference implementation of an MLP embedded in my
strategies. To some extent I have no choice but to trust you, and of
course I assume you will not use that without my consent. This too is
a discussion for a different thread.
When a strategy goes live, I think it is going to require close
cooperation between Algodeal and the quant. If you simply pick the
best or 9 strategy above, you'd be making a bad decision, and I can
tell you not to do that. If you pick an MLP strategy, I'd like to fine
tune it a bit before it goes live. If it is intraday, it should
probably be tuned a bit once a month even as it runs.
I appreciate your comments but we still have many open ends and you
are welcome to continue, but the original issue is resolved as far as
I am concerned.
Regards
MG
Support Staff 14 Posted by Eric on 13 Jul, 2010 04:25 PM
MG,
An update on the 3rd party libraries.
It turns out RapidMiner has an Affero GPL license, meaning that we cannot distribute it with our Market Runner bundle, nor offer it remotely.
KNIME is released under the GPL license. So, at best, we'd have a similar bundling as with Weka: not offered with our Market Runner download, but accessible remotely. Not very appealing.
Generally speaking, however, I don't think we'll allow strategies to write to the file system in the foreseeable future. There are just too many security risks involved.
Eric
15 Posted by skaak on 14 Jul, 2010 04:48 AM
Thanks Eric,
How about accessing a remote site via http or ftp?
Then I can upload parameters to a server and access it from there,
which may be more convenient than encoding it into a java file, but
maybe not, so don't try too hard on this one.
OK, so the strategy must be a self contained, read only animal. Let's
run with that then.
Regards
MG
16 Posted by skaak on 15 Jul, 2010 11:22 AM
Hi there,
I've managed to generate code to export the PNN as java code. Just a
simple EOD one to start with. The java file contains 2,842 lines and
takes up 5.7 MB. It compiles into 51 different (inner) classes taking
up 4 MB of bytecode.
I really mention this as I ran into another java limitation. A single
class can not contain more than 64k of constants - ever seen the "too
many constants" compiler error? So I had to initialise the PNN in
member classes rather than member functions.
Oh well, it is all automatic, so once done it should work for others as well.
Regards
MG
17 Posted by skaak on 15 Jul, 2010 05:00 PM
After a looooong wait I get a socket exception : connection timed out.
Unable to paste the file into the browser form.
Regards
MG
18 Posted by skaak on 16 Jul, 2010 02:26 AM
The socket keeps timing out, but I was able to submit the strategy in cut
and paste fashion using the web interface.
Regards
MG
Support Staff 19 Posted by David on 16 Jul, 2010 06:47 AM
Hi.
Sending a very big strategy should be easier from Eclipse or the command line than from the web interface. Have you tried it?
David.
20 Posted by skaak on 16 Jul, 2010 08:53 AM
Yes, I thought so too. After several command line attempts I finally
gave up and got the web interface to work by ignoring some script
(source formatter?) that was taking forever.
When I submit via the command line there seems to be very little
network activity and after a looong time it simply times out. The web
interface turned out easier and faster than I expected, albeit after
turning off the script.
PS : In terms of some of the stuff we discussed earlier and the new
scoring. I have a strategy that does exceptionally well and gets a
high score, but I wouldn't touch it with a pole given its hold out
period performance. Then I have others with roughly half this score
but they just keep on performing and I'd rather trade them with real
money. Are you going to use the paper trading as a further screening
to ensure you don't pick overfitted strategies?
Regards
MG
Support Staff 21 Posted by Alexandre on 16 Jul, 2010 11:17 AM
MG,
We will use paper trading as a further screening step to check the consistency of strategies. As we already discussed, whatever the scoring system, with a bit of optimization it is possible to score high with a strategy that will not perform so well in the future. The score remains a way to filter out strategies; it cannot guarantee that a strategy is resilient.
Paper trading is a good opportunity for us to check your system's performance without going into all the details of your strategy, which you apparently want to keep secret.
If you have any further questions, we are here to help.
Regards
Alexandre
22 Posted by skaak on 17 Jul, 2010 07:56 AM
Thanks, I appreciate the help and also try to contribute.
We can go on forever. I overemphasise out of sample and underemphasise
in sample performance - scoring typically does the reverse and is
dangerous, but what else. Paper trading provides some kind of a safety
mechanism.
You should also check if a strategy was lucky or is for real. Often a
strategy is incredibly good on paper but if you look at symmetry you
find it was lucky in say 2002 and 2008 and did not do much anywhere
else. You should have a test for that as well.
If a strategy has say 200 trades, but the bulk of the performance
comes from 5 'lucky' ones, then maybe it is not such a good strategy
as the scoring suggests. This is not obvious from the return
distribution plots and I use a symmetry plot to check - ask for more
info if you're still awake and want details.
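
One simple stand-in for such a test (not the symmetry plot itself, which is not described here) is to measure what fraction of the total profit comes from the few best trades. A minimal sketch, with made-up names and toy numbers, assuming one profit or loss figure per trade and a positive total:

```java
import java.util.Arrays;

public class ProfitConcentration {

    /**
     * Fraction of the total profit contributed by the k most profitable trades.
     * Requires a positive total. The result can exceed 1 when there are losing trades.
     */
    public static double topShare(double[] tradePnl, int k) {
        if (k < 0 || k > tradePnl.length) {
            throw new IllegalArgumentException("k must be between 0 and the number of trades");
        }
        double total = 0;
        for (double pnl : tradePnl) {
            total += pnl;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("total profit must be positive");
        }
        double[] sorted = tradePnl.clone(); // keep the caller's array untouched
        Arrays.sort(sorted);                // ascending, so the best trades are at the end
        double top = 0;
        for (int i = sorted.length - k; i < sorted.length; i++) {
            top += sorted[i];
        }
        return top / total;
    }

    public static void main(String[] args) {
        // Toy data: ten trades with a total profit of 100, of which two trades made 90.
        double[] pnl = {50, -5, 2, 40, -3, 1, 3, -2, 4, 10};
        System.out.println("Share of profit from the top 2 trades: " + topShare(pnl, 2));
    }
}
```

A share close to 1 for a small k suggests the result hinges on a handful of trades, whatever the score says.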
Regards
MG
Support Staff 23 Posted by Alexandre on 19 Jul, 2010 02:27 PM
Dear MG,
You are right. There are plenty of things to look at when you evaluate a strategy. We try to make the score a first filter that helps us select which strategies to look into in more detail.
Regards
Alexandre
Alexandre closed this discussion on 19 Jul, 2010 02:27 PM.