Ability to utilise class libraries or external data

skaak

07 Jul, 2010 03:47 PM via web

Hi there,

Something lacking at present is the ability to use class libraries. It would be great to be able to upload a jar file and access that from your strategy. Likewise the fact that the whole strategy has to be wrapped into a single class file is a serious constraint. Even just being able to load external, previously prepared data from say a csv file would be a step forward. Alternatively, maybe the ability to access such a file from an external website?

Is it possible at all to access external data?

Regards

  1. Support Staff 2 Posted by Guillaume on 07 Jul, 2010 04:24 PM

    Hi,

    Since version 1.1.5 of market runner, it is possible to send a strategy based on several classes. From Eclipse, use the menu "Run Remote (send whole project)". This will send your strategy to algodeal as a zip file containing your eclipse project source files (all .java files).
    This will not include other non-java files (like csv, .properties, etc.). Also keep in mind that your java code will not be allowed to read files on our grid.
    If you need to feed your strategy with some additional data of your own, one workaround currently is to generate a java class with constants, and ship this java class with your strategy.

    As for external libraries, it is not currently possible to upload your own library. However, we have included some external libraries in market runner when they seemed to be in high demand among users (and when they are open enough and can be used without legal issues).
    The external libraries currently available are Apache Math, libsvm, ta-lib.

    Which library are you thinking of?

    Hope this helps,
    Guillaume
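
    A minimal sketch of the workaround described above, assuming a small parameter table that would otherwise live in a csv file. The class name `ModelData`, its field and method, and all the values are hypothetical placeholders and not part of Market Runner.

    ```java
    // Hypothetical data holder shipped alongside the strategy source.
    // The values stand in for rows that would otherwise live in a csv file.
    final class ModelData {
        private ModelData() {}

        // One row per instrument: { weight, threshold } (placeholder numbers).
        static final double[][] ROWS = {
            { 0.25, 1.5 },
            { 0.50, 2.0 },
            { 0.25, 0.5 },
        };

        // Sum of the first column, as an example of reading the embedded data.
        static double totalWeight() {
            double sum = 0.0;
            for (double[] row : ROWS) {
                sum += row[0];
            }
            return sum;
        }
    }
    ```

    The strategy class would then read `ModelData.ROWS` directly instead of opening a file, which stays within the rule that code may not read files on the grid.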

  2. Guillaume closed this discussion on 07 Jul, 2010 04:24 PM.

  3. skaak re-opened this discussion on 08 Jul, 2010 04:38 AM

  4. 3 Posted by skaak on 08 Jul, 2010 04:38 AM

    Thanks for the quick reply.

    I've put model parameters in a single java file before, but have some
    that are now too big for that. The parameters saved as xml come to
    30M and I'm really after accessing that xml file from the server.

    The data can be put in multiple java files and I'll probably try that
    now - thanks. Is there a limitation on size and number of files when I
    upload to the server?

    I know you support external libraries but even there one would like to
    upload model parameters as a non-java, even binary file at some stage.

    Regards

  5. 4 Posted by skaak on 08 Jul, 2010 05:44 AM

    Whilst on the subject - it would also help to access data on the server.

    While running through a strategy, I sometimes create a csv data file
    which can be analysed later. This only works locally of course. I note
    that locally MarketRunner creates files such as Indicators.csv,
    Orders.csv and Transactions.csv. I suppose these are then plotted but
    is it possible to download the files themselves from the server?

    Thanks

  6. Support Staff 5 Posted by Eric on 08 Jul, 2010 09:24 AM

    MG,

    There is no limitation on the size of the Java files themselves. At least, none that we have specified.

    That said, I'm curious to know what type of data would require 30 Megs. Can't it be calculated as needed by the strategy? Could you expand on that?

    Thanks,

    Eric

  7. Support Staff 6 Posted by Eric on 08 Jul, 2010 09:29 AM

    re: accessing data on the server.

    We would like to give access to the files generated by the strategy, but it is not possible. This is not a technical problem. The problem is that the market data we have acquired can only be used under very restricted conditions. In particular, there must be no way for a person who has not acquired a license from our provider to access it.
    If indicator values could be downloaded, it would be too easy to store market data in them and then download it.

    I'm afraid that we are stuck here. Sorry.

    Eric

  8. 7 Posted by skaak on 08 Jul, 2010 10:38 AM

    It is only preliminary and the model is not an Algodeal one, but I'd
    like to also transfer and test it on Algodeal at some stage,
    especially intraday. It will probably be smaller on Algodeal for end
    of day, but likewise much bigger for intraday.

    In theory it could be calculated on the fly, but things quickly get
    messy. It is better to toy with the model offline until acceptable and
    then upload the final one.

    This model is a PNN - a nearest neighbour type model, thus the size.
    Suppose I calculate clusters and store the centers. If there are
    100,000 points - easy for intraday, and this yields 10,000 centers -
    easy for PNN, and each center contains 20 points - pretty small, then
    I already have 20 x 10,000 = 200,000 points of data.

    Storing this in a binary file, using only 8 bytes a double, yields a
    roughly 2M file already (200,000 x 8 = 1.6M bytes). Encode that as
    text, especially xml, and it easily grows to 30M.

    On the data, I can appreciate that you need to prevent abuse. I'd
    ideally like to create a pattern file, export it and build a model
    locally. Once the model is ready, it gets uploaded and tested online.
    I've done this for the series made available offline but would like to
    include other series as well. There is some mangling in the patterns
    but I suppose you'd not get past the data provider.

    This is a different point, but is it possible to draw vertical lines
    on graphs? Suppose the model is built using data from 2000 to 2008 and
    then tested from 2008 to 2010. It would be nice to draw a vertical
    line to visually separate the two samples.

    Regards

  9. 8 Posted by skaak on 08 Jul, 2010 10:45 AM

    PS : You may know this - java has a limitation that a single method
    (a class's static initialiser included) can not yield more than 64k
    of bytecode, something I discovered while encoding models destined
    for Algodeal. This one will probably have to be split into several
    files....

    Regards

  10. 9 Posted by skaak on 08 Jul, 2010 10:54 AM

    PPS : Come to think of it, I still used one file, but split the
    model into several classes inside that file, each yielding less than
    64k of bytecode. This gets pieced together again when the strategy
    runs and I'll probably do the same with the PNN. Messy, but what else.

    Regards
    MG
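
    A minimal sketch of the splitting described above, assuming the model is a flat array of doubles. The JVM limits the bytecode of a single method to under 64k, and the array initialisers of a class's static fields are all compiled into that class's one static initialiser method, so spreading the data across nested classes gives each chunk its own initialiser. The names `SplitModel`, `Part0`, `Part1` and `assemble`, and the values, are hypothetical.

    ```java
    // Hypothetical layout: each nested class has its own static initialiser,
    // so each chunk's array literal is compiled into a separate method.
    final class SplitModel {
        private SplitModel() {}

        static final class Part0 {
            static final double[] VALUES = { 0.1, 0.2, 0.3 };
        }

        static final class Part1 {
            static final double[] VALUES = { 0.4, 0.5 };
        }

        // Piece the chunks back together into one array when the strategy starts.
        static double[] assemble() {
            double[][] parts = { Part0.VALUES, Part1.VALUES };
            int total = 0;
            for (double[] part : parts) {
                total += part.length;
            }
            double[] all = new double[total];
            int offset = 0;
            for (double[] part : parts) {
                System.arraycopy(part, 0, all, offset, part.length);
                offset += part.length;
            }
            return all;
        }
    }
    ```

    In a real export each part would hold thousands of values. The chunk size has to be chosen so that each initialiser stays under the method limit.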

  11. Support Staff 10 Posted by Alexandre on 08 Jul, 2010 11:09 AM

    MG,

    I understand you want to use optimization methods & machine learning. However, sending the raw model and executing it against the data used to generate it is of little value, because of data snooping.
    You are right, there are some limitations on class / method size in terms of bytecode. If you still want to send some data, you have to encode things into java classes as you have been doing.
    There is currently no option to draw lines on the graph. However, we will roll out the possibility to paper trade your strategy, which will continue backtesting selected strategies with fresh data.

    It would be better if you could build the model while backtesting. What library/software are you using to generate your model?

    Regards

    Alexandre

  12. 11 Posted by skaak on 08 Jul, 2010 11:33 AM

    Thanks - the software is my own, all java, which is why I like
    Algodeal so much!

    There are libraries with liberal licenses offering the same
    functionality, but I don't think the answer is to build such models on
    Algodeal. As mentioned I build the strategies locally and encode and
    upload the final one. Algodeal will need serious changes to allow
    building e.g. machine learning models and I don't think you want to do
    that anyway - or do you?

    I split the data up exactly to prevent snooping and this is also why
    I'm interested in the recent performance of the models. When you start
    throwing capital at models, I think the first question you should ask
    is which is in and which is out of sample. It is very easy to get a
    strategy scoring 9 in sample using any technique because of snooping.
    But, in my experience at least, it is very difficult to get a
    strategy scoring 9 out of sample.

    Regards
    MG

  13. Support Staff 12 Posted by Alexandre on 08 Jul, 2010 03:51 PM

    MG,

    Great to see that you are all Java based. We can include libraries into our framework if required (for example we have libSVM currently available but we may get Weka working if that is what you use), so it would be interesting to see what kind of constraints you think are necessary when you mention serious changes to the platform.

    We are currently working on a completely revamped score scale, so current scores will not hold anymore. We saw your results and of course they are very very good, in both the old scale and the new scale. However you are right, whatever the scale, you can get great results with optimizations, that is probably what is behind your results.

    In order for us to put a model in production, we need to be confident about its performance. There are two complementary ways for us to have some confidence: understanding the inner working of a strategy & out of sample testing and/or paper trading.

    Currently your model is a complete blackbox; we don't know what you use as in-sample and out-of-sample data, training sets, etc. So a high score does not necessarily mean the model will perform well and be selected. Soon we will be able to run strategies in paper trading, thus giving a good outlook on a strategy's resilience. However, paper trading, depending on your strategy style, must be conducted over a certain time frame to be relevant.

    Regards

    Alexandre

  14. 13 Posted by skaak on 09 Jul, 2010 05:31 AM

    Weka would be nice - others are knime and yale/rapidminer, but I don't
    know the license conditions. I use my own, and can embed weka and
    libsvm pretty much like you do. To embed, however, I load the libsvm
    output file (a text file) or the weka object (a binary file) whilst on
    Algodeal I need to encode that into a java file and, if done directly
    on the server, I can not access the model. Likewise my own models are
    xml translated into objects and I need to embed both object and the
    data in java files to prepare for Algodeal.

    But let us resolve this for now by embedding it as discussed.

    In terms of changes to the platform - I wouldn't recommend nor am I
    suggesting that. The platform is for final testing and putting in
    production of a strategy. But note that I would not use the remote
    platform to develop the strategy. I have my own toolbox of tricks with
    which I play locally and I suppose others do the same. If you really
    want to go that route, I can give more suggestions and it is doable to
    a certain degree, but rather focus on the smooth running of the
    strategy than allowing anybody to build all types of strategies. The
    trick becomes encoding whatever model was built into java which is
    much easier than allowing for all those types of models to be built
    and which is more or less the reason this discussion started.

    I know the in and out of sample dates of each strategy of course.
    Some of them score e.g. 9 and some 8 and I'd rather use the 8 as they
    do better out of sample. The models are MLPs and so are black boxes
    anyhow. They are nonlinear and can not be decomposed into say 10%
    gold, 20% Dow and 30% EUR and 40% MA crossover. But they work well in
    practice - even out of sample. Most of my intraday models start out of
    sample 2010-01-01 and the end of day ones 2009-01-01 if you want to
    investigate, but just mail me if you want to discuss a specific one.

    If you need to understand a model before production it raises some
    issues. Even if you just know that e.g. MLPs work for the CAC and PNNs
    don't, you have a huge advantage and it gets dicey. Likewise, say you
    know RSI indicators work for S&P but moving averages don't. Even if
    you don't copy a model, you can still use that knowledge to build a
    new one. I'm not too uncomfortable with letting you understand broadly
    how the model works, but somewhere a line must be drawn. I wouldn't
    like to show you how I construct the input patterns for example.

    Likewise, because I use my own stuff, I have to put bits of it openly
    on the platform to make it work. It is not top secret stuff, but you
    have a good reference implementation of an MLP embedded in my
    strategies. To some extent I have no choice but to trust you, and of
    course I assume you will not use that without my consent. This too is
    a discussion for a different thread.

    When a strategy goes live, I think it is going to require close
    cooperation between Algodeal and the quant. If you simply pick the
    best or 9 strategy above, you'd be making a bad decision, and I can
    tell you not to do that. If you pick an MLP strategy, I'd like to fine
    tune it a bit before it goes live. If it is intraday, it should
    probably be tuned a bit once a month even as it runs.

    I appreciate your comments but we still have many open ends and you
    are welcome to continue, but the original issue is resolved as far as
    I am concerned.

    Regards
    MG

  15. Support Staff 14 Posted by Eric on 13 Jul, 2010 04:25 PM

    MG,

    An update on the 3rd party libraries.

    It turns out RapidMiner has an Affero GPL license, meaning that we cannot distribute it with our Market Runner bundle, nor offer it remotely.

    KNIME is released under the GPL license. So, at best, we'd have a similar bundling as with Weka: not offered with our Market Runner download, but accessible remotely. Not very appealing.

    Generally speaking, however, I don't think we'll allow strategies to write to the file system in the foreseeable future. There are just too many security risks involved.

    Eric

  16. 15 Posted by skaak on 14 Jul, 2010 04:48 AM

    Thanks Eric,

    How about accessing a remote site via http or ftp?

    Then I can upload parameters to a server and access it from there,
    which may be more convenient than encoding it into a java file, but
    maybe not, so don't try too hard on this one.

    OK, so the strategy must be a self contained, read only animal. Let's
    run with that then.

    Regards
    MG

  17. 16 Posted by skaak on 15 Jul, 2010 11:22 AM

    Hi there,

    I've managed to generate code to export the PNN as java code. Just a
    simple EOD one to start with. The java file contains 2,842 lines and
    takes up 5.7MB. It compiles into 51 different (inner) classes taking
    up 4MB of bytecode.

    I really mention this as I ran into another java limitation. A single
    class can not contain more than 64k of constants - ever seen the "too
    many constants" compiler error? So I had to initialise the PNN in
    member classes rather than member functions.

    Oh well, it is all automatic, so once done it should work for others as well.

    Regards
    MG
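
    A rough sketch of how such an export could be automated, assuming the model is a flat array of finite doubles and a fixed number of values per nested class. This is not MG's exporter; `ChunkedExporter`, `export` and the generated `PartN` class names are hypothetical.

    ```java
    // Hypothetical exporter: builds Java source text in which each nested
    // class holds at most chunkSize values of the model.
    final class ChunkedExporter {
        private ChunkedExporter() {}

        static String export(String className, double[] values, int chunkSize) {
            if (chunkSize <= 0) {
                throw new IllegalArgumentException("chunkSize must be positive");
            }
            StringBuilder src = new StringBuilder();
            src.append("final class ").append(className).append(" {\n");
            int parts = 0;
            for (int start = 0; start < values.length; start += chunkSize) {
                int end = Math.min(start + chunkSize, values.length);
                src.append("    static final class Part").append(parts++).append(" {\n");
                src.append("        static final double[] VALUES = {");
                for (int i = start; i < end; i++) {
                    if (i > start) {
                        src.append(", ");
                    }
                    // For finite values, Double.toString gives a valid Java literal.
                    src.append(Double.toString(values[i]));
                }
                src.append("};\n    }\n");
            }
            // Record how many parts were written so a loader knows what to assemble.
            src.append("    static final int PARTS = ").append(parts).append(";\n");
            src.append("}\n");
            return src.toString();
        }
    }
    ```

    Calling `ChunkedExporter.export("Pnn", values, 1000)` returns the source text as a string. Writing it to a .java file and choosing a safe chunk size are left to the caller.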

  18. 17 Posted by skaak on 15 Jul, 2010 05:00 PM

    After a looooong wait I get a socket exception : connection timed out.
    Unable to paste the file into the browser form.

    Regards
    MG

  19. 18 Posted by skaak on 16 Jul, 2010 02:26 AM

    The socket keeps on timing out, but I was able to submit the strategy in cut
    and paste fashion using the web interface.

    Regards
    MG

  20. Support Staff 19 Posted by David on 16 Jul, 2010 06:47 AM

    Hi.

    Sending a very big strategy should be easier from Eclipse or the command line than from the web interface. Have you tried it?

    David.

  21. 20 Posted by skaak on 16 Jul, 2010 08:53 AM

    Yes, I thought so too. After several command line attempts I finally
    gave up and got the web interface to work by ignoring some script
    (source formatter?) that was taking forever.

    When I submit via the command line there seems to be very little
    network activity and after a looong time it simply times out. The web
    interface turned out easier and faster than I expected, albeit after
    turning off the script.

    PS : In terms of some of the stuff we discussed earlier and the new
    scoring. I have a strategy that does exceptionally well and gets a
    high score, but I wouldn't touch it with a pole given its hold out
    period performance. Then I have others with roughly half this score
    but they just keep on performing and I'd rather trade them with real
    money. Are you going to use the paper trading as a further screening
    to ensure you don't pick overfitted strategies?

    Regards
    MG

  22. Support Staff 21 Posted by Alexandre on 16 Jul, 2010 11:17 AM

    MG,

    We will use paper trading as a further screening step to check the consistency of strategies. As we have already discussed, whatever the score system, with a bit of optimization it is possible to score high with a strategy that will not perform so well in the future. The score remains a way to filter out strategies, and cannot guarantee that a strategy is resilient.

    Paper trading is a good opportunity for us to check your system's performance without going into all the details of your strategy, which apparently you want to keep secret.

    If you have any further questions, we are here to help.

    Regards

    Alexandre

  23. 22 Posted by skaak on 17 Jul, 2010 07:56 AM

    Thanks, I appreciate the help and also try to contribute.

    We can go on forever. I overemphasise out of sample and underemphasise
    in sample performance - scoring typically does the reverse and is
    dangerous, but what else. Paper trading provides some kind of a safety
    mechanism.

    You should also check if a strategy was lucky or is for real. Often a
    strategy is incredibly good on paper but if you look at symmetry you
    find it was lucky in say 2002 and 2008 and did not do much anywhere
    else. You should have a test for that as well.

    If a strategy has say 200 trades, but the bulk of the performance
    comes from 5 'lucky' ones, then maybe it is not such a good strategy
    as the scoring suggests. This is not obvious from the return
    distribution plots and I use a symmetry plot to check - ask for more
    info if you're still awake and want details.

    Regards
    MG
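
    One simple way to check for this kind of luck, sketched below, is to measure how much of the total profit comes from the few best trades. This is a plain concentration measure and not the symmetry plot mentioned above; `TradeConcentration` and `topShare` are hypothetical names, and the per-trade P&L array is assumed input.

    ```java
    import java.util.Arrays;

    final class TradeConcentration {
        private TradeConcentration() {}

        // Fraction of total P&L contributed by the k most profitable trades.
        // Returns NaN when total P&L is not positive, since the share is then
        // not meaningful.
        static double topShare(double[] pnl, int k) {
            double[] sorted = pnl.clone();
            Arrays.sort(sorted); // ascending, best trades end up last
            double total = 0.0;
            for (double p : sorted) {
                total += p;
            }
            if (total <= 0.0) {
                return Double.NaN;
            }
            int n = Math.min(Math.max(k, 0), sorted.length);
            double top = 0.0;
            for (int i = 0; i < n; i++) {
                top += sorted[sorted.length - 1 - i];
            }
            return top / total;
        }
    }
    ```

    A share close to or above 1 for a small k, say 5 out of 200 trades, suggests the result rests on a handful of trades and that the remaining trades roughly cancel out.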

  24. Support Staff 23 Posted by Alexandre on 19 Jul, 2010 02:27 PM

    Dear MG,

    You are right. There are plenty of things to look at when you evaluate a strategy. We try to make the score a first filter that can help us select which strategies to look into in more detail.

    Regards

    Alexandre

  25. Alexandre closed this discussion on 19 Jul, 2010 02:27 PM.
