Use SEPTA bus GPS data to ultimately determine when a bus will reach a certain stop.

Project Activity

Update #13

Been working on general debugging and performance issues. Finalized training against traffic data: I used Google's Directions API along the route and trained against the duration Google estimated. It will take a couple of weeks to collect a minimal amount of data to train against on the main server, though.
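For anyone who wants to try the same idea, here is a rough sketch of pulling a trip duration out of a Directions API JSON response. It is not the project's code. The sample response is hand-written and trimmed to the fields used, and the function name is made up for illustration. It assumes the usual response layout (routes, then legs, then a duration with a value in seconds) and prefers duration_in_traffic when the response happens to include it.

```python
def total_duration_seconds(directions_response):
    """Sum the leg durations (in seconds) of the first route in the response.

    Returns None when the response contains no routes.
    """
    routes = directions_response.get("routes", [])
    if not routes:
        return None
    total = 0
    for leg in routes[0]["legs"]:
        # Prefer the traffic-aware estimate when present, else the plain one.
        duration = leg.get("duration_in_traffic", leg["duration"])
        total += duration["value"]
    return total


# Hypothetical, hand-written response (not real API output).
sample_response = {
    "status": "OK",
    "routes": [
        {
            "legs": [
                {"duration": {"text": "7 mins", "value": 420}},
                {
                    "duration": {"text": "5 mins", "value": 300},
                    "duration_in_traffic": {"text": "6 mins", "value": 360},
                },
            ]
        }
    ],
}

print(total_duration_seconds(sample_response))  # 420 + 360 = 780
```

The returned number of seconds would then be the training target for that trip.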

At the moment I have given up on neural networks. They did not seem to perform as well and were hard to tweak. If anyone else wants to try different algorithms, please let me know and I can send a CSV file of some data.

Next I will probably work on performance graphing of the training process. The end goal is to have graphs on the site so anyone can easily scroll through all the routes and see how well each one trained and predicted.

Update #12

Updated the demo app so it is more of an end-user app geared towards phones. This is the first app that can get predictions for any bus on any route. Will probably work on making the second demo more user friendly next. If you have any suggestions or bugs, please post them on the forum.

Anyone know how to get links to be links?

Update #11

Reworked the project site so it now looks like a real project. It includes links to the demo pages and has a forum where people can discuss feature ideas or how to use the project -

Finished training all the routes on the public server (which took 5 days straight). Also finished the preprocessing / caching of predictions, so the front end calls are now a lot faster. Found that the main server is slightly slow, so I upgraded MySQL from 5.1 to 5.5 and moved from Xen to KVM, which hopefully speeds it up some. Might need to upgrade to a slightly faster VPS but will avoid that unless necessary, as it costs extra.

Will now probably start on a basic mobile web app meant for actual use instead of just demoing.

Update #10

Uploaded the "nearby to" API. This lets you select a from and a to location, and it returns the next buses that will take you to the destination along with predictions of when each bus will reach each stop. At the moment the call is very slow, so that is the next thing that will be worked on.
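As a rough illustration of the "nearby to" idea (not the actual implementation), the sketch below finds the stops within walking distance of the from and to locations and keeps the routes that serve both ends. The stop records, coordinates, the 400 m radius, and all function names are hypothetical. It ignores direction of travel and stop order, which a real version would need to check.

```python
import math


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two lat/lng points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def stops_within(stops, lat, lng, radius_m):
    """Return the stops that lie within radius_m of the given point."""
    return [s for s in stops
            if haversine_m(lat, lng, s["lat"], s["lng"]) <= radius_m]


def routes_between(stops, origin, destination, radius_m=400):
    """Routes that have a stop near the origin and a stop near the destination."""
    near_from = stops_within(stops, origin[0], origin[1], radius_m)
    near_to = stops_within(stops, destination[0], destination[1], radius_m)
    from_routes = set().union(*(s["routes"] for s in near_from))
    to_routes = set().union(*(s["routes"] for s in near_to))
    return sorted(from_routes & to_routes)


# Hypothetical stops with made-up coordinates.
stops = [
    {"id": "A", "lat": 39.9510, "lng": -75.1660, "routes": {"9", "21", "42"}},
    {"id": "B", "lat": 39.9520, "lng": -75.1900, "routes": {"21", "42"}},
    {"id": "C", "lat": 39.9800, "lng": -75.1600, "routes": {"9"}},
]

print(routes_between(stops, (39.9511, -75.1661), (39.9521, -75.1899)))
```

Each returned route could then be passed to the prediction step to get arrival times for its next buses.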

demo app -

Update #8

The predict and next_bus APIs are now working. Predict gives you the number of seconds predicted for a bus to reach a certain stop. Next bus gives you, for each route, all the next buses to reach a stop and the predicted time for them to get there. Just made it so predict and next_bus run within 1 second total, down from ~10 seconds per bus. Next I will change the trainer to use number-of-stops-away instead of nearest-stop, which should make training a lot faster and more accurate.

These APIs are available; however, currently only routes 9, 21, and 42 are trained against stop 6060 (15th and Chestnut).
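To show how the two calls relate, here is a minimal sketch of what next_bus does with per-bus predictions: group them by route and order them soonest first. The record layout, bus ids, and numbers are made up for illustration and are not the real API's response format.

```python
from collections import defaultdict


def next_buses(predictions):
    """Group per-bus predictions by route, soonest arrival first.

    predictions: list of dicts with hypothetical keys
    "route", "bus_id", and "seconds" (predicted seconds until the stop).
    """
    by_route = defaultdict(list)
    for p in predictions:
        by_route[p["route"]].append((p["seconds"], p["bus_id"]))
    # Sorting the (seconds, bus_id) tuples orders each route by arrival time.
    return {route: sorted(buses) for route, buses in by_route.items()}


# Hypothetical output of the predict step for one stop.
predictions = [
    {"route": "21", "bus_id": "b3", "seconds": 540},
    {"route": "9", "bus_id": "b1", "seconds": 300},
    {"route": "21", "bus_id": "b2", "seconds": 120},
]

for route, buses in sorted(next_buses(predictions).items()):
    print(route, buses)
```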

Also there is this demo app -

Spoke with Lloyd and Andrew about working on the website and a "nearby buses to" API.

Update #7

Stuck in a lot of new features - schedule, nearest stop, block id, bus count, destination, direction - and refined how it collects training data. This brought the mean squared error from 3,000,000 to 1,644 in one case, which means its arrival time estimates were off by roughly under a minute on average.

The weather feature is done on the training side and just needs to be implemented on the front end. The project also now uses GTFS data. Next is to begin running everything on a public server and to test and debug live issues.
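To put those error numbers in perspective, the square root of the mean squared error gives a typical error size. The sketch below assumes the error is measured in seconds (so the mean squared error is in seconds squared), which is an assumption based on predict returning seconds. The root mean squared error is also not exactly the same thing as the average miss, so treat the result as a rough guide.

```python
import math


def rmse(mse):
    """Root mean squared error: the square root of the mean squared error."""
    return math.sqrt(mse)


before = rmse(3_000_000)  # assumed to be in seconds
after = rmse(1_644)       # assumed to be in seconds

print(f"before: {before:.0f} s (~{before / 60:.0f} min)")  # before: 1732 s (~29 min)
print(f"after:  {after:.1f} s")                            # after:  40.5 s
```

Under that assumption, the typical miss went from about half an hour to about 40 seconds.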

Update #6

Implemented a C++ version of the linear regression algorithm. Fixed the bug that was causing the new learning features to break the algorithm. Next things are more learning features and making the C++ trainer faster.
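For readers new to the technique, here is a minimal sketch of linear regression trained with batch gradient descent. It is written in Python for brevity and is not the project's C++ trainer. The two features, the training rows, the learning rate, and the epoch count are all made up for illustration, and the features are assumed to be pre-scaled to the 0-1 range.

```python
def predict(weights, features):
    """Linear model: weights[0] is the bias, the rest pair with the features."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def train(rows, targets, learning_rate=0.1, epochs=5000):
    """Fit the weights by batch gradient descent on the mean squared error."""
    n = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        # Errors are computed once per epoch so all weights update together.
        errors = [predict(weights, r) - t for r, t in zip(rows, targets)]
        weights[0] -= learning_rate * sum(errors) / n
        for j in range(n_features):
            gradient = sum(e * r[j] for e, r in zip(errors, rows)) / n
            weights[j + 1] -= learning_rate * gradient
    return weights


# Hypothetical training set: two scaled features per row, and a target in
# seconds generated from 60 + 120 * x1 + 30 * x2 so the fit can be checked.
rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
targets = [60 + 120 * x1 + 30 * x2 for x1, x2 in rows]

weights = train(rows, targets)
print([round(w, 3) for w in weights])
print(round(predict(weights, [0.5, 0.0]), 3))
```

With unscaled features (raw latitude, schedule times, and so on), gradient descent usually needs feature scaling or a much smaller learning rate to converge.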

Update #5

Brought in more features to train against and brought down the average squared error. Finding that Octave may be too limited as a trainer and may need to be replaced.

Update #4

The entire pipeline is now complete. The program is now able to collect data for all routes and train against all stops. Currently it only considers lat and lng, which is not enough to make a good prediction. Will probably continue improving the prediction, focusing on route 21.

Currently route 21 is trained against stop 14907 -

Update #2

The code now stores data in MySQL, prepares the training data, trains with the training data, and parses the result. The front end API is already done. The last remaining step is to store the results into MySQL and modify the PHP front end to use it.

Update #1

Got the remaining parts of the proof of concept from (6?) months ago running. Put the code in Mercurial on SourceForge and built a small landing page for the project. Should be able to begin planning out and contributing new work from here on out.