AI Writing Intensive Journal

This is a

David Bond

Week 14

5/5/091 PM

For this final journal I would once again like to talk about my project. This week has been interesting in that while last week I said BRTA* was being beat by single look ahead RTA* I found out this was patently wrong. On the very simple example I was using it was being out performed but in general BRTA* well out performs RTA*. For instance with one expansion per tick BRTA* does the example problem in 532 moves. RTA* with single look ahead does the same in 996. With two expansion BRTA does it in 292 moves, with three, 194 moves, and the optimal path length is 72.

The majority of my work this week has been spent on implementing a harness which allows testing on massive amounts of data. Currently I can run on the War Craft 3 maps given. Here is some example output which I need to convert to a CSV file format soon so I can start producing graphs.

RESULT< alg:BRTA*, cl:1, exp:2000, gen:7200, ct:25, pl:1000, opl:127 >
RESULT< alg:BRTA*, cl:11, exp:2508, gen:8948, ct:25, pl:305, opl:127 >
RESULT< alg:BRTA*, cl:21, exp:2486, gen:8862, ct:25, pl:209, opl:127 >
RESULT< alg:BRTA*, cl:31, exp:2407, gen:8594, ct:25, pl:177, opl:127 >
RESULT< alg:BRTA*, cl:41, exp:2436, gen:8681, ct:25, pl:173, opl:127 >
RESULT< alg:BRTA*, cl:51, exp:2444, gen:8710, ct:25, pl:161, opl:127 >
RESULT< alg:BRTA*, cl:61, exp:2542, gen:9075, ct:25, pl:165, opl:127 >
RESULT< alg:BRTA*, cl:71, exp:2419, gen:8631, ct:25, pl:147, opl:127 >
RESULT< alg:BRTA*, cl:81, exp:2542, gen:9075, ct:25, pl:147, opl:127 >
RESULT< alg:BRTA*, cl:91, exp:2418, gen:8628, ct:25, pl:139, opl:127 >
RESULT< alg:RTA*, cl:1, exp:3000, gen:6072, ct:25, pl:1000, opl:127 >

It is interesting to see the reduction of the path length towards optimal as the computation limit increases.

The other area that I have been working a lot on this week is implementing RTA* which in fact has look ahead. I have been having trouble with this but I am close to getting a solution. I am doing the A* look ahead with the heuristic estimate of the second best node at the start of the open list, whose starting action is not the same as the head of the open list’s starting action, as the heuristic estimate to be stored for this node. I have some implementation issues right now and so once I get those worked out I can get more details on comparison of BRTA* and RTA*.

One final side note is for a while I didn’t understand why the TBA* paper was calling LRTA* as the algorithm it was using. I finally realized that LRTA* can be use on multiple runs of itself but it also can be used on single runs with a slightly altered RTA* F function.

Week 13

4/28/092 AM

For this week’s journal I want to discuss more of my results from this week. First I read the RTA* paper and then I implemented it in the framework I have in place. The results after I implemented this were a bit disappointing. While I only ran both algorithms on one example the RTA* took around 80 steps while the BRTA* took over 100 steps. The worst part is my RTA* implementation is the naïve form which just expands one step of the state space and then commits to an action. This implies the real form of RTA* which does an A* search as far as it can in the time allotted, will have an even shorter resultant path. I’m thinking this could be caused by not resorting the open list when the backward goal changes after each expansion. This may be causing many more expansions than is needed to come up with a solution.

The next problem I have been considering is how to use RTA*’s back tracing methodology to deal with back tracing in BRTA*. I cannot simply save a heuristic value of the second best choice since in BRTA* the goal state is constantly changing. I’m thinking the best solution to this is save all the nodes not chosen and then when the heuristic value needs to be retrieved ArgMin will be run over all nodes to get the heuristic estimate for back tracing.

The final problem I have relates back to RTA*’s back tracing methodology when applied to use of A* in the exploration phase. I want to make sure I understand how to compute the second best heuristic value in this case. What I am going to do is run A* for a short period and then iterate over the left over open list. For the first element in the open list that is not the first element on the open list and that comes from an initial step not equal to the initial step taken by the head of the open list’s first step, the heuristic estimate of this node will be the adopted second best value.

Week 12

4/21/091 PM

For this week's journal I would like to first consider the topic of Reinforcement learning versus supervised learning. I don't get how these two classes of problems are any different accept reinforcement learning in concept is done in real time while supervised learning has a learning phase where things may be computed. Isn't reinforcement learning the same as supervised learning where the set of training data is the reward/state tuples gained so far? What would be the reasoning to separate these two forms of machine learning unless it is because identifying features of a complex state might be very hard?

The next thought I would like to go over relates to what we were talking about when I came to your office hours last week. In the domain of real time heuristic search where the backward real time search I am working on is getting stuck in dead ends. I have not been able to test this yet but given our discussion it seems clear it is valid to mark a square as a dead end if it only has one generated node and that node has a higher heuristic value. Like you pointed out if there is more than one expanded state the agent can get stuck. I am just afraid that only marking perfectly valid ones as dead ends will cause the agent to get stuck in cycles. I like how my implementation, which currently ignores dead end squares if no action can be taken, tends to go into a cave like structure in the search space and then exit with the ignores clause when it has exhausted that space. My one thought it the agent may still get stuck in a loop if the search frontier is not recommending a quick exit. Anyways, I need to test this more thoroughly and I would like to make comments on different dead end resolution strategies in my final report. I also need to look into the effect of not resorting the open list.

Week 11

4/14/0910 AM

This week I am a little bit excited about writing the journal since I had an interesting time working on my project. So I began by making a nice little text GUI which framed any movements given by the real time algorithm. This was interesting since it allowed me to watch how the algorithm was in fact progressing. I then implemented Backward A* Real Time Search and watched it successful navigate over a blank map. I then added some obstacles and it did well on those also, until it met with a dead end. Consider the following map:

*__12#_________#____##__#_#___#__#__#__X
##__##__#______#_#_#___#_#___#_#________
_#________#_____###_#___##_________#__#_
_#_____#_____________________#______#___
#___##__#______##_##_____#__#_##_##____#

The characters mean as follows: * = the agent, _ = an open square, #= a blocked square, and X is the agent’s destination. I have also labeled 1 and 2 which are two blank squares. The agent will start moving toward 2 but once it moves to 2 it will move back to 1. When at one it will move back to two and so forth. This is because tile two has an H value lower than that of tile one. The agent will still reach its destination since the search frontier will finally reach the agent but having a unit iterate back for forth for a while is unacceptable for most applications of real time search. I “solved” this problem by implementing what I call a dead end list. If, when a state is expanded just before the agent moves, the H values of all its children are greater than its H value; this node is added to the list. States in this list are excluded from being taken. The fact that the head of the open list constantly changes makes it so an agent can become stuck in a dead end and have no actions to take. To solve this problem I did the hack where if all generated states are on the dead end list then the dead end list is ignored. I want to find a better way to solve this problem, since while it seems to work well I intuitively sense an agent may still apparently get stuck at times.

The next step in my project is to implement some other real time search algorithms. I have been unable to locate the original LRTA* paper by Korf. I will be coming by your office hours today to see if you know where I can get a copy of it. I know this algorithm uses a method to update the heuristic values as the state space is explored. I am thinking this might be applicable to the above problem.

Week 9

3/31/091 PM

For this week’s journal the first comment I would like to make is in regards to the size of the state space we must iterate through. In class you made the comment that one does not have to expand out every possible state in the state space, since for instance you being a child in India will never happen, but isn’t that the same as saying one must expand the entire search space? Consider a grid world type problem. The state space is huge but many states could be impossible to reach. This does not mitigate the fact that the number of states that can be reached is still enormous. It would seem to just divide the problem by a constant factor, which doesn’t help significantly on exponential problems.

With this in mind an idea comes up. For instance, in value iteration one is only looking for a max. That means if one could detect one line of search as useless because its expected utility is quite low, then what is the point of fully expanding it until it converges. It may be better to just continue expanding the promising nodes. So, for instance, if avg(U(s) for all s in states) – U(S’) > epsilon then exclude U(S’) from future policy iterations. This could greatly reduce the size of the tree being iterated through while perhaps keeping the solution quality pretty good since the states which look good are the ones that are still being expanded. The value epsilon could be a preconfigured argument or perhaps something such as the standard deviation of the utility of states. It could possibly be a good idea to add some states previously excluded from the value iteration if those states start looking good again, if say their differences lie within the epsilon bound again.

Week 8

3/24/096 AM

The item that I would like to discuss for this journal is the backwards real time search algorithm you invented and then shared with me the other day. This is a major part of my project proposal and so I want to make sure I can recount it properly and in the way you described it to me. So what I recall your algorithm doing in a real time search situation. The algorithm starts some sort of search on the goal node state. Any algorithm can be used for this but it is most likely based on A*. This search is preformed until some limit is reached such as a time limit or node expansion limit at which time the search algorithm is saved but halted. Next the current state is expanded once to produce a set of nodes. The heuristic value “distance” is then computed between each of the nodes in this set and the head of the search’s open list. With this computed the lowest value node is returned and executed.

At this point the algorithm gets even more interesting. One time step has now occurred and the current state has changed. The algorithm can now commence once again but before this happens the heuristic values of the nodes on the open list may want to be recomputed. Or than again, maybe it isn’t worth recomputing the values of these nodes. This is one of the major aspects of the algorithm to investigate in the project. This is continued until the open list search frontier intersects with the current state. At which a plan has been found and can be simply executed. Anyways are there any aspects of that algorithm you presented that I am either misunderstanding or that I am missing? I will be including this description amongst other topics in my project proposal.

Week 7

3/10/097 AM

This week, the first topic that I would like to discuss in on my STRIPs domain encoded for the milestone submission. When I was working on this assignment I had a map of the European Railroad system open and so the idea of encoding this into STRIPs came into my mind. It was a simple way to create easily understandable routes. I went about encoding this and in the end I had around 50 constants, 5 actions, and 100 starting truths. When I ran this through my solution for the first time, the magnitude of what I had just written appeared. Given that I had 50 constants that meant I had 5*50¬^2 actions, or 12,500 actions. My solution wasn’t returning and so I turn on debugging output I was quite surprised to expansions taking 10 or so seconds. While the solution may have been attainable by expanding say 500 nodes the fact that it was taking so long to work with the grounded variables meant I was only reaching around 50 before I canceled the program. Keep this in mind cause I want to bring this up one more time.

My second note in relation to the assignment has to do with the reuse of code. I was delighted to find how simple this assignment was when previous code was reused. I had to write a little bit of parsing code, which used my CNF Grammar parsing from assignment 2. I then added grounding to the CNF Grammar elements and I implemented a few data structures such as the STRIPs operator, goal, heuristics, and dynamic state. When I plugged all of this into the existing framework it took a few minutes of debugging and my solution was working quite to my surprise. I felt like this was a great example of why a generic initial design on any project can be helpful.

Going back to my first note I mentioned that the program was taking so long to return a very simple solution. I want to relate this to real time search. The use of node expansions for this seems a bit questionable. Perhaps one would know if their domain takes too long with node expansions but wouldn’t a strict time limit be a better way to limit the search? In my assignments so far I have noticed the node expansion rate speeds up or slows down drastically in relation to the size of the open/closed lists.

Next, in path finding in general I have been thinking about a relatively simple heuristic. In many games one has a map ahead of time. One could do an offline computation of the actually cost matrix for the cost from any square to any square. This is a huge computation but still relatively simple to do offline. This could be saved to a file for use later. With this computed a program could use this as a heuristic for a real time algorithm that has changing elements such as other units moving or new obstacles appearing. Have you seen any experimental results using this?

My final note for this week relates to the paper, Real-Time Adaptive A* by Sven Koenig and Maxim Likhachev. This paper presents a real time search algorithm which computes A* for a limited period each time tick but updates the heuristic algorithm based on facts learned about the existing domain. I hate to put too much weight on my comments since I have not finished their paper yet but my initial thought relates to the use of memory in this algorithm. It seems saving all the information about g and h values could take quite a bit of memory. Real time search algorithms need not only consider time as limited, but in fact memory is often also limited. Once I finish this paper I want to look into the space complexity more but I am pretty sure it will be exponential which could be a problem in many applications.

Week 6

3/1/0912 AM

Just having read the paper “Fast and Memory-Efficient Multi-Agent Pathfinding” by Ko-Hsin Cindy Wang and Adi Botea, my first thought is the paper will take a number of additional readings before I even start to get a firm enough grasp on the topic to start an implementation. But, anyways, to give a high level overview, the paper suggests a new cooperative pathfinding algorithm. This is done by first computing a new search graph which limits the grid world state space by creating one way routes per row and column in such a way that the connectivity of the map does not decrease at all. With this done an A* search is run for each unit over the computed new search graph. The resulting plans are then executed and in real time when a collision between units is detected a quick local search for reroute will be computed.

As I mentioned I still need to look into this algorithm more too fully understand it but there were several things about the results that makes me want to investigate further. First of all WHCA* was presented as the next best alternative and in the graphs given FAR absolutely demolishes this existing algorithm. It would be interesting to re-implement the algorithm and make sure it was not a feature of the domain which made FAR well suited. IE did the examples have the units moving in the same direction? How does changes in things like this effect the algorithm? How does the FAR algorithm work on different maps as well? I would also like to know how much time pre-computing and storing the new search space would save.

I hope to re-read this paper for next week and also read the WHCA* paper so I can better understand how FAR is so much “better.” Reference to Paper

Week 5

2/24/093 AM

In my past journal entries I have put up a number ideas for projects but as the project proposal is due right after spring break I thought it would be a good idea to put down what I think is my best project idea to get your ideas on it. Previously I mentioned an idea about computing routes for traffic on roads. On retrospective I think the ideas presented there were perhaps too broad in scope and I was not able to find much real world data. I am thinking at a more generalized but simplified look at this problem would be the way to go. I would like to look at cooperative path finding

There are three related applications to cooperative path finding that I have been considering. Firstly, the traffic problem already mentioned. Secondly, and perhaps the most obvious being a grid world system that computes routes for multiple agents. Lastly, a non-traditional application of competitive path finding into routing networks when certain quality of service is desired. I am thinking for the paper I could first take a look at these; perhaps find some additional applications of cooperative path finding.

I would then take a grid world example where a world is read in along with a set of movement destination and implement a cooperative path finding system. It would compute the globally optimal set of paths for all the agents to take. Since this would be a form of search the key challenge here would be creating the heuristic based on existing research and my experimentation. I wouldn’t be as much focusing on original ideas but implementations and optimizations of those algorithms. I will see what I can come up with through the implementation and then write about what I see in my paper. I would also like to consider elements such as selecting the non-optimal solution and updating the global plan in the middle of its execution.

Week 4

2/16/099 PM

This week I want to discuss a domain I was considering during the minimax tree search lecture. So in Real Time Strategy games AIs are to say the least pitiful. There has yet to be an AI that is even a challenge to an amateur gamer. This is perhaps related to the enormous state space of this type of game but still one would think we could do better. One specific area in this type of game were the AI is especially bad is the area of micromanagement of units. This means the tactical movements, choices of whom to attack, where to move on the battle field, and so forth. I would like to take a moment to formalize this problem and then make some additional commentary.

Consider a battle or situation with three types of units: A, B, and C. Each team has N of these units respectively. The area for this battle is an H times W grid. Time shall be divided into 30 discrete slices a second and the battle shall go on for 60 seconds. With this information we can see the complexity of the state space. Each tick each unit has many different moves. Assuming that a unit can move in eight directions that means a unit has 8 + 3*N + 1 moves possible each tick. The 3*N is for units the can choose to attack and the +1 is for possibility of doing no actions with this unit. Furthermore, ever unit in that player’s control must be processed. If this was a search tree out complexity for the state space would be 3*N*(3*N+9)^(60*30). To put that in perspective if we assume an N of 10 and we change the formula from 60 seconds to 6 seconds the result is 7.3918E+287. That number is to say the least, um, huge. And more importantly it is only a six second look ahead. In real micromanagement situations six seconds could be taken up by a simple withdrawal to reconstitute the center of mass, something which a six second look ahead I see as having trouble seeing the value off, but which humans can recognize easily.

In class I wrote one of my end of class questions on this and your response was simply a minimal look ahead of 1 to 2 turns with minimax. The issue I see with this in that a 1-2 turn look ahead would have so little bearing on the entire problem. This brings me to my main thrust. On problems like this it seems the only reasonable solution is an enormous hierarchical simplification of the problem domain. That then brings up the statement you have made in class. You have repeatable said just let the algorithm do its work. A simplification seems too rely on the programmer guiding the program to a solution. But I honestly don’t see any other way on such a domain. Does this ring a bell with any formal papers you have read? I’d be interested in seeing them and how others have approached such problems.

I was also going to comment on another topic but this is a getting long already so I will leave that for another time. I do want to mention one nice thing about the first assignment. I remember a while back I was stuck in LAX for 10 hours and I was board and so I began making a Sudoku solver. At the time I considered how easy it would be with a recursive function but I decided against it because of how much replication of the Sudoku grid that would have to be done. I was able to make a simple manner to solve this but after this assignment I think I have a new perspective on just allowing the computer to do its magic.

Week 3

2/9/0911 PM

In this week’s journal I want to start off talking about MAC. I was the person who asked if MAC is helpful on certain domains more so than others, but let me rephrase and extend what I was meaning by this a bit. Certain domains of problems may have characteristics like being highly and sparsely connected. From an intuitive standpoint it seems highly connected graphs will allow greater simplification of the problem. Now, that I am writing this, however, the thought that highly connected graphs are simply more complicated, harder problems comes to mind. Either way, it seems the additional processing of MAC would be more useful on highly connected problems. More highly connected graphs led towards more conflicts between nodes and more elimination of arcs in the long run. Is this a correct statement?

My next thought once again relates to MAC. Consider the following graph coloring problem. Stay there are nodes A, B, C, D and the arcs A-B, A-C, B-D, C-D, and B-C exist in the problem. If one choose r to color A, MAC specifies g,b is the domain for B and C while r, g, b is the domain for D. Furthermore, with simple logic one can see that because B and C can only be g and b and because the arcs B-D and C-D both exist then D can only be r. This is easy to see as a human but perhaps this could be added to a MAC consistency check. This would be something like for every size n set of nodes that have domains of size n if there exists any nodes with arcs to all n nodes then all such nodes should have their domains reduced accordingly. This seems like a very intensive process, but it would be interesting to see if it helps prune certain problems.

Week 2

2/2/0911 PM

In this week’s journal entry the first topic I would like to bring up is a potential project idea. Over the years travelling to and from places in the car I am always thinking about optimization of the route I take. With the possibility of car’s driving themselves in the future one wonders if the future holds a macro planning system that would plan the routes and speeds of cars. Here is the idea for the project: I could work on such a macro planning system which you feed a real world road map into and a set of time, start location, and destination tuples representing car trips. This system could then plan the globally optimal route for all the cars entered into the system to take along with the best speeds to travel within the given speed limits.

The topic of constraint satisfaction problems could also be feed into this project. The set of tuples for car trips would have to be quite significant. Perhaps this could be auto generated off of data gathered by the road sensors that measure the amount and time of traffic going over a certain point in the road. These could be used as the constraints in the constraint satisfaction problem which would then produce a possible set of tuple lists given the constraints in the data. I looked online for a little bit but I was interested if you know of any such sets of data gathered from the real world and if you know of any map data freely available. Also, what would your thoughts on that a project be?

The second topic I wanted to discuss relates to the assignment and is more of an observation/question about heuristics. First, while implementation of the assignment took a bit of time the hardest part was definitely the creation of heuristics. The early on in the assignment I was thinking how we wanted first wanted to direct the search towards close dirt patches, and second we wanted to direct towards dirt patches that were grouped. To do this I came up with the inadmissible heuristic which I dubbed the “Dirt Rainbow Spread” Heuristic. What this algorithm does is rates a state’s H (N) as the sum of Manhattan distances to all dirts and the summation of summations of those dirt’s Manhattan distances to the other dirts on the board; more about this in a second.

The second heuristic I came up with calculates the Manhattan distance to the nearest dirt, and then repeats this until it creates a tour of all dirts. This is an admissible heuristic since the route given here solves the Euclidian space traveling sales man problem with the shortest possible tour. It relaxes the problem by saying there are not blocked squares. The real path will only be longer then this cost since we are in Euclidian space. Because of the inadmissibility of my Dirt Rainbow Spread it is a bit greedy of an algorithm. This makes it do certain domains, with a-star, quite well, such as when all the dirt is grouped on one side of the map. It is fact does better then my second heuristic and your heuristics on certain problems. When I switch to a greedy search, however, with the same heuristic, the admissible heuristic wins handsomely. This raises the question in my mind. Does an admissible heuristic in weighted a-star always beat an inadmissible heuristic at some given weight W and above in a given domain? If this is the case it would really point out how inadmissible heuristics aren’t too useful.

Week 1

1/27/091 AM

The first topic I would like to discuss in this week’s journal entry relates to the current assignment. At the beginning of this assignment I had a general idea of what I wanted for a design but as I flushed out the implementation a number of interesting issues have arisen. First, of course, in terms of efficiency the actual algorithm used and its ability to prune the size of the tree searched is the most important since it can be exponential in size. Still, one’s state representation can be quite large as well. By keeping track of the dirt added and removed in an integer from a given state an O(N) operation can be converted into a simple O(1) operation for many equality tests, except in the case where dirt totals match. Over the course of the entire exponential search this could be quiet substantial. It would be interesting to see some raw data around this and how much such a simple change would in fact make a difference. How much time should be spent optimizing the inner workings of the computations over the high level algorithm itself?

A second topic to consider has to do with algorithms. Iterative deepening search, as mentioned in class, can be a good algorithm for many applications. It was also mentioned during class that people intuitively think this method would be inefficient due to the redoing of so much work inherent to the algorithm. The solution to this would be to save all the previously completed searches, yet this won’t really work since it would take exponential memory to save these searches. One wonders if there is a happy medium between these two methods. By this I mean can certain information be saved to help prune later iterations over the tree? Consider a maze that has many spider-like paths. Perhaps in this maze there are paths which either when taken will always be dead ends or perhaps by using a heuristic estimate they seem like they will not be a good path to take. Is there a good method to, after each iterative of IDS, categorize sub graphs and memorize these sub graphs for pruning on future trees.

Identification of dead ends that do not result from the depth bound seems to be a do-able procedure, but except in a few domains this would seem not be that effective since how large of dead ends really occur often? It would be interesting to see a procedure to identify a sub tree that can be detached from the rest of the tree and then in some way given a cost to prioritize its pruning in later iterations of IDS even when this sub tree is not a dead end. This method would also have to reconsider these pruned sub trees in yet later iterations of IDS just in case the pruning heuristic was incorrect. These are of course just rough ideas and the ideas would have to be flushed out, implemented and tests would have to be done to see if there is any merit. One final note to make is IDS search almost can be thought of as a variant of breath first search which uses linear memory, since it only goes one layer deeper each iteration.