Feature
Quick Start
Pattern Syntax
Rule Essentials
Rule Object Reference
Atomic Rule
- pattern
- Pattern Object
- strictness
- kind
- regex
- nthChild
- range
- Tips for Writing Rules
Relational Rules
Composite Rule
Examples
Rule Cheat Sheet
Config Cheat Sheet
Frequently Asked Questions
Tree-sitter
Issue
- [feature] Parse multiple pieces of code (same language) one time #448
Refer

ast-grep(sg) is a CLI tool for code structural search, lint, and rewriting.

ast-grep is an abstract syntax tree based tool to search code by pattern code. Think of it as your old-friend grep, but matching AST nodes instead of text. You can write patterns as if you are writing ordinary code. It will match all code that has the same syntactical structure. You can use $ sign + upper case letters as a wildcard, e.g. $MATCH, to match any single AST node. Think of it as regular expression dot ., except it is not textual.

Try the online playground for a taste!

Feature

Search and Rewrite

ast-grep is a code tool for structural search and replace. It is like syntax-aware grep/sed! You can write code patterns to locate and modify code, based on AST, in thousands of files, interactively.

ast-grep -p '$A && $A()' -r '$A?.()'

astgrep1

Scan as Linter

ast-grep is a versatile and flexible tool for linting code with AST patterns. You can easily add new customized rules with intuitive syntax and enjoy pretty error reporting out of box.

ast-grep scan

astgrep2

Programmatic Usage

ast-grep also provides node-js binding to access syntax trees programmatically. You can use jQuery like utility methods to traverse syntax tree nodes. Node API also has opt-in type safety.

npm install @ast-grep/napi

astgrep3

Quick Start

You can unleash ast-grep’s power at your finger tips within few keystrokes in command line!

Let’s try its power of by rewriting some code in a moderately large codebase: TypeScript.

Our task is to rewrite old defensive code that checks nullable nested method calls to the new shiny optional chaining operator ?..

Installation

First, install ast-grep. It is distributed by npm, cargo, homebrew and macports. You can also build it from source.

# install via pip
$ pip install ast-grep-cli

$ which ast-grep
~/.local/bin/ast-grep

$ ast-grep --version
ast-grep 0.39.1

$ ast-grep --help
Search and Rewrite code at large scale using AST pattern.
                    __
        ____ ______/ /_      ____ _________  ____
       / __ `/ ___/ __/_____/ __ `/ ___/ _ \/ __ \
      / /_/ (__  ) /_/_____/ /_/ / /  /  __/ /_/ /
      \__,_/____/\__/      \__, /_/   \___/ .___/
                          /____/         /_/


Usage: ast-grep [OPTIONS] <COMMAND>

Commands:
  run          Run one time search or rewrite in command line. (default command)
  scan         Scan and rewrite code by configuration
  test         Test ast-grep rules
  new          Create new ast-grep project or items like rules/tests
  lsp          Start language server
  completions  Generate shell completion script
  docs         Generate rule docs for current configuration. (Not Implemented Yet)
  help         Print this message or the help of the given subcommand(s)

Options:
  -c, --config <CONFIG_FILE>
          Path to ast-grep root config, default is sgconfig.yml

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version

Note: Use sg on Linux

Linux has a default command sg for setgroups. You can use the full command name ast-grep instead of sg. You can also use shorter alias if you want by alias sg=ast-grep.

Optionally, you can grab TypeScript source code if you want to follow the tutorial. Or you can apply the magic to your own code.

git clone https://github.com/microsoft/TypeScript --depth 1

Pattern

Then search the occurrence of looking up a method from a nested structure. ast-grep uses pattern to find similar code. Think it as the pattern in our old-friend grep but it matches AST node instead of text. We can write pattern as if write ordinary code. It will match all code that has the same syntactical structure.

For example, the following pattern code

obj.val && obj.val()

will match all the following code, regardless of white spaces or new lines.

obj.val && obj.val() // verbatim match, of course
obj.val    &&     obj.val() // this matches, too

// this matches as well!
const result = obj.val &&
   obj.val()

Matching based exactly on AST is cool, but we certainly want to use flexible pattern to match code with infinite possibility. We can use meta variable to match any single AST node. Meta variable begins with $ sign with upper case letters following, e.g. $METAVAR. Think it as REGEX dot ., except it is not textual.

We can write this pattern to find all property checking code.

$PROP && $PROP()

It is a valid ast-grep pattern! We can use it in command line! Use pattern argument to specify our target. Optionally, we can use lang to tell ast-grep our target code language.

# Full Command
ast-grep --pattern '$PROP && $PROP()' --lang ts TypeScript/src

# Short Form
ast-grep -p '$PROP && $PROP()' -l ts TypeScript/src

# Without Lang
# ast-grep will infer languages based on file extensions
ast-grep -p '$PROP && $PROP()' TypeScript/src

astgrep4

astgrep5

Rewrite

Cool? Now we can use this pattern to refactor TypeScript source!

# pattern and language argument support short form
ast-grep -p '$PROP && $PROP()' \
   --rewrite '$PROP?.()' \
   --interactive \
   -l ts \
   TypeScript/src

ast-grep will start an interactive session to let you choose if you want to apply the patch. Press y to accept the change!

astgrep6

That’s it! You have refactored TypeScript’s repository in minutes. Congratulation!

Hope you enjoy the power of AST editing in plain programming language pattern. Our next step is to know more about the pattern code.

Pattern Syntax

In this guide we will walk through ast-grep’s pattern syntax. The example will be written in JavaScript, but the basic principle will apply to other languages as well.

Pattern Matching

ast-grep uses pattern code to construct AST tree and match that against target code. The pattern code can search through the full syntax tree, so pattern can also match nested expression. For example, the pattern a + 1 can match all the following code.

const b = a + 1

funcCall(a + 1)

deeplyNested({
  target: a + 1
})

WARNING

Pattern code must be valid code that tree-sitter can parse.

ast-grep playground is a useful tool to confirm pattern is parsed correctly.

If ast-grep fails to parse code as expected, you can try give it more context by using object-style pattern.

Meta Variable

It is usually desirable to write a pattern to match dynamic content.

We can use meta variables to match sub expression in pattern.

Meta variables start with the $ sign, followed by a name composed of upper case letters A-Z, underscore _ or digits 1-9. $META_VARIABLE is a wildcard expression that can match any single AST node.

Think it as REGEX dot ., except it is not textual.

Valid meta variables

$META, $META_VAR, $META_VAR1, $_, $_123

Invalid meta variables

$invalid, $Svalue, $123, $KEBAB-CASE，$

The pattern console.log($GREETING) will match all the following.

function tryAstGrep() {
  console.log('Hello World')
}

const multiLineExpression =
  console
   .log('Also matched!')

But it will not match these.

// console.log(123) in comment is not matched
'console.log(123) in string' // is not matched as well
console.log() // mismatch argument
console.log(a, b) // too many arguments

Note, one meta variable $MATCH will match one single AST node, so the last two console.log calls do not match the pattern. Let’s see how we can match multiple AST nodes.

Multi Meta Variable

We can use $$$ to match zero or more AST nodes, including function arguments, parameters or statements. These variables can also be named, for example: console.log($$$ARGS).

Function Arguments

For example, console.log($$$) can match

console.log()                       // matches zero AST node
console.log('hello world')          // matches one node
console.log('debug: ', key, value)  // matches multiple nodes
console.log(...args)                // it also matches spread

Function Parameters

function $FUNC($$$ARGS) { $$$ } will match

function foo(bar) {
  return bar
}

function noop() {}

function add(a, b, c) {
  return a + b + c
}

astgrep7

Meta Variable Capturing

Meta variable is also similar to capture group in regular expression. You can reuse same name meta variables to find previously occurred AST nodes.

For example, the pattern $A == $A will have the following result.

// will match these patterns
a == a
1 + 1 == 1 + 1
// but will not match these
a == b
1 + 1 == 2

Non Capturing Match

You can also suppress meta variable capturing. All meta variables with name starting with underscore _ will not be captured.

// Given this pattern

$_FUNC($_FUNC)

// it will match all function call with one argument or spread call
test(a)
testFunc(1 + 1)
testFunc(...args)

Note in the example above, even if two meta variables have the same name $_FUNC, each occurrence of $_FUNC can match different content because they are not captured.

Why use non-capturing match?

This is a useful trick to micro-optimize pattern matching speed, since we don’t need to create a HashMap for bookkeeping.

Capture Unnamed Nodes

A meta variable pattern $META will capture named nodes by default. To capture unnamed nodes, you can use double dollar sign $$VAR.

Namedness is an advanced topic in Tree-sitter. You can read this in-depth guide for more background.

More Powerful Rule

Pattern is a fast and easy way to match code. But it is not as powerful as rule which can match code with more precise selector or more context.

Pro Tip

Pattern can also be an object instead of string in YAML rule.

It is very useful to avoid ambiguity in code snippet. See here for more details.

Also see our FAQ for more guidance on writing patterns.

Rule Essentials

Now you have learnt the basic of ast-grep’s pattern syntax and searching. Pattern is a handy feature for simple search. But it is not expressive enough for more complicated cases.

ast-grep provides a more sophisticated way to find your code: Rule.

Rules are like CSS selectors that can compose together to filter AST nodes based on certain criteria.

A Minimal Example

A minimal ast-grep rule looks like this.

id: no-await-in-promise-all
language: TypeScript
rule:
  pattern: Promise.all($A)
  has:
    pattern: await $_
    stopBy: end

The TypeScript rule, no-await-in-promise-all, will find Promise.all that has await expression in it.

It is suboptimal because Promise.all will be called only after the awaited Promise resolves first.

Let’s walk through the main fields in this configuration.

id is a unique short string for the rule.
language is the programming language that the rule is intended to check. It specifies what files will be checked against this rule, based on the file extensions. See the list of supported languages.
rule is the most interesting part of ast-grep’s configuration. It accepts a rule object and defines how the rule behaves and what code will be matched. You can learn how to write rule in the detailed guide.

Run the Rule

There are several ways to run the rule. We will illustrate several ast-grep features here.

ast-grep scan –rule

The scan subcommand of ast-grep CLI can run one rule at a time.

To do so, you need to save the rule above in a file on the disk, say no-await-in-promise-all.yml. Then you can run the following command to scan your codebase. In the example below, we are scanning a test.ts file.

ast-grep scan --rule no-await-in-promise-all.yml test.ts

await Promise.all([
  await foo(),
])

ast-grep scan –inline-rules

You can also run the rule directly from the command line without saving the rule to a file. The --inline-rules option is useful for ad-hoc search or calling ast-grep from another program.

ast-grep scan --inline-rules '
id: no-await-in-promise-all
language: TypeScript
rule:
  pattern: Promise.all($A)
  has:
    pattern: await $_
    stopBy: end
' test.ts

Online Playground

ast-grep provides an online playground to test your rule.

You can paste the rule configuration into the playground and see the matched code. The playground also has a share button that generates a link to share the rule with others.

Rule Object

Rule object is the core concept of ast-grep’s rule system and every other features are built on top of it.

Below is the full list of fields in a rule object. Every rule field is optional and can be omitted but at least one field should be present in a rule. A node will match a rule if and only if it satisfies all fields in the rule object.

The equivalent rule object interface in TypeScript is also provided for reference.

rule:
  # atomic rule
  pattern: 'search.pattern'
  kind: 'tree_sitter_node_kind'
  regex: 'rust|regex'
  # relational rule
  inside: { pattern: 'sub.rule' }
  has: { kind: 'sub_rule' }
  follows: { regex: 'can|use|any' }
  precedes: { kind: 'multi_keys', pattern: 'in.sub' }
  # composite rule
  all: [ {pattern: 'match.all'}, {kind: 'match_all'} ]
  any: [ {pattern: 'match.any'}, {kind: 'match_any'} ]
  not: { pattern: 'not.this' }
  matches: 'utility-rule'

interface RuleObject {
  // atomic rule
  pattern?: string | Pattern
  kind?: string
  regex?: string
  // relational rule
  inside?: RuleObject & Relation
  has?: RuleObject & Relation
  follows?: RuleObject & Relation
  precedes?: RuleObject & Relation
  // composite rule
  all?: RuleObject[]
  any?: RuleObject[]
  not?: RuleObject
  matches?: string
}

// See Atomic rule for explanation
interface Pattern {
  context: string
  selector: string
  strictness?: Strictness
}

// See https://ast-grep.github.io/advanced/match-algorithm.html
type Strictness =
  | 'cst'
  | 'smart'
  | 'ast'
  | 'relaxed'
  | 'signature'

// See Relation rule for explanation
interface Relation {
  stopBy?: 'neighbor' | 'end' | RuleObject
  field?: string
}

A node must satisfies all fields in the rule object to be considered as a match. So the rule object can be seen as an abbreviated and unordered all rule.

Rule object is unordered!!

Unordered rule object means that certain rules may be applied before others, even if they appear later in the YAML. Whether a node matches or not may depend on the order of rule being applied, especially when using has/inside rules.

If a rule object does not work, you can try using all rule to specify the order of rules. See FAQ for more details.

Three Rule Categories

To summarize the rule object fields above, we have three categories of rules:

Atomic Rule: the most basic rule that checks if AST nodes matches.
Relational Rule: rules that check if a node is surrounded by another node.
Composite Rule: rules that combine sub-rules together using logical operators.

These three categories of rules can be composed together to create more complex rules.

The rule object is inspired by the CSS selectors but with more composability and expressiveness. Think about how selectors in CSS works can help you understand the rule object!

TIP

Don’t be daunted! Learn more about how to write a rule in our detailed guide.

Target Node

Every rule configuration will have one single root rule. The root rule will have only one AST node in one match. The matched node is called target node. During scanning and rewriting, ast-grep will produce multiple matches to report all AST nodes that satisfies the rule condition as matched instances.

Though one rule match only have one AST node as matched, we can have more auxiliary nodes to display context or to perform rewrite. We will cover how rules work in details in the next page.

But for a quick primer, a rule can have a pattern and we can extract meta variables from the matched node.

For example, the rule below will match the console.log('Hello World').

rule:
  pattern: console.log($GREET)

And we can get $GREET set to 'Hello World'.

language specifies rule interpretation

The language field in the rule configuration will specify how the rule is interpreted. For example, with language: TypeScript, the rule pattern 'hello world' is parsed as TypeScript string literal. However, the rule will have a parsing error in languages like C/Java/Rust because single quote is used for character literal and double quote should be used for string.

Rule Object Reference

Atomic rules are the most basic rules to match AST nodes.
Relational rules filter matched target according to their position relative to other nodes.
Composite rules use logic operation all/any/not to compose the above rules to larger rules.

All of these keys are optional. However, at least one of them must be present and positive.

A rule is called positive if it only matches nodes with specific kinds. For example, a kind rule is positive because it only matches nodes with the kind specified by itself. A pattern rule is positive because the pattern itself has a kind and the matching node must have the same kind. A regex rule is not positive though because it matches any node as long as its text satisfies the regex.

Atomic Rule

ast-grep has three categories of rules. Let’s start with the most basic one: atomic rule.

Atomic rule defines the most basic matching rule that determines whether one syntax node matches the rule or not. There are five kinds of atomic rule: pattern, kind, regex, nthChild and range.

pattern

Pattern will match one single syntax node according to the pattern syntax.

rule:
  pattern: console.log($GREETING)

The above rule will match code like console.log('Hello World').

By default, a string pattern is parsed and matched as a whole.

Pattern Object

It is not always possible to select certain code with a simple string pattern. A pattern code can be invalid, incomplete or ambiguous for the parser since it lacks context.

For example, to select class field in JavaScript, writing $FIELD = $INIT will not work because it will be parsed as assignment_expression. See playground.

astgrep8

We can also use an object to specify a sub-syntax node to match within a larger context. It consists of an object with three properties: context, selector and strictness.

context (required): defines the surrounding code that helps to resolve any ambiguity in the syntax.
selector (optional): defines the sub-syntax node kind that is the actual matcher of the pattern.
strictness (optional): defines how strictly pattern will match against nodes.

Let’s see how pattern object can solve the ambiguity(模棱两可) in the class field example above.

The pattern object below instructs ast-grep to select the field_definition node as the pattern target.

pattern:
  selector: field_definition
  context: class A { $FIELD = $INIT }

ast-grep works like this:

First, the code in context, class A { $FIELD = $INIT }, is parsed as a class declaration.
Then, it looks for the field_definition node, specified by selector, in the parsed tree.
The selected $FIELD = $INIT is matched against code as the pattern.

In this way, the pattern is parsed as field_definition instead of assignment_expression. See playground in action.

rule:
  pattern:
    selector: field_definition
    context: class A { $FIELD = $INIT }

astgrep9

Other examples are function call in Go and function parameter in Rust.

strictness

You can also use pattern object to control the matching strategy with strictness field.

By default, ast-grep uses a smart strategy to match pattern against the AST node. All nodes in the pattern must be matched, but it will skip unnamed nodes in target code.

For the definition of named and unnamed nodes, please refer to the core concepts doc.

For example, the following pattern function $A() {} will match both plain function and async function in JavaScript. See playground

// function $A() {}
function foo() {}    // matched
async function bar() {} // matched

astgrep10

This is because the keyword async is an unnamed node in the AST, so the async in the code to search is skipped. As long as function, $A and {} are matched, the pattern is considered matched.

However, this is not always the desired behavior. ast-grep provides strictness to control the matching strategy. At the moment, it provides these options, ordered from the most strict to the least strict:

cst: All nodes in the pattern and target code must be matched. No node is skipped.
smart: All nodes in the pattern must be matched, but it will skip unnamed nodes in target code. This is the default behavior.
ast: Only named AST nodes in both pattern and target code are matched. All unnamed nodes are skipped.
relaxed: Named AST nodes in both pattern and target code are matched. Comments and unnamed nodes are ignored.
signature: Only named AST nodes’ kinds are matched. Comments, unnamed nodes and text are ignored.

Deep Dive and More Examples

strictness is an advanced feature that you may not need in most cases.

If you are interested in more examples and details, please refer to the deep dive doc on ast-grep’s match algorithm.

kind

Sometimes it is not easy to write a pattern because it is hard to construct the valid syntax.

For example, if we want to match class property declaration in JavaScript like class A { a = 1 }, writing a = 1 will not match the property because it is parsed as assigning to a variable.

Instead, we can use kind to specify the AST node type defined in tree-sitter parser.

kind rule accepts the tree-sitter node’s name, like if_statement and expression. You can refer to ast-grep playground for relevant kind names.

Back to our example, we can look up class property’s kind from the playground.

rule:
  kind: field_definition

It will match the following code successfully (playground link).

class Test {
  a = 123 // match this line
}

astgrep11

Here are some situations that you can effectively use kind:

Pattern code is ambiguous to parse, e.g. {} in JavaScript can be either object or code block.
It is too hard to enumerate all patterns of an AST kind node, e.g. matching all Java/TypeScript class declaration will need including all modifiers, generics, extends and implements.
Patterns only appear within specific context, e.g. the class property definition.

kind + pattern is different from pattern object

You may want to use kind to change how pattern is parsed. However, ast-grep rules are independent of each other.

To change the parsing behavior of pattern, you should use pattern object with context and selector field. See this FAQ.

regex

The regex atomic rule will match the AST node by its text against a Rust regular expression.

rule:
  regex: "\w+"

TIP

The regular expression is written in Rust syntax, not the popular PCRE like syntax. So some features are not available like arbitrary look-ahead and back references.

You should almost always combine regex with other atomic rules to make sure the regular expression is applied to the correct AST node. Regex matching is quite expensive and cannot be optimized based on AST node kinds. While kind and pattern rules can be only applied to nodes with specific kind_id for optimized performance.

nthChild

nthChild is a rule to find nodes based on their indexes in the parent node’s children list. In other words, it selects nodes based on their position among all sibling nodes within a parent node. It is very helpful in finding nodes without children or nodes appearing in specific positions.

nthChild is heavily inspired by CSS’s nth-child pseudo-class, and it accepts similar forms of arguments.

# a number to match the exact nth child
nthChild: 3

# An+B style string to match position based on formula
nthChild: 2n+1

# object style nthChild rule
nthChild:
  # accepts number or An+B style string
  position: 2n+1
  # optional, count index from the end of sibling list
  reverse: true # default is false
  # optional, filter the sibling node list based on rule
  ofRule:
    kind: function_declaration # accepts ast-grep rule

TIP

nthChild’s index is 1-based, not 0-based, as in the CSS selector. nthChild’s node list only includes named nodes, not unnamed nodes.

The following rule will match the second number in the JavaScript array.

rule:
  kind: number
  nthChild: 2

It will match the following code:

const arr = [ 1, 2, 3, ]
            //   |- match this number

astgrep12

range

range is a rule to match nodes based on their position in the source code. It is useful when you want to integrate external tools like compilers or type checkers with ast-grep. External tools can provide the range information of the interested node, and ast-grep can use it to rewrite the code.

range rule accepts a range object with start and end fields. Each field is an object with line and column fields.

rule:
  range:
    start:
      line: 0
      column: 0
    end:
      line: 1
      column: 5

The above example will match an AST node having the first three characters of the first line like foo in foo.bar().

line and column are 0-based and character-wise, and the start is inclusive while the end is exclusive.

Tips for Writing Rules

Since one rule will have only one AST node in one match, it is recommended to first write the atomic rule that matches the desired node.

Suppose we want to write a rule which finds functions without a return type. For example, this code would trigger an error:

const foo = () => {
	return 1;
}

The first step to compose a rule is to find the target. In this case, we can first use kind: arrow_function to find function node. Then we can use other rules to filter candidate nodes that does have return type.

Another trick to write cleaner rule is to use sub-rules as fields. Please refer to composite rule for more details.

Relational Rules

Atomic rule can only match the target node directly. But sometimes we want to match a node based on its surrounding nodes. For example, we want to find await expression inside a for loop.

Relational rules are powerful operators that can filter the target nodes based on their surrounding nodes.

ast-grep now supports four kinds of relational rules: inside, has, follows, and precedes.

All four relational rules accept a sub rule object as their value. The sub rule will match the surrounding node while the relational rule itself will match the target node.

Composite Rule

Composite rule can accept another rule or a list of rules recursively. It provides a way to compose atomic rules into a bigger rule for more complex matching.

Below are the four composite rule operators available in ast-grep: all, any, not, and matches.

Examples

Rule Cheat Sheet

This cheat sheet provides a concise overview of ast-grep’s rule object configuration, covering Atomic, Relational, and Composite rules, along with notes on Utility rules. It’s designed as a handy reference for common usage.

Config Cheat Sheet

This cheat sheet provides a concise overview of ast-grep’s linter rule YAML configuration. It’s designed as a handy reference for common usage.

Frequently Asked Questions

https://ast-grep.github.io/advanced/faq.html

Tree-sitter

Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. Tree-sitter aims to be:

General enough to parse any programming language
Fast enough to parse on every keystroke in a text editor
Robust enough to provide useful results even in the presence of syntax errors
Dependency-free so that the runtime library (which is written in pure C11) can be embedded in any application

Issue

[feature] Parse multiple pieces of code (same language) one time #448

Q: But I don’t know how I can do it and what’s the best practice to leverage ast-grep(or Rust)’s powerful parallel mechanism.

A: this feature has been supported by parseAsync! ac78bc0

Here is the blog benchmarking parallel parsing of different parsers! Benchmark TypeScript Parsers: Demystify Rust Tooling Performance

Refer

https://ast-grep.github.io/
https://tree-sitter.github.io/tree-sitter/index.html
Rule Object Reference