
This function reads a GTF file, extracts its gene annotations, and merges them with event-level genomic intervals provided by the user. The returned data table contains the original event intervals together with the corresponding gene information (for example, gene_id and gene_name).

Usage

make_eventdata_plus(eventdata, GTF_file_direction)

Arguments

eventdata

A data table of genomic intervals for events. Must contain columns:

  • chr: Chromosome name (e.g., "chr1").

  • start: Start coordinate of the event.

  • end: End coordinate of the event.

  • strand: Numeric strand indicator (1 for +, 2 for -, following the STAR convention).
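
As an illustration of the expected input shape, here is a minimal sketch of an eventdata table. The coordinates are invented for the example, and the sanity checks are an assumption about what a caller might verify, not something the function is documented to do.

```r
library(data.table)

# Hypothetical example input; the coordinates are made up for illustration.
eventdata <- data.table(
  chr    = c("chr1", "chr1", "chr2"),
  start  = c(1200L, 5300L, 800L),
  end    = c(1350L, 5480L, 950L),
  strand = c(1L, 2L, 1L)  # 1 = "+", 2 = "-"
)

# Optional sanity checks on the required columns (not part of the package).
stopifnot(
  all(c("chr", "start", "end", "strand") %in% names(eventdata)),
  all(eventdata$start <= eventdata$end),
  all(eventdata$strand %in% c(1L, 2L))
)
```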

GTF_file_direction

A character string specifying the path to a GTF file. The file must contain at least these columns: seqid, start, end, strand, gene_id, and gene_name.

Value

A data table of the event intervals that overlap a gene, with gene metadata (such as gene_id and gene_name) added. The returned columns include both event-level and gene-level information.

Details

  1. Read the GTF: Uses data.table::fread to load GTF data and convert it to a data table.

  2. Subset for Genes: Keeps only rows where type == "gene", retaining columns for chromosome, start, end, strand, gene_id, and gene_name.

  3. Strand Conversion: Merges the GTF data with a small lookup table to replace + and - with the numeric strand indicators 1 and 2 (matching the strand encoding used by STAR).

  4. Overlaps: With both data sets keyed, uses foverlaps() from data.table to find intervals in eventdata that fall fully within gene boundaries.
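
Steps 3 and 4 can be sketched on toy data as follows. This is not the package's actual code: the gene table stands in for the gene rows of a GTF, all values are invented, and the key columns (chr, strand, start, end) and the choice of type = "within" are assumptions based on the description above.

```r
library(data.table)

# Toy gene table standing in for the gene rows of a GTF (hypothetical values).
genes <- data.table(
  chr       = c("chr1", "chr1", "chr2"),
  start     = c(1000L, 5000L, 500L),
  end       = c(2000L, 6000L, 900L),
  strand    = c("+", "-", "+"),
  gene_id   = c("G1", "G2", "G3"),
  gene_name = c("geneA", "geneB", "geneC")
)

# Step 3: replace "+" / "-" with 1 / 2 by merging with a small lookup table.
strand_lookup <- data.table(strand = c("+", "-"), strand_num = c(1L, 2L))
genes <- merge(genes, strand_lookup, by = "strand")
genes[, strand := NULL]
setnames(genes, "strand_num", "strand")

# Toy event intervals in the format described under Arguments.
events <- data.table(
  chr    = c("chr1", "chr1", "chr2"),
  start  = c(1200L, 5300L, 800L),
  end    = c(1350L, 5480L, 950L),
  strand = c(1L, 2L, 1L)
)

# Step 4: key both tables (interval columns last), then keep events that lie
# fully within a gene on the same chromosome and strand.
setkey(genes, chr, strand, start, end)
setkey(events, chr, strand, start, end)
annotated <- foverlaps(events, genes, type = "within", nomatch = NULL)

print(annotated[, .(chr, strand, i.start, i.end, gene_id, gene_name)])
```

In the result, start and end hold the gene boundaries, while the event coordinates appear as i.start and i.end. The third toy event (chr2, 800 to 950) runs past the end of its gene (900), so it is dropped under type = "within".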