This function reads in a GTF file, extracts gene annotations, and merges them with
event-level genomic intervals provided by the user. The final data table contains the
original event intervals and the corresponding gene information (for example, gene_id
and gene_name).
Arguments
- eventdata
A data table of genomic intervals for events. Must contain columns:
  - chr: Chromosome name (e.g., "chr1").
  - start: Start coordinate of the event.
  - end: End coordinate of the event.
  - strand: Numeric strand indicator (1 or 2).
- GTF_file_direction
A character string specifying the path to a GTF file. The file must contain at least these columns:
seqid, start, end, strand, gene_id, and gene_name.
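As an illustration of the expected input, the sketch below builds a small eventdata table and checks it against the column requirements listed above. The coordinates are invented, and the validation step is only an assumption about what a caller might want to check before calling the function; it is not part of the documented behaviour.

```r
library(data.table)

# Hypothetical events; all coordinates are made up for illustration.
eventdata <- data.table(
  chr    = c("chr1", "chr1", "chr2"),
  start  = c(1200L, 5300L, 800L),
  end    = c(1350L, 5450L, 950L),
  strand = c(1L, 2L, 1L)  # 1 = "+", 2 = "-" (STAR convention)
)

# Optional sanity checks a caller might run (an assumption, not documented).
required_cols <- c("chr", "start", "end", "strand")
missing_cols  <- setdiff(required_cols, names(eventdata))
stopifnot(
  length(missing_cols) == 0L,            # all required columns are present
  all(eventdata$strand %in% c(1L, 2L)),  # strand uses the numeric encoding
  all(eventdata$start <= eventdata$end)  # intervals are well formed
)
```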
Value
A data table containing overlapping event intervals with added gene metadata
(such as gene_id and gene_name). The columns returned will include both event-level
and gene-level information.
Details
- Read the GTF: Uses data.table::fread to load GTF data and convert it to a data table.
- Subset for Genes: Keeps only rows where type == "gene", retaining columns for chromosome, start, end, strand, gene_id, and gene_name.
- Strand Conversion: Merges the GTF data with a small lookup table to replace + and - with numeric strand indicators 1 and 2 (matching STAR).
- Overlaps: With both data sets keyed, uses foverlaps() from data.table to find intervals in eventdata that fall fully within gene boundaries.
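The steps above can be sketched roughly as follows. This is a minimal illustration and not the function's actual source. The toy gtf table stands in for the result of data.table::fread(GTF_file_direction), the helper names (genes, strand_lookup, annotated) are hypothetical, and dropping events with no matching gene (nomatch = NULL) is an assumption.

```r
library(data.table)

# Toy stand-in for the table that fread() would return from the GTF file.
gtf <- data.table(
  seqid     = c("chr1", "chr1", "chr1", "chr2"),
  type      = c("gene", "exon", "gene", "gene"),
  start     = c(1000L, 1000L, 5000L, 500L),
  end       = c(2000L, 1500L, 6000L, 1500L),
  strand    = c("+", "+", "-", "+"),
  gene_id   = c("G1", "G1", "G2", "G3"),
  gene_name = c("geneA", "geneA", "geneB", "geneC")
)

# Hypothetical events; the last one extends past the end of gene G3.
eventdata <- data.table(
  chr    = c("chr1", "chr1", "chr2"),
  start  = c(1200L, 5300L, 1400L),
  end    = c(1350L, 5450L, 1600L),
  strand = c(1L, 2L, 1L)
)

# Subset for genes: keep gene rows and the annotation columns only.
genes <- gtf[type == "gene",
             .(chr = seqid, start, end, strand, gene_id, gene_name)]

# Strand conversion: map "+" / "-" to 1 / 2 through a small lookup table.
strand_lookup <- data.table(strand = c("+", "-"), strand_num = c(1L, 2L))
genes <- merge(genes, strand_lookup, by = "strand")
genes[, strand := NULL]
setnames(genes, "strand_num", "strand")

# Overlaps: key both tables, with the interval columns last in the key.
setkey(genes, chr, strand, start, end)
setkey(eventdata, chr, strand, start, end)

# type = "within" keeps events lying fully inside a gene on the same
# chromosome and strand; nomatch = NULL drops unmatched events (assumption).
annotated <- foverlaps(eventdata, genes, type = "within", nomatch = NULL)
```

In this sketch the third event is excluded because it ends after gene G3 does. Because both tables share the column names start and end, foverlaps() returns the gene coordinates under those names and the event coordinates with an i. prefix.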
