Hi-C data preprocessing
This section describes how to preprocess raw omics data.
1. estimate_resolution
Before preprocessing, we can first explore the super resolution parameter used by HOMER.
Note
This step is not required.
dna.estimate_resolution(chromosome, cal_type, cohort1, cohort2 = "", fig_dpi = 300)
Arguments
Required arguments:
chromosome
: The chromosome to investigate, e.g. 'chr1', 'chr2', …. To check all chromosomes, use 'whole_genome'. Whole-genome estimation is recommended but time-consuming.
cal_type
: The type of calculation: 0 compares different resolutions; 1, 2, and 3 compare different super resolutions with the resolution fixed at 50kb, 100kb, and 500kb, respectively.
cohort1
: The experimental condition for input data.
Optional arguments:
cohort2
: An additional experimental condition for input data, used to compare with cohort1. The default is "".
fig_dpi
: Figure resolution in dots per inch. The default is 300.
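The integer codes accepted by cal_type are easy to mix up. As a reading aid, the mapping documented above can be written out as a small lookup table. The names CAL_TYPE_MODES and describe_cal_type are hypothetical helpers for illustration only and are not part of the package.

```python
# Hypothetical helper (not part of the package API): spells out the
# cal_type codes exactly as they are documented for estimate_resolution.
CAL_TYPE_MODES = {
    0: ("resolution", None),
    1: ("super resolution", "50kb"),
    2: ("super resolution", "100kb"),
    3: ("super resolution", "500kb"),
}


def describe_cal_type(cal_type):
    """Return a human-readable description of a cal_type code."""
    if cal_type not in CAL_TYPE_MODES:
        raise ValueError(
            f"cal_type must be one of {sorted(CAL_TYPE_MODES)}, got {cal_type!r}"
        )
    compared, fixed_resolution = CAL_TYPE_MODES[cal_type]
    if fixed_resolution is None:
        return f"compare different {compared} values"
    return (
        f"compare different {compared} values "
        f"at a fixed resolution of {fixed_resolution}"
    )


for code in sorted(CAL_TYPE_MODES):
    print(code, "->", describe_cal_type(code))
```

Checking the code against such a table before calling estimate_resolution can catch an invalid value early, which matters most for the time-consuming whole-genome run.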
2. preprocess_hic_tag2homer
Next, HOMER is applied to the raw Hi-C data (HiCUP) to filter and reorganize significant intra-chromosomal interactions.
dna.preprocess_hic_tag2homer(super_resolution, p_value = 0.1, zscore = 1.0)
Arguments
Required arguments:
super_resolution
: The range of signal averaging. In the manual of HOMER, it is recommended that super resolution be the same as resolution.
Optional arguments:
p_value
: p-value cutoff for intra-chromosomal interactions. The default is 0.1.
zscore
: z-score cutoff for intra-chromosomal interactions. The default is 1.0.
Data path
input data path: /in_data_folder/hic_data/hicup_processed/cohort
output data path: /out_data_folder/hic_data/hic_interaction_homer
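To make the role of the two cutoffs concrete, here is a minimal sketch of how such thresholds are commonly applied to a table of interactions. It assumes an interaction is kept when its p-value is at or below p_value and its z-score is at or above zscore. The function filter_interactions and the record fields are hypothetical; in the package the filtering is delegated to HOMER and may differ in detail.

```python
# Illustrative sketch only: the real filtering is performed by HOMER.
# Assumption: keep interactions with p-value <= p_value AND z-score >= zscore.
def filter_interactions(interactions, p_value=0.1, zscore=1.0):
    """Return the interactions that pass both significance cutoffs."""
    return [
        rec
        for rec in interactions
        if rec["p_value"] <= p_value and rec["zscore"] >= zscore
    ]


# Toy records with hypothetical field names.
example = [
    {"chrom": "chr1", "start1": 0, "start2": 500_000, "p_value": 0.01, "zscore": 2.5},
    {"chrom": "chr1", "start1": 100_000, "start2": 900_000, "p_value": 0.30, "zscore": 3.0},
    {"chrom": "chr1", "start1": 200_000, "start2": 400_000, "p_value": 0.05, "zscore": 0.4},
]

kept = filter_interactions(example)
for rec in kept:
    print(rec["chrom"], rec["start1"], rec["start2"])
```

In this toy input only the first record passes: the second fails the p-value cutoff and the third fails the z-score cutoff.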
3. preprocess_hic_homer2bed
Then, BEDTools is used to convert the significant interactions exported by HOMER to BED format files, and Hi-C adjacency matrices are built at a predefined resolution.
dna.preprocess_hic_homer2bed(genome_version = 'hg19', fig_dpi = 300)
Arguments
Required arguments:
genome_version
: Reference genome version that defines the chromosome regions, e.g. hg19 or hg38. The default is 'hg19'.
Optional arguments:
fig_dpi
: Figure resolution in dots per inch. The default is 300.
Data path
output data path: /out_data_folder/hic_data/hic_interaction_bed (the output data, including heatmaps of interactions, are stored here)
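As an illustration of what a Hi-C adjacency matrix at a predefined resolution looks like, the sketch below bins interaction endpoints into fixed-size windows and counts interactions per pair of bins. It is a simplified pure-Python stand-in, not the package's implementation. The function name build_adjacency, the (pos1, pos2) input layout, and the symmetric counting are all assumptions.

```python
# Illustrative sketch only, not the package's implementation.
# Assumptions: interactions are (pos1, pos2) pairs in base pairs on one
# chromosome, and the matrix counts interactions symmetrically per bin pair.
def build_adjacency(interactions, chrom_length, resolution):
    """Build a symmetric bin-by-bin interaction count matrix."""
    n_bins = -(-chrom_length // resolution)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in interactions:
        i = pos1 // resolution
        j = pos2 // resolution
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal entries
    return matrix


# Toy example: a 300 kb chromosome binned at 100 kb gives a 3 x 3 matrix.
toy = [(10_000, 260_000), (120_000, 130_000), (20_000, 250_000)]
adj = build_adjacency(toy, chrom_length=300_000, resolution=100_000)
for row in adj:
    print(row)
```

A coarser resolution produces a smaller, denser matrix, while a finer one produces a larger, sparser matrix.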